Tag Archives: hospital

What is HAMR and How Does It Enable the High-Capacity Needs of the Future?

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hamr-hard-drives/

HAMR drive illustration

During Q4, Backblaze deployed 100 petabytes worth of Seagate hard drives to our data centers. The newly deployed Seagate 10 and 12 TB drives are doing well and will help us meet our near term storage needs, but we know we’re going to need more drives — with higher capacities. That’s why the success of new hard drive technologies like Heat-Assisted Magnetic Recording (HAMR) from Seagate are very relevant to us here at Backblaze and to the storage industry in general. In today’s guest post we are pleased to have Mark Re, CTO at Seagate, give us an insider’s look behind the hard drive curtain to tell us how Seagate engineers are developing the HAMR technology and making it market ready starting in late 2018.

What is HAMR and How Does It Enable the High-Capacity Needs of the Future?

Guest Blog Post by Mark Re, Seagate Senior Vice President and Chief Technology Officer

Earlier this year Seagate announced plans to make the first hard drives using Heat-Assisted Magnetic Recording, or HAMR, available by the end of 2018 in pilot volumes. Even as today’s market has embraced 10TB+ drives, the need for 20TB+ drives remains imperative in the relative near term. HAMR is the Seagate research team’s next major advance in hard drive technology.

HAMR is a technology that over time will enable a big increase in the amount of data that can be stored on a disk. A small laser is attached to a recording head, designed to heat a tiny spot on the disk where the data will be written. This allows a smaller bit cell to be written as either a 0 or a 1. The smaller bit cell size enables more bits to be crammed into a given surface area — increasing the areal density of data, and increasing drive capacity.

It sounds almost simple, but the science and engineering expertise required, the research, experimentation, lab development and product development to perfect this technology has been enormous. Below is an overview of the HAMR technology and you can dig into the details in our technical brief that provides a point-by-point rundown describing several key advances enabling the HAMR design.

As much time and resources as have been committed to developing HAMR, the need for its increased data density is indisputable. Demand for data storage keeps increasing. Businesses’ ability to manage and leverage more capacity is a competitive necessity, and IT spending on capacity continues to increase.

History of Increasing Storage Capacity

For the last 50 years areal density in the hard disk drive has been growing faster than Moore’s law, which is a very good thing. After all, customers from data centers and cloud service providers to creative professionals and game enthusiasts rarely go shopping looking for a hard drive just like the one they bought two years ago. The demands of increasing data on storage capacities inevitably increase, thus the technology constantly evolves.

According to the Advanced Storage Technology Consortium, HAMR will be the next significant storage technology innovation to increase the amount of storage in the area available to store data, also called the disk’s “areal density.” We believe this boost in areal density will help fuel hard drive product development and growth through the next decade.

Why do we Need to Develop Higher-Capacity Hard Drives? Can’t Current Technologies do the Job?

Why is HAMR’s increased data density so important?

Data has become critical to all aspects of human life, changing how we’re educated and entertained. It affects and informs the ways we experience each other and interact with businesses and the wider world. IDC research shows the datasphere — all the data generated by the world’s businesses and billions of consumer endpoints — will continue to double in size every two years. IDC forecasts that by 2025 the global datasphere will grow to 163 zettabytes (that is a trillion gigabytes). That’s ten times the 16.1 ZB of data generated in 2016. IDC cites five key trends intensifying the role of data in changing our world: embedded systems and the Internet of Things (IoT), instantly available mobile and real-time data, cognitive artificial intelligence (AI) systems, increased security data requirements, and critically, the evolution of data from playing a business background to playing a life-critical role.

Consumers use the cloud to manage everything from family photos and videos to data about their health and exercise routines. Real-time data created by connected devices — everything from Fitbit, Alexa and smart phones to home security systems, solar systems and autonomous cars — are fueling the emerging Data Age. On top of the obvious business and consumer data growth, our critical infrastructure like power grids, water systems, hospitals, road infrastructure and public transportation all demand and add to the growth of real-time data. Data is now a vital element in the smooth operation of all aspects of daily life.

All of this entails a significant infrastructure cost behind the scenes with the insatiable, global appetite for data storage. While a variety of storage technologies will continue to advance in data density (Seagate announced the first 60TB 3.5-inch SSD unit for example), high-capacity hard drives serve as the primary foundational core of our interconnected, cloud and IoT-based dependence on data.

HAMR Hard Drive Technology

Seagate has been working on heat assisted magnetic recording (HAMR) in one form or another since the late 1990s. During this time we’ve made many breakthroughs in making reliable near field transducers, special high capacity HAMR media, and figuring out a way to put a laser on each and every head that is no larger than a grain of salt.

The development of HAMR has required Seagate to consider and overcome a myriad of scientific and technical challenges including new kinds of magnetic media, nano-plasmonic device design and fabrication, laser integration, high-temperature head-disk interactions, and thermal regulation.

A typical hard drive inside any computer or server contains one or more rigid disks coated with a magnetically sensitive film consisting of tiny magnetic grains. Data is recorded when a magnetic write-head flies just above the spinning disk; the write head rapidly flips the magnetization of one magnetic region of grains so that its magnetic pole points up or down, to encode a 1 or a 0 in binary code.

Increasing the amount of data you can store on a disk requires cramming magnetic regions closer together, which means the grains need to be smaller so they won’t interfere with each other.

Heat Assisted Magnetic Recording (HAMR) is the next step to enable us to increase the density of grains — or bit density. Current projections are that HAMR can achieve 5 Tbpsi (Terabits per square inch) on conventional HAMR media, and in the future will be able to achieve 10 Tbpsi or higher with bit patterned media (in which discrete dots are predefined on the media in regular, efficient, very dense patterns). These technologies will enable hard drives with capacities higher than 100 TB before 2030.

The major problem with packing bits so closely together is that if you do that on conventional magnetic media, the bits (and the data they represent) become thermally unstable, and may flip. So, to make the grains maintain their stability — their ability to store bits over a long period of time — we need to develop a recording media that has higher coercivity. That means it’s magnetically more stable during storage, but it is more difficult to change the magnetic characteristics of the media when writing (harder to flip a grain from a 0 to a 1 or vice versa).

That’s why HAMR’s first key hardware advance required developing a new recording media that keeps bits stable — using high anisotropy (or “hard”) magnetic materials such as iron-platinum alloy (FePt), which resist magnetic change at normal temperatures. Over years of HAMR development, Seagate researchers have tested and proven out a variety of FePt granular media films, with varying alloy composition and chemical ordering.

In fact the new media is so “hard” that conventional recording heads won’t be able to flip the bits, or write new data, under normal temperatures. If you add heat to the tiny spot on which you want to write data, you can make the media’s coercive field lower than the magnetic field provided by the recording head — in other words, enable the write head to flip that bit.

So, a challenge with HAMR has been to replace conventional perpendicular magnetic recording (PMR), in which the write head operates at room temperature, with a write technology that heats the thin film recording medium on the disk platter to temperatures above 400 °C. The basic principle is to heat a tiny region of several magnetic grains for a very short time (~1 nanoseconds) to a temperature high enough to make the media’s coercive field lower than the write head’s magnetic field. Immediately after the heat pulse, the region quickly cools down and the bit’s magnetic orientation is frozen in place.

Applying this dynamic nano-heating is where HAMR’s famous “laser” comes in. A plasmonic near-field transducer (NFT) has been integrated into the recording head, to heat the media and enable magnetic change at a specific point. Plasmonic NFTs are used to focus and confine light energy to regions smaller than the wavelength of light. This enables us to heat an extremely small region, measured in nanometers, on the disk media to reduce its magnetic coercivity,

Moving HAMR Forward

HAMR write head

As always in advanced engineering, the devil — or many devils — is in the details. As noted earlier, our technical brief provides a point-by-point short illustrated summary of HAMR’s key changes.

Although hard work remains, we believe this technology is nearly ready for commercialization. Seagate has the best engineers in the world working towards a goal of a 20 Terabyte drive by 2019. We hope we’ve given you a glimpse into the amount of engineering that goes into a hard drive. Keeping up with the world’s insatiable appetite to create, capture, store, secure, manage, analyze, rapidly access and share data is a challenge we work on every day.

With thousands of HAMR drives already being made in our manufacturing facilities, our internal and external supply chain is solidly in place, and volume manufacturing tools are online. This year we began shipping initial units for customer tests, and production units will ship to key customers by the end of 2018. Prepare for breakthrough capacities.

The post What is HAMR and How Does It Enable the High-Capacity Needs of the Future? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS IoT Update – Better Value with New Pricing Model

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-update-better-value-with-new-pricing-model/

Our customers are using AWS IoT to make their connected devices more intelligent. These devices collect & measure data in the field (below the ground, in the air, in the water, on factory floors and in hospital rooms) and use AWS IoT as their gateway to the AWS Cloud. Once connected to the cloud, customers can write device data to Amazon Simple Storage Service (S3) and Amazon DynamoDB, process data using Amazon Kinesis and AWS Lambda functions, initiate Amazon Simple Notification Service (SNS) push notifications, and much more.

New Pricing Model (20-40% Reduction)
Today we are making a change to the AWS IoT pricing model that will make it an even better value for you. Most customers will see a price reduction of 20-40%, with some receiving a significantly larger discount depending on their workload.

The original model was based on a charge for the number of messages that were sent to or from the service. This all-inclusive model was a good starting point, but also meant that some customers were effectively paying for parts of AWS IoT that they did not actually use. For example, some customers have devices that ping AWS IoT very frequently, with sparse rule sets that fire infrequently. Our new model is more fine-grained, with independent charges for each component (all prices are for devices that connect to the US East (Northern Virginia) Region):

Connectivity – Metered in 1 minute increments and based on the total time your devices are connected to AWS IoT. Priced at $0.08 per million minutes of connection (equivalent to $0.042 per device per year for 24/7 connectivity). Your devices can send keep-alive pings at 30 second to 20 minute intervals at no additional cost.

Messaging – Metered by the number of messages transmitted between your devices and AWS IoT. Pricing starts at $1 per million messages, with volume pricing falling as low as $0.70 per million. You may send and receive messages up to 128 kilobytes in size. Messages are metered in 5 kilobyte increments (up from 512 bytes previously). For example, an 8 kilobyte message is metered as two messages.

Rules Engine – Metered for each time a rule is triggered, and for the number of actions executed within a rule, with a minimum of one action per rule. Priced at $0.15 per million rules-triggered and $0.15 per million actions-executed. Rules that process a message in excess of 5 kilobytes are metered at the next multiple of the 5 kilobyte size. For example, a rule that processes an 8 kilobyte message is metered as two rules.

Device Shadow & Registry Updates – Metered on the number of operations to access or modify Device Shadow or Registry data, priced at $1.25 per million operations. Device Shadow and Registry operations are metered in 1 kilobyte increments of the Device Shadow or Registry record size. For example, an update to a 1.5 kilobyte Shadow record is metered as two operations.

The AWS Free Tier now offers a generous allocation of connection minutes, messages, triggered rules, rules actions, Shadow, and Registry usage, enough to operate a fleet of up to 50 devices. The new prices will take effect on January 1, 2018 with no effort on your part. At that time, the updated prices will be published on the AWS IoT Pricing page.

AWS IoT at re:Invent
We have an entire IoT track at this year’s AWS re:Invent. Here is a sampling:

We also have customer-led sessions from Philips, Panasonic, Enel, and Salesforce.

Jeff;

Hot Startups on AWS – October 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/hot-startups-on-aws-october-2017/

In 2015, the Centers for Medicare and Medicaid Services (CMS) reported that healthcare spending made up 17.8% of the U.S. GDP – that’s almost $3.2 trillion or $9,990 per person. By 2025, the CMS estimates this number will increase to nearly 20%. As cloud technology evolves in the healthcare and life science industries, we are seeing how companies of all sizes are using AWS to provide powerful and innovative solutions to customers across the globe. This month we are excited to feature the following startups:

  • ClearCare – helping home care agencies operate efficiently and grow their business.
  • DNAnexus – providing a cloud-based global network for sharing and managing genomic data.

ClearCare (San Francisco, CA)

ClearCare envisions a future where home care is the only choice for aging in place. Home care agencies play a critical role in the economy and their communities by significantly lowering the overall cost of care, reducing the number of hospital admissions, and bending the cost curve of aging. Patients receiving home care typically have multiple chronic conditions and functional limitations, driving over $190 billion in healthcare spending in the U.S. each year. To offset these costs, health insurance payers are developing in-home care management programs for patients. ClearCare’s goal is to help home care agencies leverage technology to improve costs, outcomes, and quality of life for the aging population. The company’s powerful software platform is specifically designed for use by non-medical, in-home care agencies to manage their businesses.

Founder and CEO Geoff Nudd created ClearCare because of his own grandmother’s need for care. Keeping family members and caregivers up to date on a loved one’s well being can be difficult, so Geoff created what is now ClearCare’s Family Room, which enables caregivers and agency staff to check schedules and receive real-time updates about what’s happening in the home. Since then, agencies have provided feedback on others areas of their businesses that could be streamlined. ClearCare has now built over 20 modules to help home care agencies optimize operations with services including a telephony service, billing and payroll, and more. ClearCare now serves over 4,000 home care agencies, representing 500,000 caregivers and 400,000 seniors.

Using AWS, ClearCare is able to spin up reliable infrastructure for proofs of concept and iterate on those systems to quickly get value to market. The company runs many AWS services including Amazon Elasticsearch Service, Amazon RDS, and Amazon CloudFront. Amazon EMR and Amazon Athena have enabled ClearCare to build a Hadoop-based ETL and data warehousing system that processes terabytes of data each day. By utilizing these managed services, ClearCare has been able to go from concept to customer delivery in less than three months.

To learn more about ClearCare, check out their website.

DNAnexus (Mountain View, CA)

DNAnexus is accelerating the application of genomic data in precision medicine by providing a cloud-based platform for sharing and managing genomic and biomedical data and analysis tools. The company was founded in 2009 by Stanford graduate student Andreas Sundquist and two Stanford professors Arend Sidow and Serafim Batzoglou, to address the need for scaling secondary analysis of next-generation sequencing (NGS) data in the cloud. The founders quickly learned that users needed a flexible solution to build complex analysis workflows and tools that enable them to share and manage large volumes of data. DNAnexus is optimized to address the challenges of security, scalability, and collaboration for organizations that are pursuing genomic-based approaches to health, both in clinics and research labs. DNAnexus has a global customer base – spanning North America, Europe, Asia-Pacific, South America, and Africa – that runs a million jobs each month and is doubling their storage year-over-year. The company currently stores more than 10 petabytes of biomedical and genomic data. That is equivalent to approximately 100,000 genomes, or in simpler terms, over 50 billion Facebook photos!

DNAnexus is working with its customers to help expand their translational informatics research, which includes expanding into clinical trial genomic services. This will help companies developing different medicines to better stratify clinical trial populations and develop companion tests that enable the right patient to get the right medicine. In collaboration with Janssen Human Microbiome Institute, DNAnexus is also launching Mosaic – a community platform for microbiome research.

AWS provides DNAnexus and its customers the flexibility to grow and scale research programs. Building the technology infrastructure required to manage these projects in-house is expensive and time-consuming. DNAnexus removes that barrier for labs of any size by using AWS scalable cloud resources. The company deploys its customers’ genomic pipelines on Amazon EC2, using Amazon S3 for high-performance, high-durability storage, and Amazon Glacier for low-cost data archiving. DNAnexus is also an AWS Life Sciences Competency Partner.

Learn more about DNAnexus here.

-Tina

AWS Hot Startups – August 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-august-2017/

There’s no doubt about it – Artificial Intelligence is changing the world and how it operates. Across industries, organizations from startups to Fortune 500s are embracing AI to develop new products, services, and opportunities that are more efficient and accessible for their consumers. From driverless cars to better preventative healthcare to smart home devices, AI is driving innovation at a fast rate and will continue to play a more important role in our everyday lives.

This month we’d like to highlight startups using AI solutions to help companies grow. We are pleased to feature:

  • SignalBox – a simple and accessible deep learning platform to help businesses get started with AI.
  • Valossa – an AI video recognition platform for the media and entertainment industry.
  • Kaliber – innovative applications for businesses using facial recognition, deep learning, and big data.

SignalBox (UK)

In 2016, SignalBox founder Alain Richardt was hearing the same comments being made by developers, data scientists, and business leaders. They wanted to get into deep learning but didn’t know where to start. Alain saw an opportunity to commodify and apply deep learning by providing a platform that does the heavy lifting with an easy-to-use web interface, blueprints for common tasks, and just a single-click to productize the models. With SignalBox, companies can start building deep learning models with no coding at all – they just select a data set, choose a network architecture, and go. SignalBox also offers step-by-step tutorials, tips and tricks from industry experts, and consulting services for customers that want an end-to-end AI solution.

SignalBox offers a variety of solutions that are being used across many industries for energy modeling, fraud detection, customer segmentation, insurance risk modeling, inventory prediction, real estate prediction, and more. Existing data science teams are using SignalBox to accelerate their innovation cycle. One innovative UK startup, Energi Mine, recently worked with SignalBox to develop deep networks that predict anomalous energy consumption patterns and do time series predictions on energy usage for businesses with hundreds of sites.

SignalBox uses a variety of AWS services including Amazon EC2, Amazon VPC, Amazon Elastic Block Store, and Amazon S3. The ability to rapidly provision EC2 GPU instances has been a critical factor in their success – both in terms of keeping their operational expenses low, as well as speed to market. The Amazon API Gateway has allowed for operational automation, giving SignalBox the ability to control its infrastructure.

To learn more about SignalBox, visit here.

Valossa (Finland)

As students at the University of Oulu in Finland, the Valossa founders spent years doing research in the computer science and AI labs. During that time, the team witnessed how the world was moving beyond text, with video playing a greater role in day-to-day communication. This spawned an idea to use technology to automatically understand what an audience is viewing and share that information with a global network of content producers. Since 2015, Valossa has been building next generation AI applications to benefit the media and entertainment industry and is moving beyond the capabilities of traditional visual recognition systems.

Valossa’s AI is capable of analyzing any video stream. The AI studies a vast array of data within videos and converts that information into descriptive tags, categories, and overviews automatically. Basically, it sees, hears, and understands videos like a human does. The Valossa AI can detect people, visual and auditory concepts, key speech elements, and labels explicit content to make moderating and filtering content simpler. Valossa’s solutions are designed to provide value for the content production workflow, from media asset management to end-user applications for content discovery. AI-annotated content allows online viewers to jump directly to their favorite scenes or search specific topics and actors within a video.

Valossa leverages AWS to deliver the industry’s first complete AI video recognition platform. Using Amazon EC2 GPU instances, Valossa can easily scale their computation capacity based on customer activity. High-volume video processing with GPU instances provides the necessary speed for time-sensitive workflows. The geo-located Availability Zones in EC2 allow Valossa to bring resources close to their customers to minimize network delays. Valossa also uses Amazon S3 for video ingestion and to provide end-user video analytics, which makes managing and accessing media data easy and highly scalable.

To see how Valossa works, check out www.WhatIsMyMovie.com or enable the Alexa Skill, Valossa Movie Finder. To try the Valossa AI, sign up for free at www.valossa.com.

Kaliber (San Francisco, CA)

Serial entrepreneurs Ray Rahman and Risto Haukioja founded Kaliber in 2016. The pair had previously worked in startups building smart cities and online privacy tools, and teamed up to bring AI to the workplace and change the hospitality industry. Our world is designed to appeal to our senses – stores and warehouses have clearly marked aisles, products are colorfully packaged, and we use these designs to differentiate one thing from another. We tell each other apart by our faces, and previously that was something only humans could measure or act upon. Kaliber is using facial recognition, deep learning, and big data to create solutions for business use. Markets and companies that aren’t typically associated with cutting-edge technology will be able to use their existing camera infrastructure in a whole new way, making them more efficient and better able to serve their customers.

Computer video processing is rapidly expanding, and Kaliber believes that video recognition will extend to far more than security cameras and robots. Using the clients’ network of in-house cameras, Kaliber’s platform extracts key data points and maps them to actionable insights using their machine learning (ML) algorithm. Dashboards connect users to the client’s BI tools via the Kaliber enterprise APIs, and managers can view these analytics to improve their real-world processes, taking immediate corrective action with real-time alerts. Kaliber’s Real Metrics are aimed at combining the power of image recognition with ML to ultimately provide a more meaningful experience for all.

Kaliber uses many AWS services, including Amazon Rekognition, Amazon Kinesis, AWS Lambda, Amazon EC2 GPU instances, and Amazon S3. These services have been instrumental in helping Kaliber meet the needs of enterprise customers in record time.

Learn more about Kaliber here.

Thanks for reading and we’ll see you next month!

-Tina

 

AWS HIPAA Eligibility Update (July 2017) – Eight Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-update-july-2017-eight-additional-services/

It is time for an update on our on-going effort to make AWS a great host for healthcare and life sciences applications. As you can see from our Health Customer Stories page, Philips, VergeHealth, and Cambia (to choose a few) trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.

In May we announced that we added Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon Simple Queue Service (SQS) to our list of HIPAA eligible services and discussed our how customers and partners are putting them to use.

Eight More Eligible Services
Today I am happy to share the news that we are adding another eight services to the list:

Amazon CloudFront can now be utilized to enhance the delivery and transfer of Protected Health Information data to applications on the Internet. By providing a completely secure and encryptable pathway, CloudFront can now be used as a part of applications that need to cache PHI. This includes applications for viewing lab results or imaging data, and those that transfer PHI from Healthcare Information Exchanges (HIEs).

AWS WAF can now be used to protect applications running on AWS which operate on PHI such as patient care portals, patient scheduling systems, and HIEs. Requests and responses containing encrypted PHI and PII can now pass through AWS WAF.

AWS Shield can now be used to protect web applications such as patient care portals and scheduling systems that operate on encrypted PHI from DDoS attacks.

Amazon S3 Transfer Acceleration can now be used to accelerate the bulk transfer of large amounts of research, genetics, informatics, insurance, or payer/payment data containing PHI/PII information. Transfers can take place between a pair of AWS Regions or from an on-premises system and an AWS Region.

Amazon WorkSpaces can now be used by researchers, informaticists, hospital administrators and other users to analyze, visualize or process PHI/PII data using on-demand Windows virtual desktops.

AWS Directory Service can now be used to connect the authentication and authorization systems of organizations that use or process PHI/PII to their resources in the AWS Cloud. For example, healthcare providers operating hybrid cloud environments can now use AWS Directory Services to allow their users to easily transition between cloud and on-premises resources.

Amazon Simple Notification Service (SNS) can now be used to send notifications containing encrypted PHI/PII as part of patient care, payment processing, and mobile applications.

Amazon Cognito can now be used to authenticate users into mobile patient portal and payment processing applications that use PHI/PII identifiers for accounts.

Additional HIPAA Resources
Here are some additional resources that will help you to build applications that comply with HIPAA and HITECH:

Keep in Touch
In order to make use of any AWS service in any manner that involves PHI, you must first enter into an AWS Business Associate Addendum (BAA). You can contact us to start the process.

Jeff;

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/analyze-openfda-data-in-r-with-amazon-s3-and-amazon-athena/

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with a data silo.

The openFDA project creates easy access to the high value, high priority, and public access data of the Food and Drug Administration (FDA). The data has been formatted and documented in consumer-friendly standards. Critical data related to drugs, devices, and food has been harmonized and can easily be called by application developers and researchers via API calls. OpenFDA has published two whitepapers that drill into the technical underpinnings of the API infrastructure as well as how to properly analyze the data in R. In addition, FDA makes openFDA data available on S3 in raw format.

In this post, I show how to use S3, Amazon EMR, and Amazon Athena to analyze the drug adverse events dataset. A drug adverse event is an undesirable experience associated with the use of a drug, including serious drug side effects, product use errors, product quality programs, and therapeutic failures.

Data considerations

Keep in mind that this data does have limitations. In addition, in the United States, these adverse events are submitted to the FDA voluntarily from consumers so there may not be reports for all events that occurred. There is no certainty that the reported event was actually due to the product. The FDA does not require that a causal relationship between a product and event be proven, and reports do not always contain the detail necessary to evaluate an event. Because of this, there is no way to identify the true number of events. The important takeaway to all this is that the information contained in this data has not been verified to produce cause and effect relationships. Despite this disclaimer, many interesting insights and value can be derived from the data to accelerate drug safety research.

Data analysis using SQL

For application developers who want to perform targeted searching and lookups, the API endpoints provided by the openFDA project are “ready to go” for software integration using a standard API powered by Elasticsearch, NodeJS, and Docker. However, for data analysis purposes, it is often easier to work with the data using SQL and statistical packages that expect a SQL table structure. For large-scale analysis, APIs often have query limits, such as 5000 records per query. This can cause extra work for data scientists who want to analyze the full dataset instead of small subsets of data.

To address the concern of requiring all the data in a single dataset, the openFDA project released the full 100 GB of harmonized data files that back the openFDA project onto S3. Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. It’s a quick and easy way to answer your questions about adverse events and aspirin that does not require you to spin up databases or servers.

While you could point tools directly at the openFDA S3 files, you can find greatly improved performance and use of the data by following some of the preparation steps later in this post.

Architecture

This post explains how to use the following architecture to take the raw data provided by openFDA, leverage several AWS services, and derive meaning from the underlying data.

Steps:

  1. Load the openFDA /drug/event dataset into Spark and convert it to gzip to allow for streaming.
  2. Transform the data in Spark and save the results as a Parquet file in S3.
  3. Query the S3 Parquet file with Athena.
  4. Perform visualization and analysis of the data in R and Python on Amazon EC2.

Optimizing public data sets: A primer on data preparation

Those who want to jump right into preparing the files for Athena may want to skip ahead to the next section.

Transforming, or pre-processing, files is a common task for using many public data sets. Before you jump into the specific steps for transforming the openFDA data files into a format optimized for Athena, I thought it would be worthwhile to provide a quick exploration on the problem.

Making a dataset in S3 efficiently accessible with minimal transformation for the end user has two key elements:

  1. Partitioning the data into objects that contain a complete part of the data (such as data created within a specific month).
  2. Using file formats that make it easy for applications to locate subsets of data (for example, gzip, Parquet, ORC, etc.).

With these two key elements in mind, you can now apply transformations to the openFDA adverse event data to prepare it for Athena. You might find the data techniques employed in this post to be applicable to many of the questions you might want to ask of the public data sets stored in Amazon S3.

Before you get started, I encourage those who are interested in doing deeper healthcare analysis on AWS to make sure that you first read the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing patient health information (PHI).

Also, the adverse event analysis shown for aspirin is strictly for demonstration purposes and should not be used for any real decision or taken as anything other than a demonstration of AWS capabilities. However, there have been robust case studies published that have explored a causal relationship between aspirin and adverse reactions using OpenFDA data. If you are seeking research on aspirin or its risks, visit organizations such as the Centers for Disease Control and Prevention (CDC) or the Institute of Medicine (IOM).

Preparing data for Athena

For this walkthrough, you will start with the FDA adverse events dataset, which is stored as JSON files within zip archives on S3. You then convert it to Parquet for analysis. Why do you need to convert it? The original data download is stored in objects that are partitioned by quarter.

Here is a small sample of what you find in the adverse events (/drugs/event) section of the openFDA website.

If you were looking for events that happened in a specific quarter, this is not a bad solution. For most other scenarios, such as looking across the full history of aspirin events, it requires you to access a lot of data that you won’t need. The zip file format is not ideal for using data in place because zip readers must have random access to the file, which means the data can’t be streamed. Additionally, the zip files contain large JSON objects.

To read the data in these JSON files, a streaming JSON decoder must be used or a computer with a significant amount of RAM must decode the JSON. Opening up these files for public consumption is a great start. However, you still prepare the data with a few lines of Spark code so that the JSON can be streamed.

Step 1:  Convert the file types

Using Apache Spark on EMR, you can extract all of the zip files and pull out the events from the JSON files. To do this, use the Scala code below to deflate the zip file and create a text file. In addition, compress the JSON files with gzip to improve Spark’s performance and reduce your overall storage footprint. The Scala code can be run in either the Spark Shell or in an Apache Zeppelin notebook on your EMR cluster.

If you are unfamiliar with either Apache Zeppelin or the Spark Shell, the following posts serve as great references:

 

import scala.io.Source
import java.util.zip.ZipInputStream
import org.apache.spark.input.PortableDataStream
import org.apache.hadoop.io.compress.GzipCodec

// Input Directory
val inputFile = "s3://download.open.fda.gov/drug/event/2015q4/*.json.zip";

// Output Directory
val outputDir = "s3://{YOUR OUTPUT BUCKET HERE}/output/2015q4/";

// Extract zip files from 
val zipFiles = sc.binaryFiles(inputFile);

// Process zip file to extract the json as text file and save it
// in the output directory 
val rdd = zipFiles.flatMap((file: (String, PortableDataStream)) => {
    val zipStream = new ZipInputStream(file.2.open)
    val entry = zipStream.getNextEntry
    val iter = Source.fromInputStream(zipStream).getLines
    iter
}).map(.replaceAll("\s+","")).saveAsTextFile(outputDir, classOf[GzipCodec])

Step 2:  Transform JSON into Parquet

With just a few more lines of Scala code, you can use Spark’s abstractions to convert the JSON into a Spark DataFrame and then export the data back to S3 in Parquet format.

Spark requires the JSON to be in JSON Lines format to be parsed correctly into a DataFrame.

// Output Parquet directory
val outputDir = "s3://{YOUR OUTPUT BUCKET NAME}/output/drugevents"
// Input json file
val inputJson = "s3://{YOUR OUTPUT BUCKET NAME}/output/2015q4/*”
// Load dataframe from json file multiline 
val df = spark.read.json(sc.wholeTextFiles(inputJson).values)
// Extract results from dataframe
val results = df.select("results")
// Save it to Parquet
results.write.parquet(outputDir)

Step 3:  Create an Athena table

With the data cleanly prepared and stored in S3 using the Parquet format, you can now place an Athena table on top of it to get a better understanding of the underlying data.

Because the openFDA data structure incorporates several layers of nesting, it can be a complex process to try to manually derive the underlying schema in a Hive-compatible format. To shorten this process, you can load the top row of the DataFrame from the previous step into a Hive table within Zeppelin and then extract the “create  table” statement from SparkSQL.

results.createOrReplaceTempView("data")

val top1 = spark.sql("select * from data tablesample(1 rows)")

top1.write.format("parquet").mode("overwrite").saveAsTable("drugevents")

val show_cmd = spark.sql("show create table drugevents”).show(1, false)

This returns a “create table” statement that you can almost paste directly into the Athena console. Make some small modifications (adding the word “external” and replacing “using with “stored as”), and then execute the code in the Athena query editor. The table is created.

For the openFDA data, the DDL returns all string fields, as the date format used in your dataset does not conform to the yyy-mm-dd hh:mm:ss[.f…] format required by Hive. For your analysis, the string format works appropriately but it would be possible to extend this code to use a Presto function to convert the strings into time stamps.

CREATE EXTERNAL TABLE  drugevents (
   companynumb  string, 
   safetyreportid  string, 
   safetyreportversion  string, 
   receiptdate  string, 
   patientagegroup  string, 
   patientdeathdate  string, 
   patientsex  string, 
   patientweight  string, 
   serious  string, 
   seriousnesscongenitalanomali  string, 
   seriousnessdeath  string, 
   seriousnessdisabling  string, 
   seriousnesshospitalization  string, 
   seriousnesslifethreatening  string, 
   seriousnessother  string, 
   actiondrug  string, 
   activesubstancename  string, 
   drugadditional  string, 
   drugadministrationroute  string, 
   drugcharacterization  string, 
   drugindication  string, 
   drugauthorizationnumb  string, 
   medicinalproduct  string, 
   drugdosageform  string, 
   drugdosagetext  string, 
   reactionoutcome  string, 
   reactionmeddrapt  string, 
   reactionmeddraversionpt  string)
STORED AS parquet
LOCATION
  's3://{YOUR TARGET BUCKET}/output/drugevents'

With the Athena table in place, you can start to explore the data by running ad hoc queries within Athena or doing more advanced statistical analysis in R.

Using SQL and R to analyze adverse events

Using the openFDA data with Athena makes it very easy to translate your questions into SQL code and perform quick analysis on the data. After you have prepared the data for Athena, you can begin to explore the relationship between aspirin and adverse drug events, as an example. One of the most common metrics to measure adverse drug events is the Proportional Reporting Ratio (PRR). It is defined as:

PRR = (m/n)/( (M-m)/(N-n) )
Where
m = #reports with drug and event
n = #reports with drug
M = #reports with event in database
N = #reports in database

Gastrointestinal haemorrhage has the highest PRR of any reaction to aspirin when viewed in aggregate. One question you may want to ask is how the PRR has trended on a yearly basis for gastrointestinal haemorrhage since 2005.

Using the following query in Athena, you can see the PRR trend of “GASTROINTESTINAL HAEMORRHAGE” reactions with “ASPIRIN” since 2005:

with drug_and_event as 
(select rpad(receiptdate, 4, 'NA') as receipt_year
    , reactionmeddrapt
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug_and_event 
from fda.drugevents
where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
group by reactionmeddrapt, rpad(receiptdate, 4, 'NA') 
), reports_with_drug as 
(
select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug 
 from fda.drugevents 
 where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
group by rpad(receiptdate, 4, 'NA') 
), reports_with_event as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_event 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
   group by rpad(receiptdate, 4, 'NA')
), total_reports as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as total_reports 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
   group by rpad(receiptdate, 4, 'NA')
)
select  drug_and_event.receipt_year, 
(1.0 * drug_and_event.reports_with_drug_and_event/reports_with_drug.reports_with_drug)/ (1.0 * (reports_with_event.reports_with_event- drug_and_event.reports_with_drug_and_event)/(total_reports.total_reports-reports_with_drug.reports_with_drug)) as prr
, drug_and_event.reports_with_drug_and_event
, reports_with_drug.reports_with_drug
, reports_with_event.reports_with_event
, total_reports.total_reports
from drug_and_event
    inner join reports_with_drug on  drug_and_event.receipt_year = reports_with_drug.receipt_year   
    inner join reports_with_event on  drug_and_event.receipt_year = reports_with_event.receipt_year
    inner join total_reports on  drug_and_event.receipt_year = total_reports.receipt_year
order by  drug_and_event.receipt_year


One nice feature of Athena is that you can quickly connect to it via R or any other tool that can use a JDBC driver to visualize the data and understand it more clearly.

With this quick R script that can be run in R Studio either locally or on an EC2 instance, you can create a visualization of the PRR and Reporting Odds Ratio (RoR) for “GASTROINTESTINAL HAEMORRHAGE” reactions from “ASPIRIN” since 2005 to better understand these trends.

# connect to ATHENA
conn <- dbConnect(drv, '<Your JDBC URL>',s3_staging_dir="<Your S3 Location>",user=Sys.getenv(c("USER_NAME"),password=Sys.getenv(c("USER_PASSWORD"))

# Declare Adverse Event
adverseEvent <- "'GASTROINTESTINAL HAEMORRHAGE'"

# Build SQL Blocks
sqlFirst <- "SELECT rpad(receiptdate, 4, 'NA') as receipt_year, count(DISTINCT safetyreportid) as event_count FROM fda.drugsflat WHERE rpad(receiptdate,4,'NA') between '2005' and '2015'"
sqlEnd <- "GROUP BY rpad(receiptdate, 4, 'NA') ORDER BY receipt_year"

# Extract Aspirin with adverse event counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN' AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
aspirinAdverseCount = dbGetQuery(conn,sql)

# Extract Aspirin counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN'", sqlEnd,sep=" ")
aspirinCount = dbGetQuery(conn,sql)

# Extract adverse event counts
sql <- paste(sqlFirst,"AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
adverseCount = dbGetQuery(conn,sql)

# All Drug Adverse event Counts
sql <- paste(sqlFirst, sqlEnd,sep=" ")
allDrugCount = dbGetQuery(conn,sql)

# Select correct rows
selAll =  allDrugCount$receipt_year == aspirinAdverseCount$receipt_year
selAspirin = aspirinCount$receipt_year == aspirinAdverseCount$receipt_year
selAdverse = adverseCount$receipt_year == aspirinAdverseCount$receipt_year

# Calculate Numbers
m <- c(aspirinAdverseCount$event_count)
n <- c(aspirinCount[selAspirin,2])
M <- c(adverseCount[selAdverse,2])
N <- c(allDrugCount[selAll,2])

# Calculate proptional reporting ratio
PRR = (m/n)/((M-m)/(N-n))

# Calculate reporting Odds Ratio
d = n-m
D = N-M
ROR = (m/d)/(M/D)

# Plot the PRR and ROR
g_range <- range(0, PRR,ROR)
g_range[2] <- g_range[2] + 3
yearLen = length(aspirinAdverseCount$receipt_year)
axis(1,1:yearLen,lab=ax)
plot(PRR, type="o", col="blue", ylim=g_range,axes=FALSE, ann=FALSE)
axis(1,1:yearLen,lab=ax)
axis(2, las=1, at=1*0:g_range[2])
box()
lines(ROR, type="o", pch=22, lty=2, col="red")

As you can see, the PRR and RoR have both remained fairly steady over this time range. With the R Script above, all you need to do is change the adverseEvent variable from GASTROINTESTINAL HAEMORRHAGE to another type of reaction to analyze and compare those trends.

Summary

In this walkthrough:

  • You used a Scala script on EMR to convert the openFDA zip files to gzip.
  • You then transformed the JSON blobs into flattened Parquet files using Spark on EMR.
  • You created an Athena DDL so that you could query these Parquet files residing in S3.
  • Finally, you pointed the R package at the Athena table to analyze the data without pulling it into a database or creating your own servers.

If you have questions or suggestions, please comment below.


Next Steps

Take your skills to the next level. Learn how to optimize Amazon S3 for an architecture commonly used to enable genomic data analysis. Also, be sure to read more about running R on Amazon Athena.

 

 

 

 

 


About the Authors

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.

 

 

Vikram Anand is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys playing soccer and watching the NFL & European Soccer leagues.

 

 

Dave Rocamora is a Solutions Architect at Amazon Web Services on the Open Data team. Dave is based in Seattle and when he is not opening data, he enjoys biking and drinking coffee outside.

 

 

 

 

WannaCry and Vulnerabilities

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/06/wannacry_and_vu.html

There is plenty of blame to go around for the WannaCry ransomware that spread throughout the Internet earlier this month, disrupting work at hospitals, factories, businesses, and universities. First, there are the writers of the malicious software, which blocks victims’ access to their computers until they pay a fee. Then there are the users who didn’t install the Windows security patch that would have prevented an attack. A small portion of the blame falls on Microsoft, which wrote the insecure code in the first place. One could certainly condemn the Shadow Brokers, a group of hackers with links to Russia who stole and published the National Security Agency attack tools that included the exploit code used in the ransomware. But before all of this, there was the NSA, which found the vulnerability years ago and decided to exploit it rather than disclose it.

All software contains bugs or errors in the code. Some of these bugs have security implications, granting an attacker unauthorized access to or control of a computer. These vulnerabilities are rampant in the software we all use. A piece of software as large and complex as Microsoft Windows will contain hundreds of them, maybe more. These vulnerabilities have obvious criminal uses that can be neutralized if patched. Modern software is patched all the time — either on a fixed schedule, such as once a month with Microsoft, or whenever required, as with the Chrome browser.

When the US government discovers a vulnerability in a piece of software, however, it decides between two competing equities. It can keep it secret and use it offensively, to gather foreign intelligence, help execute search warrants, or deliver malware. Or it can alert the software vendor and see that the vulnerability is patched, protecting the country — and, for that matter, the world — from similar attacks by foreign governments and cybercriminals. It’s an either-or choice. As former US Assistant Attorney General Jack Goldsmith has said, “Every offensive weapon is a (potential) chink in our defense — and vice versa.”

This is all well-trod ground, and in 2010 the US government put in place an interagency Vulnerabilities Equities Process (VEP) to help balance the trade-off. The details are largely secret, but a 2014 blog post by then President Barack Obama’s cybersecurity coordinator, Michael Daniel, laid out the criteria that the government uses to decide when to keep a software flaw undisclosed. The post’s contents were unsurprising, listing questions such as “How much is the vulnerable system used in the core Internet infrastructure, in other critical infrastructure systems, in the US economy, and/or in national security systems?” and “Does the vulnerability, if left unpatched, impose significant risk?” They were balanced by questions like “How badly do we need the intelligence we think we can get from exploiting the vulnerability?” Elsewhere, Daniel has noted that the US government discloses to vendors the “overwhelming majority” of the vulnerabilities that it discovers — 91 percent, according to NSA Director Michael S. Rogers.

The particular vulnerability in WannaCry is code-named EternalBlue, and it was discovered by the US government — most likely the NSA — sometime before 2014. The Washington Post reported both how useful the bug was for attack and how much the NSA worried about it being used by others. It was a reasonable concern: many of our national security and critical infrastructure systems contain the vulnerable software, which imposed significant risk if left unpatched. And yet it was left unpatched.

There’s a lot we don’t know about the VEP. The Washington Post says that the NSA used EternalBlue “for more than five years,” which implies that it was discovered after the 2010 process was put in place. It’s not clear if all vulnerabilities are given such consideration, or if bugs are periodically reviewed to determine if they should be disclosed. That said, any VEP that allows something as dangerous as EternalBlue — or the Cisco vulnerabilities that the Shadow Brokers leaked last August to remain unpatched for years isn’t serving national security very well. As a former NSA employee said, the quality of intelligence that could be gathered was “unreal.” But so was the potential damage. The NSA must avoid hoarding vulnerabilities.

Perhaps the NSA thought that no one else would discover EternalBlue. That’s another one of Daniel’s criteria: “How likely is it that someone else will discover the vulnerability?” This is often referred to as NOBUS, short for “nobody but us.” Can the NSA discover vulnerabilities that no one else will? Or are vulnerabilities discovered by one intelligence agency likely to be discovered by another, or by cybercriminals?

In the past few months, the tech community has acquired some data about this question. In one study, two colleagues from Harvard and I examined over 4,300 disclosed vulnerabilities in common software and concluded that 15 to 20 percent of them are rediscovered within a year. Separately, researchers at the Rand Corporation looked at a different and much smaller data set and concluded that fewer than six percent of vulnerabilities are rediscovered within a year. The questions the two papers ask are slightly different and the results are not directly comparable (we’ll both be discussing these results in more detail at the Black Hat Conference in July), but clearly, more research is needed.

People inside the NSA are quick to discount these studies, saying that the data don’t reflect their reality. They claim that there are entire classes of vulnerabilities the NSA uses that are not known in the research world, making rediscovery less likely. This may be true, but the evidence we have from the Shadow Brokers is that the vulnerabilities that the NSA keeps secret aren’t consistently different from those that researchers discover. And given the alarming ease with which both the NSA and CIA are having their attack tools stolen, rediscovery isn’t limited to independent security research.

But even if it is difficult to make definitive statements about vulnerability rediscovery, it is clear that vulnerabilities are plentiful. Any vulnerabilities that are discovered and used for offense should only remain secret for as short a time as possible. I have proposed six months, with the right to appeal for another six months in exceptional circumstances. The United States should satisfy its offensive requirements through a steady stream of newly discovered vulnerabilities that, when fixed, also improve the country’s defense.

The VEP needs to be reformed and strengthened as well. A report from last year by Ari Schwartz and Rob Knake, who both previously worked on cybersecurity policy at the White House National Security Council, makes some good suggestions on how to further formalize the process, increase its transparency and oversight, and ensure periodic review of the vulnerabilities that are kept secret and used for offense. This is the least we can do. A bill recently introduced in both the Senate and the House calls for this and more.

In the case of EternalBlue, the VEP did have some positive effects. When the NSA realized that the Shadow Brokers had stolen the tool, it alerted Microsoft, which released a patch in March. This prevented a true disaster when the Shadow Brokers exposed the vulnerability on the Internet. It was only unpatched systems that were susceptible to WannaCry a month later, including versions of Windows so old that Microsoft normally didn’t support them. Although the NSA must take its share of the responsibility, no matter how good the VEP is, or how many vulnerabilities the NSA reports and the vendors fix, security won’t improve unless users download and install patches, and organizations take responsibility for keeping their software and systems up to date. That is one of the important lessons to be learned from WannaCry.

This essay originally appeared in Foreign Affairs.

Alleged KickassTorrents Founder Released on Bail

Post Syndicated from Ernesto original https://torrentfreak.com/alleged-kickasstorrents-founder-released-on-bail-170523/

kickasstorrents_500x500Last summer, Polish law enforcement officers arrested Artem Vaulin, the alleged founder of KickassTorrents.

Polish authorities acted on a criminal complaint from the US Government, which accused him of criminal copyright infringement and money laundering.

Facing severe back problems, Vaulin was transferred to a hospital in December, and he later continued treatment in a Warsaw prison while awaiting progress in his extradition case.

After being held in custody for nearly ten months, a breakthrough came last week when Vaulin was released on bail. The Verge reports that bail was set at $108,000 and that the alleged KickassTorrents founder now lives in a rented appartment in Warsaw.

Vaulin isn’t allowed to leave the country, but will be enjoying relative freedom, and most importantly, the company of his wife and son.

Two days before his release, The Verge’s Greg Sandoval spoke with Vaulin, who couldn’t go into detail on his alleged involvement with KickassTorrents. However, the Ukrainian entrepreneur stressed that he wasn’t looking for trouble.

“I’m a businessman. When I start a business I consult lawyers. I was never told that anything I was involved in was against the law,” Vaulin told Sandoval.

“I’m not crazy. If someone came to me to tell me the United States was angry with something I do, whatever it was, I would stop,” he added.

While life on bail is a great improvement compared to the conditions in prison, the case is not over yet. In March, the Warsaw District Court ruled in first instance that Vaulin can be extradited, but the second instance decision is still pending.

Over in the United States, the defense team also has a motion pending. In February, Vaulin’s lawyers urged the Illinois District Court to dismiss the indictment because there’s no proof of actual criminal copyright infringement.

Lead counsel Ira Rothken is happy that his client has been released on bail, and he’s confident that they will appear as victors down the road.

“We are pleased that the Court freed Artem Vaulin from prison in Poland. This will allow him to better take care of his health, be with his family, and assist in his defense,” Rothken tells TorrentFreak.

“We look forward to the US Federal Court ruling on his pending motion to dismiss. If the US indictment is defective then extradition based on the indictment is erroneous – Artem shouldn’t have to leave his family behind,” he adds.

The full coverage on The Verge has some additional comments from the alleged KAT founder, which is well worth reading.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Amazon EC2 Container Service – Launch Recap, Customer Stories, and Code

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-ec2-container-service-launch-recap-customer-stories-and-code/

Today seems like a good time to recap some of the features that we have added to Amazon EC2 Container Service over the last year or so, and to share some customer success stories and code with you! The service makes it easy for you to run any number of Docker containers across a managed cluster of EC2 instances, with full console, API, CloudFormation, CLI, and PowerShell support. You can store your Linux and Windows Docker images in the EC2 Container Registry for easy access.

Launch Recap
Let’s start by taking a look at some of the newest ECS features and some helpful how-to blog posts that will show you how to use them:

Application Load Balancing – We added support for the application load balancer last year. This high-performance load balancing option runs at the application level and allows you to define content-based routing rules. It provides support for dynamic ports and can be shared across multiple services, making it easier for you to run microservices in containers. To learn more, read about Service Load Balancing.

IAM Roles for Tasks – You can secure your infrastructure by assigning IAM roles to ECS tasks. This allows you to grant permissions on a fine-grained, per-task basis, customizing the permissions to the needs of each task. Read IAM Roles for Tasks to learn more.

Service Auto Scaling – You can define scaling policies that scale your services (tasks) up and down in response to changes in demand. You set the desired minimum and maximum number of tasks, create one or more scaling policies, and Service Auto Scaling will take care of the rest. The documentation for Service Auto Scaling will help you to make use of this feature.

Blox – Scheduling, in a container-based environment, is the process of assigning tasks to instances. ECS gives you three options: automated (via the built-in Service Scheduler), manual (via the RunTask function), and custom (via a scheduler that you provide). Blox is an open source scheduler that supports a one-task-per-host model, with room to accommodate other models in the future. It monitors the state of the cluster and is well-suited to running monitoring agents, log collectors, and other daemon-style tasks.

Windows – We launched ECS with support for Linux containers and followed up with support for running Windows Server 2016 Base with Containers.

Container Instance Draining – From time to time you may need to remove an instance from a running cluster in order to scale the cluster down or to perform a system update. Earlier this year we added a set of lifecycle hooks that allow you to better manage the state of the instances. Read the blog post How to Automate Container Instance Draining in Amazon ECS to see how to use the lifecycle hooks and a Lambda function to automate the process of draining existing work from an instance while preventing new work from being scheduled for it.

CI/CD Pipeline with Code* – Containers simplify software deployment and are an ideal target for a CI/CD (Continuous Integration / Continuous Deployment) pipeline. The post Continuous Deployment to Amazon ECS using AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS CloudFormation shows you how to build and operate a CI/CD pipeline using multiple AWS services.

CloudWatch Logs Integration – This launch gave you the ability to configure the containers that run your tasks to send log information to CloudWatch Logs for centralized storage and analysis. You simply install the Amazon ECS Container Agent and enable the awslogs log driver.

CloudWatch Events – ECS generates CloudWatch Events when the state of a task or a container instance changes. These events allow you to monitor the state of the cluster using a Lambda function. To learn how to capture the events and store them in an Elasticsearch cluster, read Monitor Cluster State with Amazon ECS Event Stream.

Task Placement Policies – This launch provided you with fine-grained control over the placement of tasks on container instances within clusters. It allows you to construct policies that include cluster constraints, custom constraints (location, instance type, AMI, and attribute), placement strategies (spread or bin pack) and to use them without writing any code. Read Introducing Amazon ECS Task Placement Policies to see how to do this!

EC2 Container Service in Action
Many of our customers from large enterprises to hot startups and across all industries, such as financial services, hospitality, and consumer electronics, are using Amazon ECS to run their microservices applications in production. Companies such as Capital One, Expedia, Okta, Riot Games, and Viacom rely on Amazon ECS.

Mapbox is a platform for designing and publishing custom maps. The company uses ECS to power their entire batch processing architecture to collect and process over 100 million miles of sensor data per day that they use for powering their maps. They also optimize their batch processing architecture on ECS using Spot Instances. The Mapbox platform powers over 5,000 apps and reaches more than 200 million users each month. Its backend runs on ECS allowing it to serve more than 1.3 billion requests per day. To learn more about their recent migration to ECS, read their recent blog post, We Switched to Amazon ECS, and You Won’t Believe What Happened Next.

Travel company Expedia designed their backends with a microservices architecture. With the popularization of Docker, they decided they would like to adopt Docker for its faster deployments and environment portability. They chose to use ECS to orchestrate all their containers because it had great integration with the AWS platform, everything from ALB to IAM roles to VPC integration. This made ECS very easy to use with their existing AWS infrastructure. ECS really reduced the heavy lifting of deploying and running containerized applications. Expedia runs 75% of all apps on AWS in ECS allowing it to process 4 billion requests per hour. Read Kuldeep Chowhan‘s blog post, How Expedia Runs Hundreds of Applications in Production Using Amazon ECS to learn more.

Realtor.com provides home buyers and sellers with a comprehensive database of properties that are currently for sale. Their move to AWS and ECS has helped them to support business growth that now numbers 50 million unique monthly users who drive up to 250,000 requests per second at peak times. ECS has helped them to deploy their code more quickly while increasing utilization of their cloud infrastructure. Read the Realtor.com Case Study to learn more about how they use ECS, Kinesis, and other AWS services.

Instacart talks about how they use ECS to power their same-day grocery delivery service:

Capital One talks about how they use ECS to automate their operations and their infrastructure management:

Code
Clever developers are using ECS as a base for their own work. For example:

Rack is an open source PaaS (Platform as a Service). It focuses on infrastructure automation, runs in an isolated VPC, and uses a single-tenant build service for security.

Empire is also an open source PaaS. It provides a Heroku-like workflow and is targeted at small and medium sized startups, with an emphasis on microservices.

Cloud Container Cluster Visualizer (c3vis) helps to visualize resource utilization within ECS clusters:

Stay Tuned
We have plenty of new features in the works for ECS, so stay tuned!

Jeff;

 

WannaCry Ransomware

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/05/wannacry_ransom.html

Criminals go where the money is, and cybercriminals are no exception.

And right now, the money is in ransomware.

It’s a simple scam. Encrypt the victim’s hard drive, then extract a fee to decrypt it. The scammers can’t charge too much, because they want the victim to pay rather than give up on the data. But they can charge individuals a few hundred dollars, and they can charge institutions like hospitals a few thousand. Do it at scale, and it’s a profitable business.

And scale is how ransomware works. Computers are infected automatically, with viruses that spread over the internet. Payment is no more difficult than buying something online ­– and payable in untraceable bitcoin -­- with some ransomware makers offering tech support to those unsure of how to buy or transfer bitcoin. Customer service is important; people need to know they’ll get their files back once they pay.

And they want you to pay. If they’re lucky, they’ve encrypted your irreplaceable family photos, or the documents of a project you’ve been working on for weeks. Or maybe your company’s accounts receivable files or your hospital’s patient records. The more you need what they’ve stolen, the better.

The particular ransomware making headlines is called WannaCry, and it’s infected some pretty serious organizations.

What can you do about it? Your first line of defense is to diligently install every security update as soon as it becomes available, and to migrate to systems that vendors still support. Microsoft issued a security patch that protects against WannaCry months before the ransomware started infecting systems; it only works against computers that haven’t been patched. And many of the systems it infects are older computers, no longer normally supported by Microsoft –­ though it did belatedly release a patch for those older systems. I know it’s hard, but until companies are forced to maintain old systems, you’re much safer upgrading.

This is easier advice for individuals than for organizations. You and I can pretty easily migrate to a new operating system, but organizations sometimes have custom software that breaks when they change OS versions or install updates. Many of the organizations hit by WannaCry had outdated systems for exactly these reasons. But as expensive and time-consuming as updating might be, the risks of not doing so are increasing.

Your second line of defense is good antivirus software. Sometimes ransomware tricks you into encrypting your own hard drive by clicking on a file attachment that you thought was benign. Antivirus software can often catch your mistake and prevent the malicious software from running. This isn’t perfect, of course, but it’s an important part of any defense.

Your third line of defense is to diligently back up your files. There are systems that do this automatically for your hard drive. You can invest in one of those. Or you can store your important data in the cloud. If your irreplaceable family photos are in a backup drive in your house, then the ransomware has that much less hold on you. If your e-mail and documents are in the cloud, then you can just reinstall the operating system and bypass the ransomware entirely. I know storing data in the cloud has its own privacy risks, but they may be less than the risks of losing everything to ransomware.

That takes care of your computers and smartphones, but what about everything else? We’re deep into the age of the “Internet of things.”

There are now computers in your household appliances. There are computers in your cars and in the airplanes you travel on. Computers run our traffic lights and our power grids. These are all vulnerable to ransomware. The Mirai botnet exploited a vulnerability in internet-enabled devices like DVRs and webcams to launch a denial-of-service attack against a critical internet name server; next time it could just as easily disable the devices and demand payment to turn them back on.

Re-enabling a webcam will be cheap; re-enabling your car will cost more. And you don’t want to know how vulnerable implanted medical devices are to these sorts of attacks.

Commercial solutions are coming, probably a convenient repackaging of the three lines of defense described above. But it’ll be yet another security surcharge you’ll be expected to pay because the computers and internet-of-things devices you buy are so insecure. Because there are currently no liabilities for lousy software and no regulations mandating secure software, the market rewards software that’s fast and cheap at the expense of good. Until that changes, ransomware will continue to be profitable line of criminal business.

This essay previously appeared in the New York Daily News.

[$] Vulnerability hoarding and Wcry

Post Syndicated from jake original https://lwn.net/Articles/722924/rss

A virulent ransomware worm attacked a wide swath of Windows
machines worldwide in mid-May. The malware, known as Wcry, Wanna, or
WannaCry, infected a number of systems at high-profile organizations as
well as striking at critical pieces of the infrastructure—like hospitals, banks,
and train stations. While the threat seems to have largely abated—for
now—the origin of some of its code, which is apparently the US National Security
Agency (NSA), should give one pause.

Roundup of AWS HIPAA Eligible Service Announcements

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/roundup-of-aws-hipaa-eligible-service-announcements/

At AWS we have had a number of HIPAA eligible service announcements. Patrick Combes, the Healthcare and Life Sciences Global Technical Leader at AWS, and Aaron Friedman, a Healthcare and Life Sciences Partner Solutions Architect at AWS, have written this post to tell you all about it.

-Ana


We are pleased to announce that the following AWS services have been added to the BAA in recent weeks: Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon SQS. All four of these services facilitate moving data into and through AWS, and we are excited to see how customers will be using these services to advance their solutions in healthcare. While we know the use cases for each of these services are vast, we wanted to highlight some ways that customers might use these services with Protected Health Information (PHI).

As with all HIPAA-eligible services covered under the AWS Business Associate Addendum (BAA), PHI must be encrypted while at-rest or in-transit. We encourage you to reference our HIPAA whitepaper, which details how you might configure each of AWS’ HIPAA-eligible services to store, process, and transmit PHI. And of course, for any portion of your application that does not touch PHI, you can use any of our 90+ services to deliver the best possible experience to your users. You can find some ideas on architecting for HIPAA on our website.

Amazon API Gateway
Amazon API Gateway is a web service that makes it easy for developers to create, publish, monitor, and secure APIs at any scale. With PHI now able to securely transit API Gateway, applications such as patient/provider directories, patient dashboards, medical device reports/telemetry, HL7 message processing and more can securely accept and deliver information to any number and type of applications running within AWS or client presentation layers.

One particular area we are excited to see how our customers leverage Amazon API Gateway is with the exchange of healthcare information. The Fast Healthcare Interoperability Resources (FHIR) specification will likely become the next-generation standard for how health information is shared between entities. With strong support for RESTful architectures, FHIR can be easily codified within an API on Amazon API Gateway. For more information on FHIR, our AWS Healthcare Competency partner, Datica, has an excellent primer.

AWS Direct Connect
Some of our healthcare and life sciences customers, such as Johnson & Johnson, leverage hybrid architectures and need to connect their on-premises infrastructure to the AWS Cloud. Using AWS Direct Connect, you can establish private connectivity between AWS and your datacenter, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.

In addition to a hybrid-architecture strategy, AWS Direct Connect can assist with the secure migration of data to AWS, which is the first step to using the wide array of our HIPAA-eligible services to store and process PHI, such as Amazon S3 and Amazon EMR. Additionally, you can connect to third-party/externally-hosted applications or partner-provided solutions as well as securely and reliably connect end users to those same healthcare applications, such as a cloud-based Electronic Medical Record system.

AWS Database Migration Service (DMS)
To date, customers have migrated over 20,000 databases to AWS through the AWS Database Migration Service. Customers often use DMS as part of their cloud migration strategy, and now it can be used to securely and easily migrate your core databases containing PHI to the AWS Cloud. As your source database remains fully operational during the migration with DMS, you minimize downtime for these business-critical applications as you migrate your databases to AWS. This service can now be utilized to securely transfer such items as patient directories, payment/transaction record databases, revenue management databases and more into AWS.

Amazon Simple Queue Service (SQS)
Amazon Simple Queue Service (SQS) is a message queueing service for reliably communicating among distributed software components and microservices at any scale. One way that we envision customers using SQS with PHI is to buffer requests between application components that pass HL7 or FHIR messages to other parts of their application. You can leverage features like SQS FIFO to ensure your messages containing PHI are passed in the order they are received and delivered in the order they are received, and available until a consumer processes and deletes it. This is important for applications with patient record updates or processing payment information in a hospital.

Let’s get building!
We are beyond excited to see how our customers will use our newly HIPAA-eligible services as part of their healthcare applications. What are you most excited for? Leave a comment below.

AWS Hot Startups – April 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-april-2017/

Spring is here, the flowers are blooming and Tina Barr is back with more great startups for you to check out!

-Ana


Welcome back to another month of hot AWS-powered startups! Today we have three exciting startups:

  • Beekeeper – simplifying employee communication in the workplace.
  • Betterment – making investing easier for everyone.
  • ClearSlide – a leading sales engagement platform.

Be sure to check out our March hot startups in case you missed them.

Beekeeper (Zurich, Switzerland)
Beekeeper logoFlavio Pfaffhauser and Christian Grossmann, both graduates of ETH Zurich, were passionate about building a technology that would connect and bring people together. What started as a student’s social community soon turned into Beekeeper – a communication platform for the workplace that allows employees to interact wherever they are. As Flavio and Christian learned how to build a social platform that engaged people properly, businesses began requesting a platform that could be adapted to their specific processes and needs. The platform started with the concept of helping people feel as if they are sitting right next to each other, whether they’re at a desk or in the field. Founded in 2012, Beekeeper is focused on improving information sharing, communication and peer collaboration, and the company strongly believes that listening to employees is crucial for organizations.

The “Mobile First, Desktop Friendly” platform has a simple and intuitive interface that easily integrates multiple operating systems into one ecosystem. The interface can be styled and customized to match a company’s brand and identity. Employees can connect with their colleagues anytime and anywhere with private and group chats, video and file sharing, and feedback surveys. With Beekeeper’s analytical dashboard leadership teams can identify trending topics of discussion and track employee engagement and app usage in real-time. Beekeeper is currently connecting users in 137 countries across industries including hospitality, construction, transportation, and more.

Beekeeper likes using AWS because it allows their engineers to focus on the things that really matter; solving customer issues. The company builds its infrastructure using services like Amazon EC2, Amazon S3, and Amazon RDS, all of which allow the technical teams to offload administrative tasks. Amazon Elastic Transcoder and Amazon QuickSight are used to build analytical dashboards and Amazon Redshift for data warehousing.

Check out the Beekeeper blog to keep up with their latest news!

Betterment (New York, NY)
Betterment logo
Betterment is on a mission to make investing easier and more accessible for everyone, no matter their financial goal. In 2008, Jon Stein founded Betterment with the intent to reinvent the industry and save future investors from making the same common mistakes he had been making. At that time, most people only had a couple of options when it came to investing their money – either do it yourself or hire another person to do it for you. Unfortunately, financial advisors are sometimes paid to recommend certain investments even if it’s not what is best for their clients. Betterment only chooses investments that are in their customer’s best interest and align with their financial goals. Today, they are the largest, independent online investment advisor managing more than $8 billion in assets for over 240,000 customers.

Betterment uses technology to make investing easier and more efficient, while also helping to increase after-tax returns. They offer a wide range of financial planning services that are personalized to their customer’s life goals. To start an investment plan, customers can input their age, retirement status, and annual income and Betterment will recommend how much money to invest and which type of account is the right choice. They will invest and manage it in a way that many traditional investment services can’t at a lower cost.

The engineers at Betterment are constantly working to build industry-changing technology as quickly as possible to help customers maximize their money. AWS gives Betterment the flexibility to easily provision infrastructure and offload functions to various services that once required entire teams to manage. When they first started in the cloud, Betterment was using standard implementations of Amazon EC2, Amazon RDS, and Amazon S3. Since they’ve gone all in with AWS, they have been leveraging services like Amazon Redshift, AWS Lambda, AWS Database Migration Service, Amazon Kinesis, Amazon DynamoDB, and more. Today, they are using over 20 AWS services to develop, test, and deploy features and enhancements on a daily basis.

Learn more about Betterment here.

ClearSlide (San Francisco, CA)
ClearSlide is one of today’s leading sales engagement platforms, offering a complete and integrated tool that makes every customer interaction successful. Since their founding in 2009, ClearSlide has looked for ways to improve customer experiences and have developed numerous enablement tools for sales leaders and teams, marketing, customer support teams, and more. The platform puts content, communication channels, and insights at their customer’s fingertips to help drive better decisions and manage opportunities. ClearSlide serves thousands of companies including Comcast, the Sacramento Kings, The Economist, and so far their customers have generated over 750 million minutes of engagement!

ClearSlide offers a solution for all parts of the sales process. For sales leaders, ClearSlide provides engagement dashboards to improve deal visibility, coaching, and sales forecast accuracy. For marketing and sales enablement teams, they guide sellers to the right content, at the right time, in the right context, and provide insight to maximize content ROI. For sales reps, ClearSlide integrates communications, content, and analytics in a single platform experience. Communications can be made across email, in-person or online meetings, web, or social. Today, ClearSlide customers report a 10-20% increase in closed deals, 25% decrease in onboarding time for new reps, and a 50-80% reduction in selling costs.

ClearSlide uses a range of AWS services, but Amazon EC2 and Amazon RDS have made the biggest impact on their business. EC2 enables them to easily scale compute capacity, which is critical for a fast-growing startup. It also provides consistency during deployment – from development and integration to staging and production. RDS reduces overhead and allows ClearSlide to scale their database infrastructure. Since AWS takes care of time-consuming database management tasks, ClearSlide sees a reduction in operations costs and can focus on being more strategic with their customers.

Watch this video to learn how LiveIntent reduced sales cycles by 22% using ClearSlide. Get all the latest updates by following them on Twitter!

Thanks for checking out another month of awesome AWS-powered startups!

-Tina

 

Querying OpenStreetMap with Amazon Athena

Post Syndicated from Seth Fitzsimmons original https://aws.amazon.com/blogs/big-data/querying-openstreetmap-with-amazon-athena/

This is a guest post by Seth Fitzsimmons, member of the 2017 OpenStreetMap US board of directors. Seth works with clients including the Humanitarian OpenStreetMap Team, Mapzen, the American Red Cross, and World Bank to craft innovative geospatial solutions.

OpenStreetMap (OSM) is a free, editable map of the world, created and maintained by volunteers and available for use under an open license. Companies and non-profits like Mapbox, Foursquare, Mapzen, the World Bank, the American Red Cross and others use OSM to provide maps, directions, and geographic context to users around the world.

In the 12 years of OSM’s existence, editors have created and modified several billion features (physical things on the ground like roads or buildings). The main PostgreSQL database that powers the OSM editing interface is now over 2TB and includes historical data going back to 2007. As new users join the open mapping community, more and more valuable data is being added to OpenStreetMap, requiring increasingly powerful tools, interfaces, and approaches to explore its vastness.

This post explains how anyone can use Amazon Athena to quickly query publicly available OSM data stored in Amazon S3 (updated weekly) as an AWS Public Dataset. Imagine that you work for an NGO interested in improving knowledge of and access to health centers in Africa. You might want to know what’s already been mapped, to facilitate the production of maps of surrounding villages, and to determine where infrastructure investments are likely to be most effective.

Note: If you run all the queries in this post, you will be charged approximately $1 based on the number of bytes scanned. All queries used in this post can be found in this GitHub gist.

What is OpenStreetMap?

As an open content project, regular OSM data archives are made available to the public via planet.openstreetmap.org in a few different formats (XML, PBF). This includes both snapshots of the current state of data in OSM as well as historical archives.

Working with “the planet” (as the data archives are referred to) can be unwieldy. Because it contains data spanning the entire world, the size of a single archive is on the order of 50 GB. The format is bespoke and extremely specific to OSM. The data is incredibly rich, interesting, and useful, but the size, format, and tooling can often make it very difficult to even start the process of asking complex questions.

Heavy users of OSM data typically download the raw data and import it into their own systems, tailored for their individual use cases, such as map rendering, driving directions, or general analysis. Now that OSM data is available in the Apache ORC format on Amazon S3, it’s possible to query the data using Athena without even downloading it.

How does Athena help?

You can use Athena along with data made publicly available via OSM on AWS. You don’t have to learn how to install, configure, and populate your own server instances and go through multiple steps to download and transform the data into a queryable form. Thanks to AWS and partners, a regularly updated copy of the planet file (available within hours of OSM’s weekly publishing schedule) is hosted on S3 and made available in a format that lends itself to efficient querying using Athena.

Asking questions with Athena involves registering the OSM planet file as a table and making SQL queries. That’s it. Nothing to download, nothing to configure, nothing to ingest. Athena distributes your queries and returns answers within seconds, even while querying over 9 years and billions of OSM elements.

You’re in control. S3 provides high availability for the data and Athena charges you per TB of data scanned. Plus, we’ve gone through the trouble of keeping scanning charges as small as possible by transcoding OSM’s bespoke format as ORC. All the hard work of transforming the data into something highly queryable and making it publicly available is done; you just need to bring some questions.

Registering Tables

The OSM Public Datasets consist of three tables:

  • planet
    Contains the current versions of all elements present in OSM.
  • planet_history
    Contains a historical record of all versions of all elements (even those that have been deleted).
  • changesets
    Contains information about changesets in which elements were modified (and which have a foreign key relationship to both the planet and planet_history tables).

To register the OSM Public Datasets within your AWS account so you can query them, open the Athena console (make sure you are using the us-east-1 region) to paste and execute the following table definitions:

planet

CREATE EXTERNAL TABLE planet (
  id BIGINT,
  type STRING,
  tags MAP<STRING,STRING>,
  lat DECIMAL(9,7),
  lon DECIMAL(10,7),
  nds ARRAY<STRUCT<ref: BIGINT>>,
  members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
  changeset BIGINT,
  timestamp TIMESTAMP,
  uid BIGINT,
  user STRING,
  version BIGINT
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet/';

planet_history

CREATE EXTERNAL TABLE planet_history (
    id BIGINT,
    type STRING,
    tags MAP<STRING,STRING>,
    lat DECIMAL(9,7),
    lon DECIMAL(10,7),
    nds ARRAY<STRUCT<ref: BIGINT>>,
    members ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
    changeset BIGINT,
    timestamp TIMESTAMP,
    uid BIGINT,
    user STRING,
    version BIGINT,
    visible BOOLEAN
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/planet-history/';

changesets

CREATE EXTERNAL TABLE changesets (
    id BIGINT,
    tags MAP<STRING,STRING>,
    created_at TIMESTAMP,
    open BOOLEAN,
    closed_at TIMESTAMP,
    comments_count BIGINT,
    min_lat DECIMAL(9,7),
    max_lat DECIMAL(9,7),
    min_lon DECIMAL(10,7),
    max_lon DECIMAL(10,7),
    num_changes BIGINT,
    uid BIGINT,
    user STRING
)
STORED AS ORCFILE
LOCATION 's3://osm-pds/changesets/';

 

Under the Hood: Extract, Transform, Load

So, what happens behind the scenes to make this easier for you? In a nutshell, the data is transcoded from the OSM PBF format into Apache ORC.

There’s an AWS Lambda function (running every 15 minutes, triggered by CloudWatch Events) that watches planet.openstreetmap.org for the presence of weekly updates (using rsync). If that function detects that a new version has become available, it submits a set of AWS Batch jobs to mirror, transcode, and place it as the “latest” version. Code for this is available at osm-pds-pipelines GitHub repo.

To facilitate the transcoding into a format appropriate for Athena, we have produced an open source tool, OSM2ORC. The tool also includes an Osmosis plugin that can be used with complex filtering pipelines and outputs an ORC file that can be uploaded to S3 for use with Athena, or used locally with other tools from the Hadoop ecosystem.

What types of questions can OpenStreetMap answer?

There are many uses for OpenStreetMap data; here are three major ones and how they may be addressed using Athena.

Case Study 1: Finding Local Health Centers in West Africa

When the American Red Cross mapped more than 7,000 communities in West Africa in areas affected by the Ebola epidemic as part of the Missing Maps effort, they found themselves in a position where collecting a wide variety of data was both important and incredibly beneficial for others. Accurate maps play a critical role in understanding human communities, especially for populations at risk. The lack of detailed maps for West Africa posed a problem during the 2014 Ebola crisis, so collecting and producing data around the world has the potential to improve disaster responses in the future.

As part of the data collection, volunteers collected locations and information about local health centers, something that will facilitate treatment in future crises (and, more importantly, on a day-to-day basis). Combined with information about access to markets and clean drinking water and historical experiences with natural disasters, this data was used to create a vulnerability index to select communities for detailed mapping.

For this example, you find all health centers in West Africa (many of which were mapped as part of Missing Maps efforts). This is something that healthsites.io does for the public (worldwide and editable, based on OSM data), but you’re working with the raw data.

Here’s a query to fetch information about all health centers, tagged as nodes (points), in Guinea, Sierra Leone, and Liberia:

SELECT * from planet
WHERE type = 'node'
  AND tags['amenity'] IN ('hospital', 'clinic', 'doctors')
  AND lon BETWEEN -15.0863 AND -7.3651
  AND lat BETWEEN 4.3531 AND 12.6762;

Buildings, as “ways” (polygons, in this case) assembled from constituent nodes (points), can also be tagged as medical facilities. In order to find those, you need to reassemble geometries. Here you’re taking the average of all nodes that make up a building (which will be the approximate center point, which is close enough for this purpose). Here is a query that finds both buildings and points that are tagged as medical facilities:

-- select out nodes and relevant columns
WITH nodes AS (
  SELECT
    type,
    id,
    tags,
    lat,
    lon
  FROM planet
  WHERE type = 'node'
),
-- select out ways and relevant columns
ways AS (
  SELECT
    type,
    id,
    tags,
    nds
  FROM planet
  WHERE type = 'way'
    AND tags['amenity'] IN ('hospital', 'clinic', 'doctors')
),
-- filter nodes to only contain those present within a bounding box
nodes_in_bbox AS (
  SELECT *
  FROM nodes
  WHERE lon BETWEEN -15.0863 AND -7.3651
    AND lat BETWEEN 4.3531 AND 12.6762
)
-- find ways intersecting the bounding box
SELECT
  ways.type,
  ways.id,
  ways.tags,
  AVG(nodes.lat) lat,
  AVG(nodes.lon) lon
FROM ways
CROSS JOIN UNNEST(nds) AS t (nd)
JOIN nodes_in_bbox nodes ON nodes.id = nd.ref
GROUP BY (ways.type, ways.id, ways.tags)
UNION ALL
SELECT
  type,
  id,
  tags,
  lat,
  lon
FROM nodes_in_bbox
WHERE tags['amenity'] IN ('hospital', 'clinic', 'doctors');

You could go a step further and query for additional tags included with these (for example, opening_hours) and use that as a metric for measuring “completeness” of the dataset and to focus on additional data to collect (and locations to fill out).

Case Study 2: Generating statistics about mapathons

OSM has a history of holding mapping parties. Mapping parties are events where interested people get together and wander around outside, gathering and improving information about sites (and sights) that they pass. Another form of mapping party is the mapathon, which brings together armchair and desk mappers to focus on improving data in another part of the world.

Mapathons are a popular way of enlisting volunteers for Missing Maps, a collaboration between many NGOs, education institutions, and civil society groups that aims to map the most vulnerable places in the developing world to support international and local NGOs and individuals. One common way that volunteers participate is to trace buildings and roads from aerial imagery, providing baseline data that can later be verified by Missing Maps staff and volunteers working in the areas being mapped.

(Image and data from the American Red Cross)

Data collected during these events lends itself to a couple different types of questions. People like competition, so Missing Maps has developed a set of leaderboards that allow people to see where they stand relative to other mappers and how different groups compare. To facilitate this, hashtags (such as #missingmaps) are included in OSM changeset comments. To do similar ad hoc analysis, you need to query the list of changesets, filter by the presence of certain hashtags in the comments, and group things by username.

Now, find changes made during Missing Maps mapathons at George Mason University (using the #gmu hashtag):

SELECT *
FROM changesets
WHERE regexp_like(tags['comment'], '(?i)#gmu');

This includes all tags associated with a changeset, which typically include a mapper-provided comment about the changes made (often with additional hashtags corresponding to OSM Tasking Manager projects) as well as information about the editor used, imagery referenced, etc.

If you’re interested in the number of individual users who have mapped as part of the Missing Maps project, you can write a query such as:

SELECT COUNT(DISTINCT uid)
FROM changesets
WHERE regexp_like(tags['comment'], '(?i)#missingmaps');

25,610 people (as of this writing)!

Back at GMU, you’d like to know who the most prolific mappers are:

SELECT user, count(*) AS edits
FROM changesets
WHERE regexp_like(tags['comment'], '(?i)#gmu')
GROUP BY user
ORDER BY count(*) DESC;

Nice job, BrokenString!

It’s also interesting to see what types of features were added or changed. You can do that by using a JOIN between the changesets and planet tables:

SELECT planet.*, changesets.tags
FROM planet
JOIN changesets ON planet.changeset = changesets.id
WHERE regexp_like(changesets.tags['comment'], '(?i)#gmu');


Using this as a starting point, you could break down the types of features, highlight popular parts of the world, or do something entirely different.

Case Study 3: Building Condition

With building outlines having been produced by mappers around (and across) the world, local Missing Maps volunteers (often from local Red Cross / Red Crescent societies) go around with Android phones running OpenDataKit  and OpenMapKit to verify that the buildings in question actually exist and to add additional information about them, such as the number of stories, use (residential, commercial, etc.), material, and condition.

This data can be used in many ways: it can provide local geographic context (by being included in map source data) as well as facilitate investment by development agencies such as the World Bank.

Here are a collection of buildings mapped in Dhaka, Bangladesh:

(Map and data © OpenStreetMap contributors)

For NGO staff to determine resource allocation, it can be helpful to enumerate and show buildings in varying conditions. Building conditions within an area can be a means of understanding where to focus future investments.

Querying for buildings is a bit more complicated than working with points or changesets. Of the three core OSM element types—node, way, and relation, only nodes (points) have geographic information associated with them. Ways (lines or polygons) are composed of nodes and inherit vertices from them. This means that ways must be reconstituted in order to effectively query by bounding box.

This results in a fairly complex query. You’ll notice that this is similar to the query used to find buildings tagged as medical facilities above. Here you’re counting buildings in Dhaka according to building condition:

-- select out nodes and relevant columns
WITH nodes AS (
  SELECT
    id,
    tags,
    lat,
    lon
  FROM planet
  WHERE type = 'node'
),
-- select out ways and relevant columns
ways AS (
  SELECT
    id,
    tags,
    nds
  FROM planet
  WHERE type = 'way'
),
-- filter nodes to only contain those present within a bounding box
nodes_in_bbox AS (
  SELECT *
  FROM nodes
  WHERE lon BETWEEN 90.3907 AND 90.4235
    AND lat BETWEEN 23.6948 AND 23.7248
),
-- fetch and expand referenced ways
referenced_ways AS (
  SELECT
    ways.*,
    t.*
  FROM ways
  CROSS JOIN UNNEST(nds) WITH ORDINALITY AS t (nd, idx)
  JOIN nodes_in_bbox nodes ON nodes.id = nd.ref
),
-- fetch *all* referenced nodes (even those outside the queried bounding box)
exploded_ways AS (
  SELECT
    ways.id,
    ways.tags,
    idx,
    nd.ref,
    nodes.id node_id,
    ARRAY[nodes.lat, nodes.lon] coordinates
  FROM referenced_ways ways
  JOIN nodes ON nodes.id = nd.ref
  ORDER BY ways.id, idx
)
-- query ways matching the bounding box
SELECT
  count(*),
  tags['building:condition']
FROM exploded_ways
GROUP BY tags['building:condition']
ORDER BY count(*) DESC;


Most buildings are unsurveyed (125,000 is a lot!), but of those that have been, most are average (as you’d expect). If you were to further group these buildings geographically, you’d have a starting point to determine which areas of Dhaka might benefit the most.

Conclusion

OSM data, while incredibly rich and valuable, can be difficult to work with, due to both its size and its data model. In addition to the time spent downloading large files to work with locally, time is spent installing and configuring tools and converting the data into more queryable formats. We think Amazon Athena combined with the ORC version of the planet file, updated on a weekly basis, is an extremely powerful and cost-effective combination. This allows anyone to start querying billions of records with simple SQL in no time, giving you the chance to focus on the analysis, not the infrastructure.

To download the data and experiment with it using other tools, the latest OSM ORC-formatted file is available via OSM on AWS at s3://osm-pds/planet/planet-latest.orcs3://osm-pds/planet-history/history-latest.orc, and s3://osm-pds/changesets/changesets-latest.orc.

We look forward to hearing what you find out!

AWS Hot Startups – March 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-march-2017/

As the madness of March rounds up, take a break from all the basketball and check out the cool startups Tina Barr brings you for this month!

-Ana


The arrival of spring brings five new startups this month:

  • Amino Apps – providing social networks for hundreds of thousands of communities.
  • Appboy – empowering brands to strengthen customer relationships.
  • Arterys – revolutionizing the medical imaging industry.
  • Protenus – protecting patient data for healthcare organizations.
  • Syapse – improving targeted cancer care with shared data from across the country.

In case you missed them, check out February’s hot startups here.

Amino Apps (New York, NY)
Amino Logo
Amino Apps was founded on the belief that interest-based communities were underdeveloped and outdated, particularly when it came to mobile. CEO Ben Anderson and CTO Yin Wang created the app to give users access to hundreds of thousands of communities, each of them a complete social network dedicated to a single topic. Some of the largest communities have over 1 million members and are built around topics like popular TV shows, video games, sports, and an endless number of hobbies and other interests. Amino hosts communities from around the world and is currently available in six languages with many more on the way.

Navigating the Amino app is easy. Simply download the app (iOS or Android), sign up with a valid email address, choose a profile picture, and start exploring. Users can search for communities and join any that fit their interests. Each community has chatrooms, multimedia content, quizzes, and a seamless commenting system. If a community doesn’t exist yet, users can create it in minutes using the Amino Creator and Manager app (ACM). The largest user-generated communities are turned into their own apps, which gives communities their own piece of real estate on members’ phones, as well as in app stores.

Amino’s vast global network of hundreds of thousands of communities is run on AWS services. Every day users generate, share, and engage with an enormous amount of content across hundreds of mobile applications. By leveraging AWS services including Amazon EC2, Amazon RDS, Amazon S3, Amazon SQS, and Amazon CloudFront, Amino can continue to provide new features to their users while scaling their service capacity to keep up with user growth.

Interested in joining Amino? Check out their jobs page here.

Appboy (New York, NY)
In 2011, Bill Magnuson, Jon Hyman, and Mark Ghermezian saw a unique opportunity to strengthen and humanize relationships between brands and their customers through technology. The trio created Appboy to empower brands to build long-term relationships with their customers and today they are the leading lifecycle engagement platform for marketing, growth, and engagement teams. The team recognized that as rapid mobile growth became undeniable, many brands were becoming frustrated with the lack of compelling and seamless cross-channel experiences offered by existing marketing clouds. Many of today’s top mobile apps and enterprise companies trust Appboy to take their marketing to the next level. Appboy manages user profiles for nearly 700 million monthly active users, and is used to power more than 10 billion personalized messages monthly across a multitude of channels and devices.

Appboy creates a holistic user profile that offers a single view of each customer. That user profile in turn powers contextual cross-channel messaging, lifecycle engagement automation, and robust campaign insights and optimization opportunities. Appboy offers solutions that allow brands to create push notifications, targeted emails, in-app and in-browser messages, news feed cards, and webhooks to enhance the user experience and increase customer engagement. The company prides itself on its interoperability, connecting to a variety of complimentary marketing tools and technologies so brands can build the perfect stack to enable their strategies and experiments in real time.

AWS makes it easy for Appboy to dynamically size all of their service components and automatically scale up and down as needed. They use an array of services including Elastic Load Balancing, AWS Lambda, Amazon CloudWatch, Auto Scaling groups, and Amazon S3 to help scale capacity and better deal with unpredictable customer loads.

To keep up with the latest marketing trends and tactics, visit the Appboy digital magazine, Relate. Appboy was also recently featured in the #StartupsOnAir video series where they gave insight into their AWS usage.

Arterys (San Francisco, CA)
Getting test results back from a physician can often be a time consuming and tedious process. Clinicians typically employ a variety of techniques to manually measure medical images and then make their assessments. Arterys founders Fabien Beckers, John Axerio-Cilies, Albert Hsiao, and Shreyas Vasanawala realized that much more computation and advanced analytics were needed to harness all of the valuable information in medical images, especially those generated by MRI and CT scanners. Clinicians were often skipping measurements and making assessments based mostly on qualitative data. Their solution was to start a cloud/AI software company focused on accelerating data-driven medicine with advanced software products for post-processing of medical images.

Arterys’ products provide timely, accurate, and consistent quantification of images, improve speed to results, and improve the quality of the information offered to the treating physician. This allows for much better tracking of a patient’s condition, and thus better decisions about their care. Advanced analytics, such as deep learning and distributed cloud computing, are used to process images. The first Arterys product can contour cardiac anatomy as accurately as experts, but takes only 15-20 seconds instead of the 45-60 minutes required to do it manually. Their computing cloud platform is also fully HIPAA compliant.

Arterys relies on a variety of AWS services to process their medical images. Using deep learning and other advanced analytic tools, Arterys is able to render images without latency over a web browser using AWS G2 instances. They use Amazon EC2 extensively for all of their compute needs, including inference and rendering, and Amazon S3 is used to archive images that aren’t needed immediately, as well as manage costs. Arterys also employs Amazon Route 53, AWS CloudTrail, and Amazon EC2 Container Service.

Check out this quick video about the technology that Arterys is creating. They were also recently featured in the #StartupsOnAir video series and offered a quick demo of their product.

Protenus (Baltimore, MD)
Protenus Logo
Protenus founders Nick Culbertson and Robert Lord were medical students at Johns Hopkins Medical School when they saw first-hand how Electronic Health Record (EHR) systems could be used to improve patient care and share clinical data more efficiently. With increased efficiency came a huge issue – an onslaught of serious security and privacy concerns. Over the past two years, 140 million medical records have been breached, meaning that approximately 1 in 3 Americans have had their health data compromised. Health records contain a repository of sensitive information and a breach of that data can cause major havoc in a patient’s life – namely identity theft, prescription fraud, Medicare/Medicaid fraud, and improper performance of medical procedures. Using their experience and knowledge from former careers in the intelligence community and involvement in a leading hedge fund, Nick and Robert developed the prototype and algorithms that launched Protenus.

Today, Protenus offers a number of solutions that detect breaches and misuse of patient data for healthcare organizations nationwide. Using advanced analytics and AI, Protenus’ health data insights platform understands appropriate vs. inappropriate use of patient data in the EHR. It also protects privacy, aids compliance with HIPAA regulations, and ensures trust for patients and providers alike.

Protenus built and operates its SaaS offering atop Amazon EC2, where Dedicated Hosts and encrypted Amazon EBS volume are used to ensure compliance with HIPAA regulation for the storage of Protected Health Information. They use Elastic Load Balancing and Amazon Route 53 for DNS, enabling unique, secure client specific access points to their Protenus instance.

To learn more about threats to patient data, read Hospitals’ Biggest Threat to Patient Data is Hiding in Plain Sight on the Protenus blog. Also be sure to check out their recent video in the #StartupsOnAir series for more insight into their product.

Syapse (Palo Alto, CA)
Syapse provides a comprehensive software solution that enables clinicians to treat patients with precision medicine for targeted cancer therapies — treatments that are designed and chosen using genetic or molecular profiling. Existing hospital IT doesn’t support the robust infrastructure and clinical workflows required to treat patients with precision medicine at scale, but Syapse centralizes and organizes patient data to clinicians at the point of care. Syapse offers a variety of solutions for oncologists that allow them to access the full scope of patient data longitudinally, view recommended treatments or clinical trials for similar patients, and track outcomes over time. These solutions are helping health systems across the country to improve patient outcomes by offering the most innovative care to cancer patients.

Leading health systems such as Stanford Health Care, Providence St. Joseph Health, and Intermountain Healthcare are using Syapse to improve patient outcomes, streamline clinical workflows, and scale their precision medicine programs. A group of experts known as the Molecular Tumor Board (MTB) reviews complex cases and evaluates patient data, documents notes, and disseminates treatment recommendations to the treating physician. Syapse also provides reports that give health system staff insight into their institution’s oncology care, which can be used toward quality improvement, business goals, and understanding variables in the oncology service line.

Syapse uses Amazon Virtual Private Cloud, Amazon EC2 Dedicated Instances, and Amazon Elastic Block Store to build a high-performance, scalable, and HIPAA-compliant data platform that enables health systems to make precision medicine part of routine cancer care for patients throughout the country.

Be sure to check out the Syapse blog to learn more and also their recent video on the #StartupsOnAir video series where they discuss their product, HIPAA compliance, and more about how they are using AWS.

Thank you for checking out another month of awesome hot startups!

-Tina Barr

 

Alleged KickassTorrents Owner Can be Extradited to The US, Court Rules

Post Syndicated from Ernesto original https://torrentfreak.com/alleged-kickasstorrents-owner-can-be-extradited-to-the-us-court-rules-170301/

kickasstorrents_500x500Last summer Polish law enforcement officers arrested Artem Vaulin, the alleged founder of KickassTorrents, who’s been held in custody since.

Polish authorities acted on a criminal complaint from the US Government, which accused him of criminal copyright infringement and money laundering.

Facing severe back problems, Vaulin was transferred to a hospital in December. In both the US and Poland his legal has been fighting the extradition request in court, which resulted in a setback this week.

Following a series of hearings that started early last month, the Warsaw District Court ruled in first instance that the alleged KickassTorrents owner can be extradited.

While the ruling is bad news, the decision isn’t final yet. In Poland, the legality of extraditions is decided in two stages. If the court also agrees in second instance, the Minister of Justice then issues the final decision.

The Court has not yet released the details of its initial decision. According to the Court’s press service “the written opinion on the decision is postponed for the duration of 7 days, because of the complexity and breadth of the case.”

If the lower court grants the extradition request in full, Vaulin can also take the case to the Supreme Court for a review. Meanwhile, the alleged KickassTorrents operator remains in a local hospital where he’s still being treated for his back problems.

If the extradition is granted, Vaulin will have to face a criminal trial in the United States.

The Department of Justice accuses Vaulin and his co-conspirators of operating the largest piracy haven on the Internet, which made millions of dollars per year. As such, they are being held responsible for unlawfully distributing over $1 billion of copyrighted materials

In addition to the Polish extradition process, an equally crucial ruling is forthcoming in the US as well.

Last month Vaulin’s legal counsel asked the Illinois District Court to dismiss the criminal indictment and set his client free, arguing that there’s no proof of actual criminal copyright infringement.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

AWS Marketplace Adds Healthcare & Life Sciences Category

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-marketplace-adds-healthcare-life-sciences-category/

Wilson To and Luis Daniel Soto are our guest bloggers today, telling you about a new industry vertical category that is being added to the AWS Marketplace.Check it out!

-Ana


AWS Marketplace is a managed and curated software catalog that helps customers innovate faster and reduce costs, by making it easy to discover, evaluate, procure, immediately deploy and manage 3rd party software solutions.  To continue supporting our customers, we’re now adding a new industry vertical category: Healthcare & Life Sciences.

healthpost

This new category brings together best-of-breed software tools and solutions from our growing vendor ecosystem that have been adapted to, or built from the ground up, to serve the healthcare and life sciences industry.

Healthcare
Within the AWS Marketplace HCLS category, you can find solutions for Clinical information systems, population health and analytics, health administration and compliance services. Some offerings include:

  1. Allgress GetCompliant HIPAA Edition – Reduce the cost of compliance management and adherence by providing compliance professionals improved efficiency by automating the management of their compliance processes around HIPAA.
  2. ZH Healthcare BlueEHS – Deploy a customizable, ONC-certified EHR that empowers doctors to define their clinical workflows and treatment plans to enhance patient outcomes.
  3. Dicom Systems DCMSYS CloudVNA – DCMSYS Vendor Neutral Archive offers a cost-effective means of consolidating disparate imaging systems into a single repository, while providing enterprise-wide access and archiving of all medical images and other medical records.

Life Sciences

  1. National Instruments LabVIEW – Graphical system design software that provides scientists and engineers with the tools needed to create and deploy measurement and control systems through simple yet powerful networks.
  2. NCBI Blast – Analysis tools and datasets that allow users to perform flexible sequence similarity searches.
  3. Acellera AceCloud – Innovative tools and technologies for the study of biophysical phenomena. Acellera leverages the power of AWS Cloud to enable molecular dynamics simulations.

Healthcare and life sciences companies deal with huge amounts of data, and many of their data sets are some of the most complex in the world. From physicians and nurses to researchers and analysts, these users are typically hampered by their current systems. Their legacy software cannot let them efficiently store or effectively make use of the immense amounts of data they work with. And protracted and complex software purchasing cycles keep them from innovating at speed to stay ahead of market and industry trends. Data analytics and business intelligence solutions in AWS Marketplace offer specialized support for these industries, including:

  • Tableau Server – Enable teams to visualize across costs, needs, and outcomes at once to make the most of resources. The solution helps hospitals identify the impact of evidence-based medicine, wellness programs, and patient engagement.
  • TIBCO Spotfire and JasperSoft. TIBCO provides technical teams powerful data visualization, data analytics, and predictive analytics for Amazon Redshift, Amazon RDS, and popular database sources via AWS Marketplace.
  • Qlik Sense Enterprise. Qlik enables healthcare organizations to explore clinical, financial and operational data through visual analytics to discover insights which lead to improvements in care, reduced costs and delivering higher value to patients.

With more than 5,000 listings across more than 35 categories, AWS Marketplace simplifies software licensing and procurement by enabling customers to accept user agreements, choose pricing options, and automate the deployment of software and associated AWS resources with just a few clicks. AWS Marketplace also simplifies billing for customers by delivering a single invoice detailing business software and AWS resource usage on a monthly basis.

With AWS Marketplace, we can help drive operational efficiencies and reduce costs in these ways:

  • Easily bring in new solutions to solve increasingly complex issues, gain quick insight into the huge amounts of data users handle.
  • Healthcare data will be more actionable. We offer pay-as-you-go solutions that make it considerably easier and more cost-effective to ingest, store, analyze, and disseminate data.
  • Deploy healthcare and life sciences software with 1-Click ease — then evaluate and deploy it in minutes. Users can now speed up their historically slow cycles in software procurement and implementation.
  • Pay only for what’s consumed — and manage software costs on your AWS bill.
  • In addition to the already secure AWS Cloud, AWS Marketplace offers industry-leading solutions to help you secure operating systems, platforms, applications and data that can integrate with existing controls in your AWS Cloud and hybrid environment.

Click here to see who the current list of vendors are in our new Healthcare & Life Sciences category.

Come on In
If you are a healthcare ISV and would like to list and sell your products on AWS, visit our Sell in AWS Marketplace page.

– Wilson To and Luis Daniel Soto

US and KickassTorrents Go Head to Head in Court

Post Syndicated from Ernesto original https://torrentfreak.com/us-and-kickasstorrents-go-head-to-head-in-court-170202/

kickasstorrents_500x500This week KickassTorrents’ alleged owner Artem Vaulin asked the Illinois District Court to dismiss the criminal indictment and set him free.

The fundamental flaw of the case, according to defense lawyer Ira Rothken, is that torrent files themselves are not copyrighted content.

In addition, he argued that the secondary copyright infringement claims would fail as these are non-existent under criminal law.

District Court Judge John Lee previously questioned the evidence in the case and according to Rothken, it is certainly not enough to keep his client behind bars. This is also what he told the court during the hearing this week, stressing that torrents themselves are not copyrighted.

“We believe that the indictment against Artem Vaulin in the KAT torrent files case is defective and should be dismissed. Torrent files are not content files. The reproduction and distribution of torrent files are not a crime,” Rothken tells TF.

“If a third party uses torrent files to infringe it is after they leave the KAT site behind and such conduct is too random, inconsistent, and attenuated to impose criminal liability on Mr. Vaulin. The government cannot use the civil judge-made law in Grokster as a theory in a criminal case.”

Furthermore, Rothken argued that the US indictment is flawed because it fails to allege an actual criminal copyright infringement anywhere in the world, the United States included. The defense likened KickassTorrents to general search engines such as Google instead.

On the other side of the aisle stood US Department of Justice prosecutor Devlin Su. He urged the court to wait for the extradition hearing in Poland before ruling on the request, noting that Vaulin should come to the US voluntarily if he wanted to speed things up.

According to the prosecution, KickassTorrents operated as a piracy flea market, with an advertising revenue of about $12.5 million to $22.3 million. Comparing it with Google is nonsense, Su argued.

“Google is not dedicated to uploading and distributing copyrighted works,” Law360 quotes the prosecutor.

It is now up to the Illinois District Court to decide how to move forward. The defense is hoping for an outright dismissal, while the U.S. wants to move forward.

Meanwhile, over in Poland, Vaulin remains in custody after he was denied bail. Facing severe health issues, the Ukrainian was transferred from Polish prison to a local hospital a few weeks ago, where he remains under heavy guard.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Security and the Internet of Things

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/02/security_and_th.html

Last year, on October 21, your digital video recorder ­- or at least a DVR like yours ­- knocked Twitter off the internet. Someone used your DVR, along with millions of insecure webcams, routers, and other connected devices, to launch an attack that started a chain reaction, resulting in Twitter, Reddit, Netflix, and many sites going off the internet. You probably didn’t realize that your DVR had that kind of power. But it does.

All computers are hackable. This has as much to do with the computer market as it does with the technologies. We prefer our software full of features and inexpensive, at the expense of security and reliability. That your computer can affect the security of Twitter is a market failure. The industry is filled with market failures that, until now, have been largely ignorable. As computers continue to permeate our homes, cars, businesses, these market failures will no longer be tolerable. Our only solution will be regulation, and that regulation will be foisted on us by a government desperate to “do something” in the face of disaster.

In this article I want to outline the problems, both technical and political, and point to some regulatory solutions. Regulation might be a dirty word in today’s political climate, but security is the exception to our small-government bias. And as the threats posed by computers become greater and more catastrophic, regulation will be inevitable. So now’s the time to start thinking about it.

We also need to reverse the trend to connect everything to the internet. And if we risk harm and even death, we need to think twice about what we connect and what we deliberately leave uncomputerized.

If we get this wrong, the computer industry will look like the pharmaceutical industry, or the aircraft industry. But if we get this right, we can maintain the innovative environment of the internet that has given us so much.

**********

We no longer have things with computers embedded in them. We have computers with things attached to them.

Your modern refrigerator is a computer that keeps things cold. Your oven, similarly, is a computer that makes things hot. An ATM is a computer with money inside. Your car is no longer a mechanical device with some computers inside; it’s a computer with four wheels and an engine. Actually, it’s a distributed system of over 100 computers with four wheels and an engine. And, of course, your phones became full-power general-purpose computers in 2007, when the iPhone was introduced.

We wear computers: fitness trackers and computer-enabled medical devices ­- and, of course, we carry our smartphones everywhere. Our homes have smart thermostats, smart appliances, smart door locks, even smart light bulbs. At work, many of those same smart devices are networked together with CCTV cameras, sensors that detect customer movements, and everything else. Cities are starting to embed smart sensors in roads, streetlights, and sidewalk squares, also smart energy grids and smart transportation networks. A nuclear power plant is really just a computer that produces electricity, and ­- like everything else we’ve just listed -­ it’s on the internet.

The internet is no longer a web that we connect to. Instead, it’s a computerized, networked, and interconnected world that we live in. This is the future, and what we’re calling the Internet of Things.

Broadly speaking, the Internet of Things has three parts. There are the sensors that collect data about us and our environment: smart thermostats, street and highway sensors, and those ubiquitous smartphones with their motion sensors and GPS location receivers. Then there are the “smarts” that figure out what the data means and what to do about it. This includes all the computer processors on these devices and ­- increasingly ­- in the cloud, as well as the memory that stores all of this information. And finally, there are the actuators that affect our environment. The point of a smart thermostat isn’t to record the temperature; it’s to control the furnace and the air conditioner. Driverless cars collect data about the road and the environment to steer themselves safely to their destinations.

You can think of the sensors as the eyes and ears of the internet. You can think of the actuators as the hands and feet of the internet. And you can think of the stuff in the middle as the brain. We are building an internet that senses, thinks, and acts.

This is the classic definition of a robot. We’re building a world-size robot, and we don’t even realize it.

To be sure, it’s not a robot in the classical sense. We think of robots as discrete autonomous entities, with sensors, brain, and actuators all together in a metal shell. The world-size robot is distributed. It doesn’t have a singular body, and parts of it are controlled in different ways by different people. It doesn’t have a central brain, and it has nothing even remotely resembling a consciousness. It doesn’t have a single goal or focus. It’s not even something we deliberately designed. It’s something we have inadvertently built out of the everyday objects we live with and take for granted. It is the extension of our computers and networks into the real world.

This world-size robot is actually more than the Internet of Things. It’s a combination of several decades-old computing trends: mobile computing, cloud computing, always-on computing, huge databases of personal information, the Internet of Things ­- or, more precisely, cyber-physical systems ­- autonomy, and artificial intelligence. And while it’s still not very smart, it’ll get smarter. It’ll get more powerful and more capable through all the interconnections we’re building.

It’ll also get much more dangerous.

**********

Computer security has been around for almost as long as computers have been. And while it’s true that security wasn’t part of the design of the original internet, it’s something we have been trying to achieve since its beginning.

I have been working in computer security for over 30 years: first in cryptography, then more generally in computer and network security, and now in general security technology. I have watched computers become ubiquitous, and have seen firsthand the problems ­- and solutions ­- of securing these complex machines and systems. I’m telling you all this because what used to be a specialized area of expertise now affects everything. Computer security is now everything security. There’s one critical difference, though: The threats have become greater.

Traditionally, computer security is divided into three categories: confidentiality, integrity, and availability. For the most part, our security concerns have largely centered around confidentiality. We’re concerned about our data and who has access to it ­- the world of privacy and surveillance, of data theft and misuse.

But threats come in many forms. Availability threats: computer viruses that delete our data, or ransomware that encrypts our data and demands payment for the unlock key. Integrity threats: hackers who can manipulate data entries can do things ranging from changing grades in a class to changing the amount of money in bank accounts. Some of these threats are pretty bad. Hospitals have paid tens of thousands of dollars to criminals whose ransomware encrypted critical medical files. JPMorgan Chase spends half a billion on cybersecurity a year.

Today, the integrity and availability threats are much worse than the confidentiality threats. Once computers start affecting the world in a direct and physical manner, there are real risks to life and property. There is a fundamental difference between crashing your computer and losing your spreadsheet data, and crashing your pacemaker and losing your life. This isn’t hyperbole; recently researchers found serious security vulnerabilities in St. Jude Medical’s implantable heart devices. Give the internet hands and feet, and it will have the ability to punch and kick.

Take a concrete example: modern cars, those computers on wheels. The steering wheel no longer turns the axles, nor does the accelerator pedal change the speed. Every move you make in a car is processed by a computer, which does the actual controlling. A central computer controls the dashboard. There’s another in the radio. The engine has 20 or so computers. These are all networked, and increasingly autonomous.

Now, let’s start listing the security threats. We don’t want car navigation systems to be used for mass surveillance, or the microphone for mass eavesdropping. We might want it to be used to determine a car’s location in the event of a 911 call, and possibly to collect information about highway congestion. We don’t want people to hack their own cars to bypass emissions-control limitations. We don’t want manufacturers or dealers to be able to do that, either, as Volkswagen did for years. We can imagine wanting to give police the ability to remotely and safely disable a moving car; that would make high-speed chases a thing of the past. But we definitely don’t want hackers to be able to do that. We definitely don’t want them disabling the brakes in every car without warning, at speed. As we make the transition from driver-controlled cars to cars with various driver-assist capabilities to fully driverless cars, we don’t want any of those critical components subverted. We don’t want someone to be able to accidentally crash your car, let alone do it on purpose. And equally, we don’t want them to be able to manipulate the navigation software to change your route, or the door-lock controls to prevent you from opening the door. I could go on.

That’s a lot of different security requirements, and the effects of getting them wrong range from illegal surveillance to extortion by ransomware to mass death.

**********

Our computers and smartphones are as secure as they are because companies like Microsoft, Apple, and Google spend a lot of time testing their code before it’s released, and quickly patch vulnerabilities when they’re discovered. Those companies can support large, dedicated teams because those companies make a huge amount of money, either directly or indirectly, from their software ­ and, in part, compete on its security. Unfortunately, this isn’t true of embedded systems like digital video recorders or home routers. Those systems are sold at a much lower margin, and are often built by offshore third parties. The companies involved simply don’t have the expertise to make them secure.

At a recent hacker conference, a security researcher analyzed 30 home routers and was able to break into half of them, including some of the most popular and common brands. The denial-of-service attacks that forced popular websites like Reddit and Twitter off the internet last October were enabled by vulnerabilities in devices like webcams and digital video recorders. In August, two security researchers demonstrated a ransomware attack on a smart thermostat.

Even worse, most of these devices don’t have any way to be patched. Companies like Microsoft and Apple continuously deliver security patches to your computers. Some home routers are technically patchable, but in a complicated way that only an expert would attempt. And the only way for you to update the firmware in your hackable DVR is to throw it away and buy a new one.

The market can’t fix this because neither the buyer nor the seller cares. The owners of the webcams and DVRs used in the denial-of-service attacks don’t care. Their devices were cheap to buy, they still work, and they don’t know any of the victims of the attacks. The sellers of those devices don’t care: They’re now selling newer and better models, and the original buyers only cared about price and features. There is no market solution, because the insecurity is what economists call an externality: It’s an effect of the purchasing decision that affects other people. Think of it kind of like invisible pollution.

**********

Security is an arms race between attacker and defender. Technology perturbs that arms race by changing the balance between attacker and defender. Understanding how this arms race has unfolded on the internet is essential to understanding why the world-size robot we’re building is so insecure, and how we might secure it. To that end, I have five truisms, born from what we’ve already learned about computer and internet security. They will soon affect the security arms race everywhere.

Truism No. 1: On the internet, attack is easier than defense.

There are many reasons for this, but the most important is the complexity of these systems. More complexity means more people involved, more parts, more interactions, more mistakes in the design and development process, more of everything where hidden insecurities can be found. Computer-security experts like to speak about the attack surface of a system: all the possible points an attacker might target and that must be secured. A complex system means a large attack surface. The defender has to secure the entire attack surface. The attacker just has to find one vulnerability ­- one unsecured avenue for attack -­ and gets to choose how and when to attack. It’s simply not a fair battle.

There are other, more general, reasons why attack is easier than defense. Attackers have a natural agility that defenders often lack. They don’t have to worry about laws, and often not about morals or ethics. They don’t have a bureaucracy to contend with, and can more quickly make use of technical innovations. Attackers also have a first-mover advantage. As a society, we’re generally terrible at proactive security; we rarely take preventive security measures until an attack actually happens. So more advantages go to the attacker.

Truism No. 2: Most software is poorly written and insecure.

If complexity isn’t enough, we compound the problem by producing lousy software. Well-written software, like the kind found in airplane avionics, is both expensive and time-consuming to produce. We don’t want that. For the most part, poorly written software has been good enough. We’d all rather live with buggy software than pay the prices good software would require. We don’t mind if our games crash regularly, or our business applications act weird once in a while. Because software has been largely benign, it hasn’t mattered. This has permeated the industry at all levels. At universities, we don’t teach how to code well. Companies don’t reward quality code in the same way they reward fast and cheap. And we consumers don’t demand it.

But poorly written software is riddled with bugs, sometimes as many as one per 1,000 lines of code. Some of them are inherent in the complexity of the software, but most are programming mistakes. Not all bugs are vulnerabilities, but some are.

Truism No. 3: Connecting everything to each other via the internet will expose new vulnerabilities.

The more we network things together, the more vulnerabilities on one thing will affect other things. On October 21, vulnerabilities in a wide variety of embedded devices were all harnessed together to create what hackers call a botnet. This botnet was used to launch a distributed denial-of-service attack against a company called Dyn. Dyn provided a critical internet function for many major internet sites. So when Dyn went down, so did all those popular websites.

These chains of vulnerabilities are everywhere. In 2012, journalist Mat Honan suffered a massive personal hack because of one of them. A vulnerability in his Amazon account allowed hackers to get into his Apple account, which allowed them to get into his Gmail account. And in 2013, the Target Corporation was hacked by someone stealing credentials from its HVAC contractor.

Vulnerabilities like these are particularly hard to fix, because no one system might actually be at fault. It might be the insecure interaction of two individually secure systems.

Truism No. 4: Everybody has to stop the best attackers in the world.

One of the most powerful properties of the internet is that it allows things to scale. This is true for our ability to access data or control systems or do any of the cool things we use the internet for, but it’s also true for attacks. In general, fewer attackers can do more damage because of better technology. It’s not just that these modern attackers are more efficient, it’s that the internet allows attacks to scale to a degree impossible without computers and networks.

This is fundamentally different from what we’re used to. When securing my home against burglars, I am only worried about the burglars who live close enough to my home to consider robbing me. The internet is different. When I think about the security of my network, I have to be concerned about the best attacker possible, because he’s the one who’s going to create the attack tool that everyone else will use. The attacker that discovered the vulnerability used to attack Dyn released the code to the world, and within a week there were a dozen attack tools using it.

Truism No. 5: Laws inhibit security research.

The Digital Millennium Copyright Act is a terrible law that fails at its purpose of preventing widespread piracy of movies and music. To make matters worse, it contains a provision that has critical side effects. According to the law, it is a crime to bypass security mechanisms that protect copyrighted work, even if that bypassing would otherwise be legal. Since all software can be copyrighted, it is arguably illegal to do security research on these devices and to publish the result.

Although the exact contours of the law are arguable, many companies are using this provision of the DMCA to threaten researchers who expose vulnerabilities in their embedded systems. This instills fear in researchers, and has a chilling effect on research, which means two things: (1) Vendors of these devices are more likely to leave them insecure, because no one will notice and they won’t be penalized in the market, and (2) security engineers don’t learn how to do security better.
Unfortunately, companies generally like the DMCA. The provisions against reverse-engineering spare them the embarrassment of having their shoddy security exposed. It also allows them to build proprietary systems that lock out competition. (This is an important one. Right now, your toaster cannot force you to only buy a particular brand of bread. But because of this law and an embedded computer, your Keurig coffee maker can force you to buy a particular brand of coffee.)

**********
In general, there are two basic paradigms of security. We can either try to secure something well the first time, or we can make our security agile. The first paradigm comes from the world of dangerous things: from planes, medical devices, buildings. It’s the paradigm that gives us secure design and secure engineering, security testing and certifications, professional licensing, detailed preplanning and complex government approvals, and long times-to-market. It’s security for a world where getting it right is paramount because getting it wrong means people dying.

The second paradigm comes from the fast-moving and heretofore largely benign world of software. In this paradigm, we have rapid prototyping, on-the-fly updates, and continual improvement. In this paradigm, new vulnerabilities are discovered all the time and security disasters regularly happen. Here, we stress survivability, recoverability, mitigation, adaptability, and muddling through. This is security for a world where getting it wrong is okay, as long as you can respond fast enough.

These two worlds are colliding. They’re colliding in our cars -­ literally -­ in our medical devices, our building control systems, our traffic control systems, and our voting machines. And although these paradigms are wildly different and largely incompatible, we need to figure out how to make them work together.

So far, we haven’t done very well. We still largely rely on the first paradigm for the dangerous computers in cars, airplanes, and medical devices. As a result, there are medical systems that can’t have security patches installed because that would invalidate their government approval. In 2015, Chrysler recalled 1.4 million cars to fix a software vulnerability. In September 2016, Tesla remotely sent a security patch to all of its Model S cars overnight. Tesla sure sounds like it’s doing things right, but what vulnerabilities does this remote patch feature open up?

**********
Until now we’ve largely left computer security to the market. Because the computer and network products we buy and use are so lousy, an enormous after-market industry in computer security has emerged. Governments, companies, and people buy the security they think they need to secure themselves. We’ve muddled through well enough, but the market failures inherent in trying to secure this world-size robot will soon become too big to ignore.

Markets alone can’t solve our security problems. Markets are motivated by profit and short-term goals at the expense of society. They can’t solve collective-action problems. They won’t be able to deal with economic externalities, like the vulnerabilities in DVRs that resulted in Twitter going offline. And we need a counterbalancing force to corporate power.

This all points to policy. While the details of any computer-security system are technical, getting the technologies broadly deployed is a problem that spans law, economics, psychology, and sociology. And getting the policy right is just as important as getting the technology right because, for internet security to work, law and technology have to work together. This is probably the most important lesson of Edward Snowden’s NSA disclosures. We already knew that technology can subvert law. Snowden demonstrated that law can also subvert technology. Both fail unless each work. It’s not enough to just let technology do its thing.

Any policy changes to secure this world-size robot will mean significant government regulation. I know it’s a sullied concept in today’s world, but I don’t see any other possible solution. It’s going to be especially difficult on the internet, where its permissionless nature is one of the best things about it and the underpinning of its most world-changing innovations. But I don’t see how that can continue when the internet can affect the world in a direct and physical manner.

**********

I have a proposal: a new government regulatory agency. Before dismissing it out of hand, please hear me out.

We have a practical problem when it comes to internet regulation. There’s no government structure to tackle this at a systemic level. Instead, there’s a fundamental mismatch between the way government works and the way this technology works that makes dealing with this problem impossible at the moment.

Government operates in silos. In the U.S., the FAA regulates aircraft. The NHTSA regulates cars. The FDA regulates medical devices. The FCC regulates communications devices. The FTC protects consumers in the face of “unfair” or “deceptive” trade practices. Even worse, who regulates data can depend on how it is used. If data is used to influence a voter, it’s the Federal Election Commission’s jurisdiction. If that same data is used to influence a consumer, it’s the FTC’s. Use those same technologies in a school, and the Department of Education is now in charge. Robotics will have its own set of problems, and no one is sure how that is going to be regulated. Each agency has a different approach and different rules. They have no expertise in these new issues, and they are not quick to expand their authority for all sorts of reasons.

Compare that with the internet. The internet is a freewheeling system of integrated objects and networks. It grows horizontally, demolishing old technological barriers so that people and systems that never previously communicated now can. Already, apps on a smartphone can log health information, control your energy use, and communicate with your car. That’s a set of functions that crosses jurisdictions of at least four different government agencies, and it’s only going to get worse.

Our world-size robot needs to be viewed as a single entity with millions of components interacting with each other. Any solutions here need to be holistic. They need to work everywhere, for everything. Whether we’re talking about cars, drones, or phones, they’re all computers.

This has lots of precedent. Many new technologies have led to the formation of new government regulatory agencies. Trains did, cars did, airplanes did. Radio led to the formation of the Federal Radio Commission, which became the FCC. Nuclear power led to the formation of the Atomic Energy Commission, which eventually became the Department of Energy. The reasons were the same in every case. New technologies need new expertise because they bring with them new challenges. Governments need a single agency to house that new expertise, because its applications cut across several preexisting agencies. It’s less that the new agency needs to regulate -­ although that’s often a big part of it -­ and more that governments recognize the importance of the new technologies.

The internet has famously eschewed formal regulation, instead adopting a multi-stakeholder model of academics, businesses, governments, and other interested parties. My hope is that we can keep the best of this approach in any regulatory agency, looking more at the new U.S. Digital Service or the 18F office inside the General Services Administration. Both of those organizations are dedicated to providing digital government services, and both have collected significant expertise by bringing people in from outside of government, and both have learned how to work closely with existing agencies. Any internet regulatory agency will similarly need to engage in a high level of collaborate regulation -­ both a challenge and an opportunity.

I don’t think any of us can predict the totality of the regulations we need to ensure the safety of this world, but here’s a few. We need government to ensure companies follow good security practices: testing, patching, secure defaults -­ and we need to be able to hold companies liable when they fail to do these things. We need government to mandate strong personal data protections, and limitations on data collection and use. We need to ensure that responsible security research is legal and well-funded. We need to enforce transparency in design, some sort of code escrow in case a company goes out of business, and interoperability between devices of different manufacturers, to counterbalance the monopolistic effects of interconnected technologies. Individuals need the right to take their data with them. And internet-enabled devices should retain some minimal functionality if disconnected from the internet

I’m not the only one talking about this. I’ve seen proposals for a National Institutes of Health analog for cybersecurity. University of Washington law professor Ryan Calo has proposed a Federal Robotics Commission. I think it needs to be broader: maybe a Department of Technology Policy.

Of course there will be problems. There’s a lack of expertise in these issues inside government. There’s a lack of willingness in government to do the hard regulatory work. Industry is worried about any new bureaucracy: both that it will stifle innovation by regulating too much and that it will be captured by industry and regulate too little. A domestic regulatory agency will have to deal with the fundamentally international nature of the problem.

But government is the entity we use to solve problems like this. Governments have the scope, scale, and balance of interests to address the problems. It’s the institution we’ve built to adjudicate competing social interests and internalize market externalities. Left to their own devices, the market simply can’t. That we’re currently in the middle of an era of low government trust, where many of us can’t imagine government doing anything positive in an area like this, is to our detriment.

Here’s the thing: Governments will get involved, regardless. The risks are too great, and the stakes are too high. Government already regulates dangerous physical systems like cars and medical devices. And nothing motivates the U.S. government like fear. Remember 2001? A nominally small-government Republican president created the Office of Homeland Security 11 days after the terrorist attacks: a rushed and ill-thought-out decision that we’ve been trying to fix for over a decade. A fatal disaster will similarly spur our government into action, and it’s unlikely to be well-considered and thoughtful action. Our choice isn’t between government involvement and no government involvement. Our choice is between smarter government involvement and stupider government involvement. We have to start thinking about this now. Regulations are necessary, important, and complex; and they’re coming. We can’t afford to ignore these issues until it’s too late.

We also need to start disconnecting systems. If we cannot secure complex systems to the level required by their real-world capabilities, then we must not build a world where everything is computerized and interconnected.

There are other models. We can enable local communications only. We can set limits on collected and stored data. We can deliberately design systems that don’t interoperate with each other. We can deliberately fetter devices, reversing the current trend of turning everything into a general-purpose computer. And, most important, we can move toward less centralization and more distributed systems, which is how the internet was first envisioned.

This might be a heresy in today’s race to network everything, but large, centralized systems are not inevitable. The technical elites are pushing us in that direction, but they really don’t have any good supporting arguments other than the profits of their ever-growing multinational corporations.

But this will change. It will change not only because of security concerns, it will also change because of political concerns. We’re starting to chafe under the worldview of everything producing data about us and what we do, and that data being available to both governments and corporations. Surveillance capitalism won’t be the business model of the internet forever. We need to change the fabric of the internet so that evil governments don’t have the tools to create a horrific totalitarian state. And while good laws and regulations in Western democracies are a great second line of defense, they can’t be our only line of defense.

My guess is that we will soon reach a high-water mark of computerization and connectivity, and that afterward we will make conscious decisions about what and how we decide to interconnect. But we’re still in the honeymoon phase of connectivity. Governments and corporations are punch-drunk on our data, and the rush to connect everything is driven by an even greater desire for power and market share. One of the presentations released by Edward Snowden contained the NSA mantra: “Collect it all.” A similar mantra for the internet today might be: “Connect it all.”

The inevitable backlash will not be driven by the market. It will be deliberate policy decisions that put the safety and welfare of society above individual corporations and industries. It will be deliberate policy decisions that prioritize the security of our systems over the demands of the FBI to weaken them in order to make their law-enforcement jobs easier. It’ll be hard policy for many to swallow, but our safety will depend on it.

**********

The scenarios I’ve outlined, both the technological and economic trends that are causing them and the political changes we need to make to start to fix them, come from my years of working in internet-security technology and policy. All of this is informed by an understanding of both technology and policy. That turns out to be critical, and there aren’t enough people who understand both.

This brings me to my final plea: We need more public-interest technologists.

Over the past couple of decades, we’ve seen examples of getting internet-security policy badly wrong. I’m thinking of the FBI’s “going dark” debate about its insistence that computer devices be designed to facilitate government access, the “vulnerability equities process” about when the government should disclose and fix a vulnerability versus when it should use it to attack other systems, the debacle over paperless touch-screen voting machines, and the DMCA that I discussed above. If you watched any of these policy debates unfold, you saw policy-makers and technologists talking past each other.

Our world-size robot will exacerbate these problems. The historical divide between Washington and Silicon Valley -­ the mistrust of governments by tech companies and the mistrust of tech companies by governments ­- is dangerous.

We have to fix this. Getting IoT security right depends on the two sides working together and, even more important, having people who are experts in each working on both. We need technologists to get involved in policy, and we need policy-makers to get involved in technology. We need people who are experts in making both technology and technological policy. We need technologists on congressional staffs, inside federal agencies, working for NGOs, and as part of the press. We need to create a viable career path for public-interest technologists, much as there already is one for public-interest attorneys. We need courses, and degree programs in colleges, for people interested in careers in public-interest technology. We need fellowships in organizations that need these people. We need technology companies to offer sabbaticals for technologists wanting to go down this path. We need an entire ecosystem that supports people bridging the gap between technology and law. We need a viable career path that ensures that even though people in this field won’t make as much as they would in a high-tech start-up, they will have viable careers. The security of our computerized and networked future ­ meaning the security of ourselves, families, homes, businesses, and communities ­ depends on it.

This plea is bigger than security, actually. Pretty much all of the major policy debates of this century will have a major technological component. Whether it’s weapons of mass destruction, robots drastically affecting employment, climate change, food safety, or the increasing ubiquity of ever-shrinking drones, understanding the policy means understanding the technology. Our society desperately needs technologists working on the policy. The alternative is bad policy.

**********

The world-size robot is less designed than created. It’s coming without any forethought or architecting or planning; most of us are completely unaware of what we’re building. In fact, I am not convinced we can actually design any of this. When we try to design complex sociotechnical systems like this, we are regularly surprised by their emergent properties. The best we can do is observe and channel these properties as best we can.

Market thinking sometimes makes us lose sight of the human choices and autonomy at stake. Before we get controlled ­ or killed ­ by the world-size robot, we need to rebuild confidence in our collective governance institutions. Law and policy may not seem as cool as digital tech, but they’re also places of critical innovation. They’re where we collectively bring about the world we want to live in.

While I might sound like a Cassandra, I’m actually optimistic about our future. Our society has tackled bigger problems than this one. It takes work and it’s not easy, but we eventually find our way clear to make the hard choices necessary to solve our real problems.

The world-size robot we’re building can only be managed responsibly if we start making real choices about the interconnected world we live in. Yes, we need security systems as robust as the threat landscape. But we also need laws that effectively regulate these dangerous technologies. And, more generally, we need to make moral, ethical, and political decisions on how those systems should work. Until now, we’ve largely left the internet alone. We gave programmers a special right to code cyberspace as they saw fit. This was okay because cyberspace was separate and relatively unimportant: That is, it didn’t matter. Now that that’s changed, we can no longer give programmers and the companies they work for this power. Those moral, ethical, and political decisions need, somehow, to be made by everybody. We need to link people with the same zeal that we are currently linking machines. “Connect it all” must be countered with “connect us all.”

This essay previously appeared in New York Magazine.