Knowing what not to do is the first and very important step. But how do we store production credentials. Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn’t seem to be an agreed upon solution.
I’ve previously argued with the 12-factor app recommendation to use environment variables – if you have a few that might be okay, but when the number of variables grow (as in any real application), it becomes impractical. And you can set environment variables via a bash script, but you’d have to store it somewhere. And in fact, even separate environment variables should be stored somewhere.
This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.
Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. It is “transparent” encryption because it supports diff and encryption and decryption on the fly. Once you set it up, you continue working with the repo as if it’s not encrypted. There’s even a fork that works on Windows.
You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:
And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.
You’re almost done. We should somehow share and backup the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to backup your secret key (that’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.
git-crypt authors claim to shine when it comes to encrypting just a few files in an otherwise public repo. And recommend looking at git-remote-gcrypt. But as often there are non-sensitive parts of environment-specific configurations, you may not want to encrypt everything. And I think it’s perfectly fine to use git-crypt even in a separate repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo. Especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.
The outstanding questions in this case is – how do you sync the properties with code changes. Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments, but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.
The whole process of having versioned environment-speific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.
What do I do with a Mac that still has personal data on it? Do I take out the disk drive and smash it? Do I sweep it with a really strong magnet? Is there a difference in how I handle a hard drive (HDD) versus a solid-state drive (SSD)? Well, taking a sledgehammer or projectile weapon to your old machine is certainly one way to make the data irretrievable, and it can be enormously cathartic as long as you follow appropriate safety and disposal protocols. But there are far less destructive ways to make sure your data is gone for good. Let me introduce you to secure erasing.
Which Type of Drive Do You Have?
Before we start, you need to know whether you have a HDD or a SSD. To find out, or at least to make sure, you click on the Apple menu and select “About this Mac.” Once there, select the “Storage” tab to see which type of drive is in your system.
The first example, below, shows a SATA Disk (HDD) in the system.
In the next case, we see we have a Solid State SATA Drive (SSD), plus a Mac SuperDrive.
The third screen shot shows an SSD, as well. In this case it’s called “Flash Storage.”
Make Sure You Have a Backup
Before you get started, you’ll want to make sure that any important data on your hard drive has moved somewhere else. OS X’s built-in Time Machine backup software is a good start, especially when paired with Backblaze. You can learn more about using Time Machine in our Mac Backup Guide.
With a local backup copy in hand and secure cloud storage, you know your data is always safe no matter what happens.
Once you’ve verified your data is backed up, roll up your sleeves and get to work. The key is OS X Recovery — a special part of the Mac operating system since OS X 10.7 “Lion.”
How to Wipe a Mac Hard Disk Drive (HDD)
NOTE: If you’re interested in wiping an SSD, see below.
Make sure your Mac is turned off.
Press the power button.
Immediately hold down the command and R keys.
Wait until the Apple logo appears.
Select “Disk Utility” from the OS X Utilities list. Click Continue.
Select the disk you’d like to erase by clicking on it in the sidebar.
Click the Erase button.
Click the Security Options button.
The Security Options window includes a slider that enables you to determine how thoroughly you want to erase your hard drive.
There are four notches to that Security Options slider. “Fastest” is quick but insecure — data could potentially be rebuilt using a file recovery app. Moving that slider to the right introduces progressively more secure erasing. Disk Utility’s most secure level erases the information used to access the files on your disk, then writes zeroes across the disk surface seven times to help remove any trace of what was there. This setting conforms to the DoD 5220.22-M specification.
Once you’ve selected the level of secure erasing you’re comfortable with, click the OK button.
Click the Erase button to begin. Bear in mind that the more secure method you select, the longer it will take. The most secure methods can add hours to the process.
Once it’s done, the Mac’s hard drive will be clean as a whistle and ready for its next adventure: a fresh installation of OS X, being donated to a relative or a local charity, or just sent to an e-waste facility. Of course you can still drill a hole in your disk or smash it with a sledgehammer if it makes you happy, but now you know how to wipe the data from your old computer with much less ruckus.
The above instructions apply to older Macintoshes with HDDs. What do you do if you have an SSD?
Securely Erasing SSDs, and Why Not To
Most new Macs ship with solid state drives (SSDs). Only the iMac and Mac mini ship with regular hard drives anymore, and even those are available in pure SSD variants if you want.
If your Mac comes equipped with an SSD, Apple’s Disk Utility software won’t actually let you zero the hard drive.
With an SSD drive, Secure Erase and Erasing Free Space are not available in Disk Utility. These options are not needed for an SSD drive because a standard erase makes it difficult to recover data from an SSD.
In fact, some folks will tell you not to zero out the data on an SSD, since it can cause wear and tear on the memory cells that, over time, can affect its reliability. I don’t think that’s nearly as big an issue as it used to be — SSD reliability and longevity has improved.
If “Standard Erase” doesn’t quite make you feel comfortable that your data can’t be recovered, there are a couple of options.
FileVault Keeps Your Data Safe
One way to make sure that your SSD’s data remains secure is to use FileVault. FileVault is whole-disk encryption for the Mac. With FileVault engaged, you need a password to access the information on your hard drive. Without it, that data is encrypted.
There’s one potential downside of FileVault — if you lose your password or the encryption key, you’re screwed: You’re not getting your data back any time soon. Based on my experience working at a Mac repair shop, losing a FileVault key happens more frequently than it should.
When you first set up a new Mac, you’re given the option of turning FileVault on. If you don’t do it then, you can turn on FileVault at any time by clicking on your Mac’s System Preferences, clicking on Security & Privacy, and clicking on the FileVault tab. Be warned, however, that the initial encryption process can take hours, as will decryption if you ever need to turn FileVault off.
With FileVault turned on, you can restart your Mac into its Recovery System (by restarting the Mac while holding down the command and R keys) and erase the hard drive using Disk Utility, once you’ve unlocked it (by selecting the disk, clicking the File menu, and clicking Unlock). That deletes the FileVault key, which means any data on the drive is useless.
FileVault doesn’t impact the performance of most modern Macs, though I’d suggest only using it if your Mac has an SSD, not a conventional hard disk drive.
Securely Erasing Free Space on Your SSD
If you don’t want to take Apple’s word for it, if you’re not using FileVault, or if you just want to, there is a way to securely erase free space on your SSD. It’s a little more involved but it works.
Before we get into the nitty-gritty, let me state for the record that this really isn’t necessary to do, which is why Apple’s made it so hard to do. But if you’re set on it, you’ll need to use Apple’s Terminal app. Terminal provides you with command line interface access to the OS X operating system. Terminal lives in the Utilities folder, but you can access Terminal from the Mac’s Recovery System, as well. Once your Mac has booted into the Recovery partition, click the Utilities menu and select Terminal to launch it.
From a Terminal command line, type:
diskutil secureErase freespace VALUE /Volumes/DRIVE
That tells your Mac to securely erase the free space on your SSD. You’ll need to change VALUE to a number between 0 and 4. 0 is a single-pass run of zeroes; 1 is a single-pass run of random numbers; 2 is a 7-pass erase; 3 is a 35-pass erase; and 4 is a 3-pass erase. DRIVE should be changed to the name of your hard drive. To run a 7-pass erase of your SSD drive in “JohnB-Macbook”, you would enter the following:
And remember, if you used a space in the name of your Mac’s hard drive, you need to insert a leading backslash before the space. For example, to run a 35-pass erase on a hard drive called “Macintosh HD” you enter the following:
diskutil secureErase freespace 3 /Volumes/Macintosh\ HD
Something to remember is that the more extensive the erase procedure, the longer it will take.
When Erasing is Not Enough — How to Destroy a Drive
Version 26.1 of the Emacs editor is out. Highlights include a built-in Lisp threading mechanism that provides some concurrency, double buffering when running under X, a redesigned flymake mode, 24-bit color support in text mode, and a systemd unit file.
Businesses and organizations that rely on macOS server for essential office and data services are facing some decisions about the future of their IT services.
Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:
macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.
The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.
Apple assures users who have already configured any of the listed services that they will be able to use them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements to the features it is removing. These alternative services are all FOSS (Free and Open-Source Software).
As difficult as this could be for organizations that use macOS server, this is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May, 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster, but the entire category of SMB office and business servers, which has been undergoing a gradual change in recent years.
Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.
What Comes After macOS Server?
Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.
Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.
The Times They Are A-Changin’
For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.
In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (Saas, PaaS, IaaS) options.
IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.
Synology NAS as a macOS Server Alternative
One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.
Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux as it comes from BSD Unix.
Synology DiskStation Manager Package Center
According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.
Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).
Are You Sticking with macOS Server or Moving to Another Platform?
If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.
This blog post was co-authored by Ujjwal Ratan, a senior AI/ML solutions architect on the global life sciences team.
Healthcare data is generated at an ever-increasing rate and is predicted to reach 35 zettabytes by 2020. Being able to cost-effectively and securely manage this data whether for patient care, research or legal reasons is increasingly important for healthcare providers.
Healthcare providers must have the ability to ingest, store and protect large volumes of data including clinical, genomic, device, financial, supply chain, and claims. AWS is well-suited to this data deluge with a wide variety of ingestion, storage and security services (e.g. AWS Direct Connect, Amazon Kinesis Streams, Amazon S3, Amazon Macie) for customers to handle their healthcare data. In a recent Healthcare IT News article, healthcare thought-leader, John Halamka, noted, “I predict that five years from now none of us will have datacenters. We’re going to go out to the cloud to find EHRs, clinical decision support, analytics.”
I realize simply storing this data is challenging enough. Magnifying the problem is the fact that healthcare data is increasingly attractive to cyber attackers, making security a top priority. According to Mariya Yao in her Forbes column, it is estimated that individual medical records can be worth hundreds or even thousands of dollars on the black market.
In this first of a 2-part post, I will address the value that AWS can bring to customers for ingesting, storing and protecting provider’s healthcare data. I will describe key components of any cloud-based healthcare workload and the services AWS provides to meet these requirements. In part 2 of this post we will dive deep into the AWS services used for advanced analytics, artificial intelligence and machine learning.
The data tsunami is upon us
So where is this data coming from? In addition to the ubiquitous electronic health record (EHR), the sources of this data include:
devices such as MRIs, x-rays and ultrasounds
sensors and wearables for patients
medical equipment telemetry
Additional sources of data come from non-clinical, operational systems such as:
claims and billing
Data from these sources can be structured (e.g., claims data) as well as unstructured (e.g., clinician notes). Some data comes across in streams such as that taken from patient monitors, while some comes in batch form. Still other data comes in near-real time such as HL7 messages. All of this data has retention policies dictating how long it must be stored. Much of this data is stored in perpetuity as many systems in use today have no purge mechanism. AWS has services to manage all these data types as well as their retention, security and access policies.
Imaging is a significant contributor to this data tsunami. Increasing demand for early-stage diagnoses along with aging populations drive increasing demand for images from CT, PET, MRI, ultrasound, digital pathology, X-ray and fluoroscopy. For example, a thin-slice CT image can be hundreds of megabytes. Increasing demand and strict retention policies make storage costly.
Due to the plummeting cost of gene sequencing, molecular diagnostics (including liquid biopsy) is a large contributor to this data deluge. Many predict that as the value of molecular testing becomes more identifiable, the reimbursement models will change and it will increasingly become the standard of care. According to the Washington Post article “Sequencing the Genome Creates so Much Data We Don’t Know What to do with It,”
“Some researchers predict that up to one billion people will have their genome sequenced by 2025 generating up to 40 exabytes of data per year.”
Although genomics is primarily used for oncology diagnostics today, it’s also used for other purposes, pharmacogenomics — used to understand how an individual will metabolize a medication.
It is increasingly challenging for the typical hospital, clinic or physician practice to securely store, process and manage this data without cloud adoption.
Amazon has a variety of ingestion techniques depending on the nature of the data including size, frequency and structure. AWS Snowball and AWS Snowmachine are appropriate for extremely-large, secure data transfers whether one time or episodic. AWS Glue is a fully-managed ETL service for securely moving data from on-premise to AWS and Amazon Kinesis can be used for ingesting streaming data.
Amazon S3, Amazon S3 IA, and Amazon Glacier are economical, data-storage services with a pay-as-you-go pricing model that expand (or shrink) with the customer’s requirements.
The above architecture has four distinct components – ingestion, storage, security, and analytics. In this post I will dive deeper into the first three components, namely ingestion, storage and security. In part 2, I will look at how to use AWS’ analytics services to draw value on, and optimize, your healthcare data.
A typical provider data center will consist of many systems with varied datasets. AWS provides multiple tools and services to effectively and securely connect to these data sources and ingest data in various formats. The customers can choose from a range of services and use them in accordance with the use case.
For use cases involving one-time (or periodic), very large data migrations into AWS, customers can take advantage of AWS Snowball devices. These devices come in two sizes, 50 TB and 80 TB and can be combined together to create a petabyte scale data transfer solution.
The devices are easy to connect and load and they are shipped to AWS avoiding the network bottlenecks associated with such large-scale data migrations. The devices are extremely secure supporting 256-bit encryption and come in a tamper-resistant enclosure. AWS Snowball imports data in Amazon S3 which can then interface with other AWS compute services to process that data in a scalable manner.
For use cases involving a need to store a portion of datasets on premises for active use and offload the rest on AWS, the Amazon storage gateway service can be used. The service allows you to seamlessly integrate on premises applications via standard storage protocols like iSCSI or NFS mounted on a gateway appliance. It supports a file interface, a volume interface and a tape interface which can be utilized for a range of use cases like disaster recovery, backup and archiving, cloud bursting, storage tiering and migration.
By using the AWS proposed reference architecture for disaster recovery, healthcare providers can ensure their data assets are securely stored on the cloud and are easily accessible in the event of a disaster. The “AWS Disaster Recovery” whitepaper includes details on options available to customers based on their desired recovery time objective (RTO) and recovery point objective (RPO).
AWS is an ideal destination for offloading large volumes of less-frequently-accessed data. These datasets are rarely used in active compute operations but are exceedingly important to retain for reasons like compliance. By storing these datasets on AWS, customers can take advantage of the highly-durable platform to securely store their data and also retrieve them easily when they need to. For more details on how AWS enables customers to run back and archival use cases on AWS, please refer to the following set of whitepapers.
A healthcare provider may have a variety of databases spread throughout the hospital system supporting critical applications such as EHR, PACS, finance and many more. These datasets often need to be aggregated to derive information and calculate metrics to optimize business processes. AWS Glue is a fully-managed Extract, Transform and Load (ETL) service that can read data from a JDBC-enabled, on-premise database and transfer the datasets into AWS services like Amazon S3, Amazon Redshift and Amazon RDS. This allows customers to create transformation workflows that integrate smaller datasets from multiple sources and aggregates them on AWS.
Healthcare providers deal with a variety of streaming datasets which often have to be analyzed in near real time. These datasets come from a variety of sources such as sensors, messaging buses and social media, and often do not adhere to an industry standard. The Amazon Kinesis suite of services, that includes Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics, are the ideal set of services to accomplish the task of deriving value from streaming data.
Example: Using AWS Glue to de-identify and ingest healthcare data into S3 Let’s consider a scenario in which a provider maintains patient records in a database they want to ingest into S3. The provider also wants to de-identify the data by stripping personally- identifiable attributes and store the non-identifiable information in an S3 bucket. This bucket is different from the one that contains identifiable information. Doing this allows the healthcare provider to separate sensitive information with more restrictions set up via S3 bucket policies.
To ingest records into S3, we create a Glue job that reads from the source database using a Glue connection. The connection is also used by a Glue crawler to populate the Glue data catalog with the schema of the source database. We will use the Glue development endpoint and a zeppelin notebook server on EC2 to develop and execute the job.
Step 1: Import the necessary libraries and also set a glue context which is a wrapper on the spark context:
Step 2: Create a dataframe from the source data. I call the dataframe “readmissionsdata”. Here is what the schema would look like:
Step 3: Now select the columns that contains indentifiable information and store it in a new dataframe. Call the new dataframe “phi”.
Step 4: Non-PHI columns are stored in a separate dataframe. Call this dataframe “nonphi”.
Step 5: Write the two dataframes into two separate S3 buckets
Once successfully executed, the PHI and non-PHI attributes are stored in two separate files in two separate buckets that can be individually maintained.
In 2016, 327 healthcare providers reported a protected health information (PHI) breach, affecting 16.4m patient records. There have been 342 data breaches reported in 2017 — involving 3.2 million patient records.
To date, AWS has released 51 HIPAA-eligible services to help customers address security challenges and is in the process of making many more services HIPAA-eligible. These HIPAA-eligible services (along with all other AWS services) help customers build solutions that comply with HIPAA security and auditing requirements. A catalogue of HIPAA-enabled services can be found at AWS HIPAA-eligible services. It is important to note that AWS manages physical and logical access controls for the AWS boundary. However, the overall security of your workloads is a shared responsibility, where you are responsible for controlling user access to content on your AWS accounts.
AWS storage services allow you to store data efficiently while maintaining high durability and scalability. By using Amazon S3 as the central storage layer, you can take advantage of the Amazon S3 storage management features to get operational metrics on your data sets and transition them between various storage classes to save costs. By tagging objects on Amazon S3, you can build a governance layer on Amazon S3 to grant role based access to objects using Amazon IAM and Amazon S3 bucket policies.
To learn more about the Amazon S3 storage management features, see the following link.
In the example above, we are storing the PHI information in a bucket named “phi.” Now, we want to protect this information to make sure its encrypted, does not have unauthorized access, and all access requests to the data are logged.
Encryption: S3 provides settings to enable default encryption on a bucket. This ensures any object in the bucket is encrypted by default.
Logging: S3 provides object level logging that can be used to capture all API calls to the object. The API calls are logged in cloudtrail for easy access and consolidation. Moreover, it also supports events to proactively alert customers of read and write operations.
Access control: Customers can use S3 bucket policies and IAM policies to restrict access to the phi bucket. It can also put a restriction to enforce multi-factor authentication on the bucket. For example, the following policy enforces multi-factor authentication on the phi bucket:
In Part 1 of this blog, we detailed the ingestion, storage, security and management of healthcare data on AWS. Stay tuned for part two where we are going to dive deep into optimizing the data for analytics and machine learning.
Amazon S3 provides comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements. It gives you flexibility in the way you manage data for cost optimization, access control, and compliance. However, because the service is flexible, a user could accidentally configure buckets in a manner that is not secure. For example, let’s say you uploaded files to an Amazon S3 bucket with public read permissions, even though you intended only to share this file with a colleague or a partner. Although this might have accomplished your task to share the file internally, the file is now available to anyone on the internet, even without authentication.
In this blog post, we show you how to prevent your Amazon S3 buckets and objects from allowing public access. We discuss how to secure data in Amazon S3 with a defense-in-depth approach, where multiple security controls are put in place to help prevent data leakage. This approach helps prevent you from allowing public access to confidential information, such as personally identifiable information (PII) or protected health information (PHI).
Preventing your Amazon S3 buckets and objects from allowing public access
Every call to an Amazon S3 service becomes a REST API request. When your request is transformed via a REST call, the permissions are converted into parameters included in the HTTP header or as URL parameters. The Amazon S3 bucket policy allows or denies access to the Amazon S3 bucket or Amazon S3 objects based on policy statements, and then evaluates conditions based on those parameters. To learn more, see Using Bucket Policies and User Policies.
With this in mind, let’s say multiple AWS Identity and Access Management (IAM) users at Example Corp. have access to an Amazon S3 bucket and the objects in the bucket. Example Corp. wants to share the objects among its IAM users, while at the same time preventing the objects from being made available publicly.
To demonstrate how to do this, we start by creating an Amazon S3 bucket named examplebucket. After creating this bucket, we must apply the following bucket policy. This policy denies any uploaded object (PutObject) with the attribute x-amz-acl having the values public-read, public-read-write, or authenticated-read. This means authenticated users cannot upload objects to the bucket if the objects have public permissions.
“Deny any Amazon S3 request to PutObject or PutObjectAcl in the bucket examplebucket when the request includes one of the following access control lists (ACLs): public-read, public-read-write, or authenticated-read.”
Remember that IAM policies are evaluated not in a first-match-and-exit model. Instead, IAM evaluates first if there is an explicit Deny. If there is not, IAM continues to evaluate if you have an explicit Allow and then you have an implicit Deny.
The above policy creates an explicit Deny. Even when any authenticated user tries to upload (PutObject) an object with public read or write permissions, such as public-read or public-read-write or authenticated-read, the action will be denied. To understand how S3 Access Permissions work, you must understand what Access Control Lists (ACL) and Grants are. You can find the documentation here.
Now let’s continue our bucket policy explanation by examining the next statement.
This statement is very similar to the first statement, except that instead of checking the ACLs, we are checking specific user groups’ grants that represent the following groups:
AuthenticatedUsers group. Represented by http://acs.amazonaws.com/groups/global/AuthenticatedUsers, this group represents all AWS accounts. Access permissions to this group allow any AWS account to access the resource. However, all requests must be signed (authenticated).
AllUsers group. Represented by http://acs.amazonaws.com/groups/global/AllUsers, access permissions to this group allow anyone on the internet access to the resource. The requests can be signed (authenticated) or unsigned (anonymous). Unsigned requests omit the Authentication header in the request.
Now that you know how to deny object uploads with permissions that would make the object public, you just have two statement policies that prevent users from changing the bucket permissions (Denying s3:PutBucketACL from ACL and Denying s3:PutBucketACL from Grants).
Below is how we’re preventing users from changing the bucket permisssions.
As you can see above, the statement is very similar to the Object statements, except that now we use s3:PutBucketAcl instead of s3:PutObjectAcl, the Resource is just the bucket ARN, and the objects have the “/*” in the end of the ARN.
In this section, we showed how to prevent IAM users from accidently uploading Amazon S3 objects with public permissions to buckets. In the next section, we show you how to enforce multiple layers of security controls, such as encryption of data at rest and in transit while serving traffic from Amazon S3.
Securing data on Amazon S3 with defense-in-depth
Let’s say that Example Corp. wants to serve files securely from Amazon S3 to its users with the following requirements:
The data must be encrypted at rest and during transit.
The data must be accessible only by a limited set of public IP addresses.
All requests for data should be handled only by Amazon CloudFront (which is a content delivery network) instead of being directly available from an Amazon S3 URL. If you’re using an Amazon S3 bucket as the origin for a CloudFront distribution, you can grant public permission to read the objects in your bucket. This allows anyone to access your objects either through CloudFront or the Amazon S3 URL. CloudFront doesn’t expose Amazon S3 URLs, but your users still might have access to those URLs if your application serves any objects directly from Amazon S3, or if anyone gives out direct links to specific objects in Amazon S3.
A domain name is required to consume the content. Custom SSL certificate support lets you deliver content over HTTPS by using your own domain name and your own SSL certificate. This gives visitors to your website the security benefits of CloudFront over an SSL connection that uses your own domain name, in addition to lower latency and higher reliability.
To represent defense-in-depth visually, the following diagram contains several Amazon S3 objects (A) in a single Amazon S3 bucket (B). You can encrypt these objects on the server side or the client side. You also can configure the bucket policy such that objects are accessible only through CloudFront, which you can accomplish through an origin access identity (C). You then can configure CloudFront to deliver content only over HTTPS in addition to using your own domain name (D).
Defense-in-depth requirement 1: Data must be encrypted at rest and during transit
Let’s start with the objects themselves. Amazon S3 objects—files in this case—can range from zero bytes to multiple terabytes in size (see service limits for the latest information). Each Amazon S3 bucket includes a collection of objects, and the objects can be uploaded via the Amazon S3 console, AWS CLI, or AWS API.
If you choose to use server-side encryption, Amazon S3 encrypts your objects before saving them on disks in AWS data centers. To encrypt an object at the time of upload, you need to add the x-amz-server-side-encryption header to the request to tell Amazon S3 to encrypt the object using Amazon S3 managed keys (SSE-S3), AWS KMS managed keys (SSE-KMS), or customer-provided keys (SSE-C). There are two possible values for the x-amz-server-side-encryption header: AES256, which tells Amazon S3 to use Amazon S3 managed keys, and aws:kms, which tells Amazon S3 to use AWS KMS managed keys.
The following code example shows a Put request using SSE-S3.
PUT /example-object HTTP/1.1
Date: Wed, 8 Jun 2016 17:50:00 GMT
Authorization: authorization string
[11434 bytes of object data]
If you choose to use client-side encryption, you can encrypt data on the client side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools. You encrypt data on the client side by using AWS KMS managed keys or a customer-supplied, client-side master key.
Defense-in-depth requirement 2: Data must be accessible only by a limited set of public IP addresses
At the Amazon S3 bucket level, you can configure permissions through a bucket policy. For example, you can limit access to the objects in a bucket by IP address range or specific IP addresses. Alternatively, you can make the objects accessible only through HTTPS.
The following bucket policy allows access to Amazon S3 objects only through HTTPS (the policy was generated with the AWS Policy Generator). Here the bucket policy explicitly denies ("Effect": "Deny") all read access ("Action": "s3:GetObject") from anybody who browses ("Principal": "*") to Amazon S3 objects within an Amazon S3 bucket if they are not accessed through HTTPS ("aws:SecureTransport": "false").
Defense-in-depth requirement 3: Data must not be publicly accessible directly from an Amazon S3 URL
Next, configure Amazon CloudFront to serve traffic from within the bucket. The use of CloudFront serves several purposes:
CloudFront is a content delivery network that acts as a cache to serve static files quickly to clients.
Depending on the number of requests, the cost of delivery is less than if objects were served directly via Amazon S3.
Objects served through CloudFront can be limited to specific countries.
Access to these Amazon S3 objects is available only through CloudFront. We do this by creating an origin access identity (OAI) for CloudFront and granting access to objects in the respective Amazon S3 bucket only to that OAI. As a result, access to Amazon S3 objects from the internet is possible only through CloudFront; all other means of accessing the objects—such as through an Amazon S3 URL—are denied. CloudFront acts not only as a content distribution network, but also as a host that denies access based on geographic restrictions. You apply these restrictions by updating your CloudFront web distribution and adding a whitelist that contains only a specific country’s name (let’s say Liechtenstein). Alternatively, you could add a blacklist that contains every country except that country. Learn more about how to use CloudFront geographic restriction to whitelist or blacklist a country to restrict or allow users in specific locations from accessing web content in the AWS Support Knowledge Center.
Defense-in-depth requirement 4: A domain name is required to consume the content
To serve content from CloudFront, you must use a domain name in the URLs for objects on your webpages or in your web application. The domain name can be either of the following:
The domain name that CloudFront automatically assigns when you create a distribution, such as d111111abcdef8.cloudfront.net
Your own domain name, such as example.com
For example, you might use one of the following URLs to return the file image.jpg:
You use the same URL format whether you store the content in Amazon S3 buckets or at a custom origin, like one of your own web servers.
Instead of using the default domain name that CloudFront assigns for you when you create a distribution, you can add an alternate domain name that’s easier to work with, like example.com. By setting up your own domain name with CloudFront, you can use a URL like this for objects in your distribution: http://example.com/images/image.jpg.
Let’s say that you already have a domain name hosted on Amazon Route 53. You would like to serve traffic from the domain name, request an SSL certificate, and add this to your CloudFront web distribution. The SSL offloading occurs in CloudFront by serving traffic securely from each CloudFront location. You also can configure CloudFront to deliver your content over HTTPS by using your custom domain name and your own SSL certificate. Serving web content through CloudFront reduces response from the origin as requests are redirected to the nearest edge location. This results in faster download times than if the visitor had requested the content from a data center that is located farther away.
In this post, we demonstrated how you can apply policies to Amazon S3 buckets so that only users with appropriate permissions are allowed to access the buckets. Anonymous users (with public-read/public-read-write permissions) and authenticated users without the appropriate permissions are prevented from accessing the buckets.
We also examined how to secure access to objects in Amazon S3 buckets. The objects in Amazon S3 buckets can be encrypted at rest and during transit. Doing so helps provide end-to-end security from the source (in this case, Amazon S3) to your users.
If you have feedback about this blog post, submit comments in the “Comments” section below. If you have questions about this blog post, start a new thread on the Amazon S3 forum or contact AWS Support.
On his blog, Tom Tromey looks at just-in-time (JIT) compilation for Emacs and what he has done differently in his implementation from what was done in earlier efforts. He also looks at potential enhancements to his JIT: “Calling a function in Emacs Lisp is quite expensive. A call from the JIT requires marshalling the arguments into an array, then calling Ffuncall; which then might dispatch to a C function (a “subr”), the bytecode interpreter, or the ordinary interpreter. In some cases this may require allocation.
This overhead applies to nearly every call — but the C implementation of Emacs is free to call various primitive functions directly, without using Ffuncall to indirect through some Lisp symbol.
Now, these direct calls aren’t without a cost: they prevent the modification of some functions from Lisp. Sometimes this is a pain (it might be handy to hack on load from Lisp), but in many cases it is unimportant.
So, one idea for the JIT is to keep a list of such functions and then emit direct calls rather than indirect ones.”
Security updates have been issued by Arch Linux (linux-hardened, linux-lts, linux-zen, and mongodb), Debian (gdk-pixbuf, gifsicle, graphicsmagick, kernel, and poppler), Fedora (dracut, electron-cash, and firefox), Gentoo (backintime, binutils, chromium, emacs, libXcursor, miniupnpc, openssh, optipng, and webkit-gtk), Mageia (kernel, kernel-linus, kernel-tmb, openafs, and python-mistune), openSUSE (clamav-database, ImageMagick, kernel-firmware, nodejs4, and qemu), Red Hat (linux-firmware, ovirt-guest-agent-docker, qemu-kvm-rhev, redhat-virtualization-host, rhev-hypervisor7, rhvm-appliance, thunderbird, and vdsm), Scientific Linux (thunderbird), SUSE (kernel and qemu), and Ubuntu (firefox and poppler).
I’m getting ready to wrap up my work for the year, cleaning up my inbox and catching up on a few recent AWS launches that happened at and shortly after AWS re:Invent.
Last week we launched Amazon Linux 2. This is modern version of Linux, designed to meet the security, stability, and productivity needs of enterprise environments while giving you timely access to new tools and features. It also includes all of the things that made the Amazon Linux AMI popular, including AWS integration, cloud-init, a secure default configuration, regular security updates, and AWS Support. From that base, we have added many new features including:
Long-Term Support – You can use Amazon Linux 2 in situations where you want to stick with a single major version of Linux for an extended period of time, perhaps to avoid re-qualifying your applications too frequently. This build (2017.12) is a candidate for LTS status; the final determination will be made based on feedback in the Amazon Linux Discussion Forum. Long-term support for the Amazon Linux 2 LTS build will include security updates, bug fixes, user-space Application Binary Interface (ABI), and user-space Application Programming Interface (API) compatibility for 5 years.
Extras Library – You can now get fast access to fresh, new functionality while keeping your base OS image stable and lightweight. The Amazon Linux Extras Library eliminates the age-old tradeoff between OS stability and access to fresh software. It contains open source databases, languages, and more, each packaged together with any needed dependencies.
Tuned Kernel – You have access to the latest 4.9 LTS kernel, with support for the latest EC2 features and tuned to run efficiently in AWS and other virtualized environments.
Systemd – Amazon Linux 2 includes the systemd init system, designed to provide better boot performance and increased control over individual services and groups of interdependent services. For example, you can indicate that Service B must be started only after Service A is fully started, or that Service C should start on a change in network connection status.
Wide Availabilty – Amazon Linux 2 is available in all AWS Regions in AMI and Docker image form. Virtual machine images for Hyper-V, KVM, VirtualBox, and VMware are also available. You can build and test your applications on your laptop or in your own data center and then deploy them to AWS.
I’m interested in the Extras Library; here’s how I see which topics (lists of packages) are available:
As you can see, the library includes languages, editors, and web tools that receive frequent updates. Each topic contains all of dependencies that are needed to install the package on Amazon Linux 2. For example, the Rust topic includes the cmake build system for Rust, cargo for Rust package maintenance, and the LLVM-based compiler toolchain for Rust.
SNS Updates Many AWS customers use the Amazon Linux AMIs as a starting point for their own AMIs. If you do this and would like to kick off your build process whenever a new AMI is released, you can subscribe to an SNS topic:
You can be notified by email, invoke a AWS Lambda function, and so forth.
Today, we are launching the first Debian Stretch release of the Raspberry Pi Desktop for PCs and Macs, and we’re also releasing the latest version of Raspbian Stretch for your Pi.
For PCs and Macs
When we released our custom desktop environment on Debian for PCs and Macs last year, we were slightly taken aback by how popular it turned out to be. We really only created it as a result of one of those “Wouldn’t it be cool if…” conversations we sometimes have in the office, so we were delighted by the Pi community’s reaction.
Seeing how keen people were on the x86 version, we decided that we were going to try to keep releasing it alongside Raspbian, with the ultimate aim being to make simultaneous releases of both. This proved to be tricky, particularly with the move from the Jessie version of Debian to the Stretch version this year. However, we have now finished the job of porting all the custom code in Raspbian Stretch to Debian, and so the first Debian Stretch release of the Raspberry Pi Desktop for your PC or Mac is available from today.
The new Stretch releases
As with the Jessie release, you can either run this as a live image from a DVD, USB stick, or SD card or install it as the native operating system on the hard drive of an old laptop or desktop computer. Please note that installing this software will erase anything else on the hard drive — do not install this over a machine running Windows or macOS that you still need to use for its original purpose! It is, however, safe to boot a live image on such a machine, since your hard drive will not be touched by this.
We’re also pleased to announce that we are releasing the latest version of Raspbian Stretch for your Pi today. The Pi and PC versions are largely identical: as before, there are a few applications (such as Mathematica) which are exclusive to the Pi, but the user interface, desktop, and most applications will be exactly the same.
For Raspbian, this new release is mostly bug fixes and tweaks over the previous Stretch release, but there are one or two changes you might notice.
The file manager included as part of the LXDE desktop (on which our desktop is based) is a program called PCManFM, and it’s very feature-rich; there’s not much you can’t do in it. However, having used it for a few years, we felt that it was perhaps more complex than it needed to be — the sheer number of menu options and choices made some common operations more awkward than they needed to be. So to try to make file management easier, we have implemented a cut-down mode for the file manager.
Most of the changes are to do with the menus. We’ve removed a lot of options that most people are unlikely to change, and moved some other options into the Preferences screen rather than the menus. The two most common settings people tend to change — how icons are displayed and sorted — are now options on the toolbar and in a top-level menu rather than hidden away in submenus.
The sidebar now only shows a single hierarchical view of the file system, and we’ve tidied the toolbar and updated the icons to make them match our house style. We’ve removed the option for a tabbed interface, and we’ve stomped a few bugs as well.
One final change was to make it possible to rename a file just by clicking on its icon to highlight it, and then clicking on its name. This is the way renaming works on both Windows and macOS, and it’s always seemed slightly awkward that Unix desktop environments tend not to support it.
As with most of the other changes we’ve made to the desktop over the last few years, the intention is to make it simpler to use, and to ease the transition from non-Unix environments. But if you really don’t like what we’ve done and long for the old file manager, just untick the box for Display simplified user interface and menus in the Layout page of Preferences, and everything will be back the way it was!
Battery indicator for laptops
One important feature missing from the previous release was an indication of the amount of battery life. Eben runs our desktop on his Mac, and he was becoming slightly irritated by having to keep rebooting into macOS just to check whether his battery was about to die — so fixing this was a priority!
We’ve added a battery status icon to the taskbar; this shows current percentage charge, along with whether the battery is charging, discharging, or connected to the mains. When you hover over the icon with the mouse pointer, a tooltip with more details appears, including the time remaining if the battery can provide this information.
While this battery monitor is mainly intended for the PC version, it also supports the first-generation pi-top — to see it, you’ll only need to make sure that I2C is enabled in Configuration. A future release will support the new second-generation pi-top.
New PC applications
We have included a couple of new applications in the PC version. One is called PiServer — this allows you to set up an operating system, such as Raspbian, on the PC which can then be shared by a number of Pi clients networked to it. It is intended to make it easy for classrooms to have multiple Pis all running exactly the same software, and for the teacher to have control over how the software is installed and used. PiServer is quite a clever piece of software, and it’ll be covered in more detail in another blog post in December.
We’ve also added an application which allows you to easily use the GPIO pins of a Pi Zero connected via USB to a PC in applications using Scratch or Python. This makes it possible to run the same physical computing projects on the PC as you do on a Pi! Again, we’ll tell you more in a separate blog post this month.
Both of these applications are included as standard on the PC image, but not on the Raspbian image. You can run them on a Pi if you want — both can be installed from apt.
How to get the new versions
New images for both Raspbian and Debian versions are available from the Downloads page.
It is possible to update existing installations of both Raspbian and Debian versions. For Raspbian, this is easy: just open a terminal window and enter
How to update to the latest version of Raspbian on your Raspberry Pi. Download Raspbian here: More information on the latest version of Raspbian: Buy a Raspberry Pi:
It is slightly more complex for the PC version, as the previous release was based around Debian Jessie. You will need to edit the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list, using sudo to do so. In both files, change every occurrence of the word “jessie” to “stretch”. When that’s done, do the following:
At several points during the upgrade process, you will be asked if you want to keep the current version of a configuration file or to install the package maintainer’s version. In every case, keep the existing version, which is the default option. The update may take an hour or so, depending on your network connection.
As with all software updates, there is the possibility that something may go wrong during the process, which could lead to your operating system becoming corrupted. Therefore, we always recommend making a backup first.
Enjoy the new versions, and do let us know any feedback you have in the comments or on the forums!
One of the first things you learn when you start programming is that, just like any craftsperson, your tools matter. Notepad.exe isn’t going to cut it. A powerful editor and testing pipeline supercharge your productivity. I still remember learning to use Vim for the first time and being able to zip around systems and complex programs. Do you remember how hard it was to setup all your compilers and dependencies on a new machine? How many cycles have you wasted matching versions, tinkering with configs, and then writing documentation to onboard a new developer to a project?
The Ace Editor at the core of Cloud9 is what lets you write code quickly, easily, and beautifully. It follows a UNIX philosophy of doing one thing and doing it well: writing code.
It has all the typical IDE features you would expect: live syntax checking, auto-indent, auto-completion, code folding, split panes, version control integration, multiple cursors and selections, and it also has a few unique features I want to highlight. First of all, it’s fast, even for large (100000+ line) files. There’s no lag or other issues while typing. It has over two dozen themes built-in (solarized!) and you can bring all of your favorite themes from Sublime Text or TextMate as well. It has built-in support for 40+ language modes and customizable run configurations for your projects. Most importantly though, it has Vim mode (or emacs if your fingers work that way). It also has a keybinding editor that allows you to bend the editor to your will.
The editor supports powerful keyboard navigation and commands (similar to Sublime Text or vim plugins like ctrlp). On a Mac, with ⌘+P you can open any file in your environment with fuzzy search. With ⌘+. you can open up the command pane which allows you to do invoke any of the editor commands by typing the name. It also helpfully displays the keybindings for a command in the pane, for instance to open to a terminal you can press ⌥+T. Oh, did I mention there’s a terminal? It ships with the AWS CLI preconfigured for access to your resources.
The environment also comes with pre-installed debugging tools for many popular languages – but you’re not limited to what’s already installed. It’s easy to add in new programs and define new run configurations.
The editor is just one, admittedly important, component in an IDE though. I want to show you some other compelling features.
The AWS Cloud9 IDE is the first IDE I’ve used that is truly “cloud native”. The service is provided at no additional charge, and you only charged for the underlying compute and storage resources. When you create an environment you’re prompted for either: an instance type and an auto-hibernate time, or SSH access to a machine of your choice.
If you’re running in AWS the auto-hibernate feature will stop your instance shortly after you stop using your IDE. This can be a huge cost savings over running a more permanent developer desktop. You can also launch it within a VPC to give it secure access to your development resources. If you want to run Cloud9 outside of AWS, or on an existing instance, you can provide SSH access to the service which it will use to create an environment on the external machine. Your environment is provisioned with automatic and secure access to your AWS account so you don’t have to worry about copying credentials around. Let me say that again: you can run this anywhere.
Serverless Development with AWS Cloud9
I spend a lot of time on Twitch developing serverless applications. I have hundreds of lambda functions and APIs deployed. Cloud9 makes working with every single one of these functions delightful. Let me show you how it works.
If you look in the top right side of the editor you’ll see an AWS Resources tab. Opening this you can see all of the lambda functions in your region (you can see functions in other regions by adjusting your region preferences in the AWS preference pane).
You can import these remote functions to your local workspace just by double-clicking them. This allows you to edit, test, and debug your serverless applications all locally. You can create new applications and functions easily as well. If you click the Lambda icon in the top right of the pane you’ll be prompted to create a new lambda function and Cloud9 will automatically create a Serverless Application Model template for you as well. The IDE ships with support for the popular SAM local tool pre-installed. This is what I use in most of my local testing and serverless development. Since you have a terminal, it’s easy to install additional tools and use other serverless frameworks.
Launching an Environment from AWS CodeStar
With AWS CodeStar you can easily provision an end-to-end continuous delivery toolchain for development on AWS. Codestar provides a unified experience for building, testing, deploying, and managing applications using AWS CodeCommit, CodeBuild, CodePipeline, and CodeDeploy suite of services. Now, with a few simple clicks you can provision a Cloud9 environment to develop your application. Your environment will be pre-configured with the code for your CodeStar application already checked out and git credentials already configured.
You can easily share this environment with your coworkers which leads me to another extremely useful set of features.
One of the many things that sets AWS Cloud9 apart from other editors are the rich collaboration tools. You can invite an IAM user to your environment with a few clicks.
You can see what files they’re working on, where their cursors are, and even share a terminal. The chat features is useful as well.
Things to Know
There are no additional charges for this service beyond the underlying compute and storage.
c9.io continues to run for existing users. You can continue to use all the features of c9.io and add new team members if you have a team account. In the future, we will provide tools for easy migration of your c9.io workspaces to AWS Cloud9.
AWS Cloud9 is available in the US West (Oregon), US East (Ohio), US East (N.Virginia), EU (Ireland), and Asia Pacific (Singapore) regions.
I can’t wait to see what you build with AWS Cloud9!
Over the past several months, B2 Cloud Storage has continued to grow like we planted magic beans. During that time we have added a B2 Java SDK, and certified integrations with GoodSync, Arq, Panic, UpdraftPlus, Morro Data, QNAP, Archiware, Restic, and more. In addition, B2 customers like Panna Cooking, Sermon Audio, and Fellowship Church are happy they chose B2 as their cloud storage provider. If any of that sounds interesting, read on.
The B2 Java SDK
While the Backblaze B2 API is well documented and straight-forward to implement, we were asked by a few of our Integration Partners if we had an SDK they could use. So we developed one as an open-course project on GitHub, where we hope interested parties will not only use our Java SDK, but make it better for everyone else.
There are different reasons one might use the Java SDK, but a couple of areas where the SDK can simplify the coding process are:
Expiring Authorization — B2 requires an application key for a given account be reissued once a day when using the API. If the application key expires while you are in the middle of transferring files or some other B2 activity (bucket list, etc.), the SDK can be used to detect and then update the application key on the fly. Your B2 related activities will continue without incident and without having to capture and code your own exception case.
Error Handling — There are different types of error codes B2 will return, from expired application keys to detecting malformed requests to command time-outs. The SDK can dramatically simplify the coding needed to capture and account for the various things that can happen.
While Backblaze has created the Java SDK, developers in the GitHub community have also created other SDKs for B2, for example, for PHP (https://github.com/cwhite92/b2-sdk-php,) and Go (https://github.com/kurin/blazer.) Let us know in the comments about other SDKs you’d like to see or perhaps start your own GitHub project. We will publish any updates in our next B2 roundup.
What You Can Do with Affordable and Available Cloud Storage
You’re probably aware that B2 is up to 75% less expensive than other similar cloud storage services like Amazon S3 and Microsoft Azure. Businesses and organizations are finding that projects that previously weren’t economically feasible with other Cloud Storage services are now not only possible, but a reality with B2. Here are a few recent examples:
SermonAudio wanted their media files to be readily available, but didn’t want to build and manage their own internal storage farm. Until B2, cloud storage was just too expensive to use. Now they use B2 to store their audio and video files, and also as the primary source of downloads and streaming requests from their subscribers.
Fellowship Church wanted to escape from the ever increasing amount of time they were spending saving their data to their LTO-based system. Using B2 saved countless hours of personnel time versus LTO, fit easily into their video processing workflow, and provided instant access at any time to their media library.
Panna Cooking replaced their closet full of archive hard drives with a cost-efficient hybrid-storage solution combining 45Drives and Backblaze B2 Cloud Storage. Archived media files that used to take hours to locate are now readily available regardless of whether they reside in local storage or in the B2 Cloud.
Leading companies in backup, archive, and sync continue to add B2 Cloud Storage as a storage destination for their customers. These companies realize that by offering B2 as an option, they can dramatically lower the total cost of ownership for their customers — and that’s always a good thing.
If your favorite application is not integrated to B2, you can do something about it. One integration partner told us they received over 200 customer requests for a B2 integration. The partner got the message and the integration is currently in beta test.
Below are some of the partner integrations completed in the past few months. You can check the B2 Partner Integrations page for a complete list.
Archiware — Both P5 Archive and P5 Backup can now store data in the B2 Cloud making your offsite media files readily available while keeping your off-site storage costs predictable and affordable.
Arq — Combine Arq and B2 for amazingly affordable backup of external drives, network drives, NAS devices, Windows PCs, Windows Servers, and Macs to the cloud.
GoodSync — Automatically synchronize and back up all your photos, music, email, and other important files between all your desktops, laptops, servers, external drives, and sync, or back up to B2 Cloud Storage for off-site storage.
QNAP — QNAP Hybrid Backup Sync consolidates backup, restoration, and synchronization functions into a single QTS application to easily transfer your data to local, remote, and cloud storage.
Morro Data — Their CloudNAS solution stores files in the cloud, caches them locally as needed, and syncs files globally among other CloudNAS systems in an organization.
Restic – Restic is a fast, secure, multi-platform command line backup program. Files are uploaded to a B2 bucket as de-duplicated, encrypted chunks. Each backup is a snapshot of only the data that has changed, making restores of a specific date or time easy.
Transmit 5 by Panic — Transmit 5, the gold standard for macOS file transfer apps, now supports B2. Upload, download, and manage files on tons of servers with an easy, familiar, and powerful UI.
UpdraftPlus — WordPress developers and admins can now use the UpdraftPlus Premium WordPress plugin to affordably back up their data to the B2 Cloud.
Getting Started with B2 Cloud Storage
If you’re using B2 today, thank you. If you’d like to try B2, but don’t know where to start, here’s a guide to getting started with the B2 Web Interface — no programming or scripting is required. You get 10 gigabytes of free storage and 1 gigabyte a day in free downloads. Give it a try.
Mathy Vanhoef has just published a devastating attack against WPA2, the 14-year-old encryption protocol used by pretty much all wi-fi systems. Its an interesting attack, where the attacker forces the protocol to reuse a key. The authors call this attack KRACK, for Key Reinstallation Attacks
This is yet another of a series of marketed attacks; with a cool name, a website, and a logo. The Q&A on the website answers a lot of questions about the attack and its implications. And lots of good information in this ArsTechnica article.
“Key Reinstallation Attacks: Forcing Nonce Reuse in WPA2,” by Mathy Vanhoef and Frank Piessens.
Abstract: We introduce the key reinstallation attack. This attack abuses design or implementation flaws in cryptographic protocols to reinstall an already-in-use key. This resets the key’s associated parameters such as transmit nonces and receive replay counters. Several types of cryptographic Wi-Fi handshakes are affected by the attack. All protected Wi-Fi networks use the 4-way handshake to generate a fresh session key. So far, this 14-year-old handshake has remained free from attacks, and is even proven secure. However, we show that the 4-way handshake is vulnerable to a key reinstallation attack. Here, the adversary tricks a victim into reinstalling an already-in-use key. This is achieved by manipulating and replaying handshake messages. When reinstalling the key, associated parameters such as the incremental transmit packet number (nonce) and receive packet number (replay counter) are reset to their initial value. Our key reinstallation attack also breaks the PeerKey, group key, and Fast BSS Transition (FT) handshake. The impact depends on the handshake being attacked, and the data-confidentiality protocol in use. Simplified, against AES-CCMP an adversary can replay and decrypt (but not forge) packets. This makes it possible to hijack TCP streams and inject malicious data into them. Against WPA-TKIP and GCMP the impact is catastrophic: packets can be replayed, decrypted, and forged. Because GCMP uses the same authentication key in both communication directions, it is especially affected.
Finally, we confirmed our findings in practice, and found that every Wi-Fi device is vulnerable to some variant of our attacks. Notably, our attack is exceptionally devastating against Android 6.0: it forces the client into using a predictable all-zero encryption key.
I’m just reading about this now, and will post more information as I learn it.
EDITED TO ADD: This meets my definition of brilliant. The attack is blindingly obvious once it’s pointed out, but for over a decade no one noticed it.
EDITED TO ADD: Matthew Green has a blog post on what went wrong. The vulnerability is in the interaction between two protocols. At a meta level, he blames the opaque IEEE standards process:
One of the problems with IEEE is that the standards are highly complex and get made via a closed-door process of private meetings. More importantly, even after the fact, they’re hard for ordinary security researchers to access. Go ahead and google for the IETF TLS or IPSec specifications — you’ll find detailed protocol documentation at the top of your Google results. Now go try to Google for the 802.11i standards. I wish you luck.
The IEEE has been making a few small steps to ease this problem, but they’re hyper-timid incrementalist bullshit. There’s an IEEE program called GET that allows researchers to access certain standards (including 802.11) for free, but only after they’ve been public for six months — coincidentally, about the same time it takes for vendors to bake them irrevocably into their hardware and software.
This whole process is dumb and — in this specific case — probably just cost industry tens of millions of dollars. It should stop.
Nicholas Weaver explains why most people shouldn’t worry about this:
So unless your Wi-Fi password looks something like a cat’s hairball (e.g. “:SNEIufeli7rc” — which is not guessable with a few million tries by a computer), a local attacker had the capability to determine the password, decrypt all the traffic, and join the network before KRACK.
KRACK is, however, relevant for enterprise Wi-Fi networks: networks where you needed to accept a cryptographic certificate to join initially and have to provide both a username and password. KRACK represents a new vulnerability for these networks. Depending on some esoteric details, the attacker can decrypt encrypted traffic and, in some cases, inject traffic onto the network.
But in none of these cases can the attacker join the network completely. And the most significant of these attacks affects Linux devices and Android phones, they don’t affect Macs, iPhones, or Windows systems. Even when feasible, these attacks require physical proximity: An attacker on the other side of the planet can’t exploit KRACK, only an attacker in the parking lot can.
Today we’re launching support for multiple TLS/SSL certificates on Application Load Balancers (ALB) using Server Name Indication (SNI). You can now host multiple TLS secured applications, each with its own TLS certificate, behind a single load balancer. In order to use SNI, all you need to do is bind multiple certificates to the same secure listener on your load balancer. ALB will automatically choose the optimal TLS certificate for each client. These new features are provided at no additional charge.
If you’re looking for a TL;DR on how to use this new feature just click here. If you’re like me and you’re a little rusty on the specifics of Transport Layer Security (TLS) then keep reading.
TLS? SSL? SNI?
People tend to use the terms SSL and TLS interchangeably even though the two are technically different. SSL technically refers to a predecessor of the TLS protocol. To keep things simple I’ll be using the term TLS for the rest of this post.
TLS is a protocol for securely transmitting data like passwords, cookies, and credit card numbers. It enables privacy, authentication, and integrity of the data being transmitted. TLS uses certificate based authentication where certificates are like ID cards for your websites. You trust the person that signed and issued the certificate, the certificate authority (CA), so you trust that the data in the certificate is correct. When a browser connects to your TLS-enabled ALB, ALB presents a certificate that contains your site’s public key, which has been cryptographically signed by a CA. This way the client can be sure it’s getting the ‘real you’ and that it’s safe to use your site’s public key to establish a secure connection.
With SNI support we’re making it easy to use more than one certificate with the same ALB. The most common reason you might want to use multiple certificates is to handle different domains with the same load balancer. It’s always been possible to use wildcard and subject-alternate-name (SAN) certificates with ALB, but these come with limitations. Wildcard certificates only work for related subdomains that match a simple pattern and while SAN certificates can support many different domains, the same certificate authority has to authenticate each one. That means you have reauthenticate and reprovision your certificate everytime you add a new domain.
One of our most frequent requests on forums, reddit, and in my e-mail inbox has been to use the Server Name Indication (SNI) extension of TLS to choose a certificate for a client. Since TLS operates at the transport layer, below HTTP, it doesn’t see the hostname requested by a client. SNI works by having the client tell the server “This is the domain I expect to get a certificate for” when it first connects. The server can then choose the correct certificate to respond to the client. All modern web browsers and a large majority of other clients support SNI. In fact, today we see SNI supported by over 99.5% of clients connecting to CloudFront.
Smart Certificate Selection on ALB
ALB’s smart certificate selection goes beyond SNI. In addition to containing a list of valid domain names, certificates also describe the type of key exchange and cryptography that the server supports, as well as the signature algorithm (SHA2, SHA1, MD5) used to sign the certificate. To establish a TLS connection, a client starts a TLS handshake by sending a “ClientHello” message that outlines the capabilities of the client: the protocol versions, extensions, cipher suites, and compression methods. Based on what an individual client supports, ALB’s smart selection algorithm chooses a certificate for the connection and sends it to the client. ALB supports both the classic RSA algorithm and the newer, hipper, and faster Elliptic-curve based ECDSA algorithm. ECDSA support among clients isn’t as prevalent as SNI, but it is supported by all modern web browsers. Since it’s faster and requires less CPU, it can be particularly useful for ultra-low latency applications and for conserving the amount of battery used by mobile applications. Since ALB can see what each client supports from the TLS handshake, you can upload both RSA and ECDSA certificates for the same domains and ALB will automatically choose the best one for each client.
ALB Access Logs now include the client’s requested hostname and the certificate ARN used. If the “hostname” field is empty (represented by a “-“) the client did not use the SNI extension in their request.
You can use any of your certificates in ACM or IAM.
You can bind multiple certificates for the same domain(s) to a secure listener. Your ALB will choose the optimal certificate based on multiple factors including the capabilities of the client.
If the client does not support SNI your ALB will use the default certificate (the one you specified when you created the listener).
There are three new ELB API calls: AddListenerCertificates, RemoveListenerCertificates, and DescribeListenerCertificates.
You can bind up to 25 certificates per load balancer (not counting the default certificate).
Overall, I will personally use this feature and I’m sure a ton of AWS users will benefit from it as well. I want to thank the Elastic Load Balancing team for all their hard work in getting this into the hands of our users.
TL;DR: you may now configure systemd to dynamically allocate a UNIX user ID for service processes when it starts them and release it when it stops them. It’s pretty secure, mixes well with transient services, socket activated services and service templating.
Today we released systemd 235. Among other improvements this greatly extends the dynamic user logic of systemd. Dynamic users are a powerful but little known concept, supported in its basic form since systemd 232. With this blog story I hope to make it a bit better known.
The UNIX user concept is the most basic and well-understood security concept in POSIX operating systems. It is UNIX/POSIX’ primary security concept, the one everybody can agree on, and most security concepts that came after it (such as process capabilities, SELinux and other MACs, user name-spaces, …) in some form or another build on it, extend it or at least interface with it. If you build a Linux kernel with all security features turned off, the user concept is pretty much the one you’ll still retain.
Originally, the user concept was introduced to make multi-user systems a reality, i.e. systems enabling multiple human users to share the same system at the same time, cleanly separating their resources and protecting them from each other. The majority of today’s UNIX systems don’t really use the user concept like that anymore though. Most of today’s systems probably have only one actual human user (or even less!), but their user databases (/etc/passwd) list a good number more entries than that. Today, the majority of UNIX users in most environments are system users, i.e. users that are not the technical representation of a human sitting in front of a PC anymore, but the security identity a system service — an executable program — runs as. Event though traditional, simultaneous multi-user systems slowly became less relevant, their ground-breaking basic concept became the cornerstone of UNIX security. The OS is nowadays partitioned into isolated services — and each service runs as its own system user, and thus within its own, minimal security context.
The people behind the Android OS realized the relevance of the UNIX user concept as the primary security concept on UNIX, and took its use even further: on Android not only system services take benefit of the UNIX user concept, but each UI app gets its own, individual user identity too — thus neatly separating app resources from each other, and protecting app processes from each other, too.
Back in the more traditional Linux world things are a bit less advanced in this area. Even though users are the quintessential UNIX security concept, allocation and management of system users is still a pretty limited, raw and static affair. In most cases, RPM or DEB package installation scripts allocate a fixed number of (usually one) system users when you install the package of a service that wants to take benefit of the user concept, and from that point on the system user remains allocated on the system and is never deallocated again, even if the package is later removed again. Most Linux distributions limit the number of system users to 1000 (which isn’t particularly a lot). Allocating a system user is hence expensive: the number of available users is limited, and there’s no defined way to dispose of them after use. If you make use of system users too liberally, you are very likely to run out of them sooner rather than later.
You may wonder why system users are generally not deallocated when the package that registered them is uninstalled from a system (at least on most distributions). The reason for that is one relevant property of the user concept (you might even want to call this a design flaw): user IDs are sticky to files (and other objects such as IPC objects). If a service running as a specific system user creates a file at some location, and is then terminated and its package and user removed, then the created file still belongs to the numeric ID (“UID”) the system user originally got assigned. When the next system user is allocated and — due to ID recycling — happens to get assigned the same numeric ID, then it will also gain access to the file, and that’s generally considered a problem, given that the file belonged to a potentially very different service once upon a time, and likely should not be readable or changeable by anything coming after it. Distributions hence tend to avoid UID recycling which means system users remain registered forever on a system after they have been allocated once.
The above is a description of the status quo ante. Let’s now focus on what systemd’s dynamic user concept brings to the table, to improve the situation.
Introducing Dynamic Users
With systemd dynamic users we hope to make make it easier and cheaper to allocate system users on-the-fly, thus substantially increasing the possible uses of this core UNIX security concept.
If you write a systemd service unit file, you may enable the dynamic user logic for it by setting the DynamicUser= option in its [Service] section to yes. If you do a system user is dynamically allocated the instant the service binary is invoked, and released again when the service terminates. The user is automatically allocated from the UID range 61184–65519, by looking for a so far unused UID.
Now you may wonder, how does this concept deal with the sticky user issue discussed above? In order to counter the problem, two strategies easily come to mind:
Prohibit the service from creating any files/directories or IPC objects
Automatically removing the files/directories or IPC objects the service created when it shuts down.
In systemd we implemented both strategies, but for different parts of the execution environment. Specifically:
Setting DynamicUser=yes implies ProtectSystem=strict and ProtectHome=read-only. These sand-boxing options turn off write access to pretty much the whole OS directory tree, with a few relevant exceptions, such as the API file systems /proc, /sys and so on, as well as /tmp and /var/tmp. (BTW: setting these two options on your regular services that do not use DynamicUser= is a good idea too, as it drastically reduces the exposure of the system to exploited services.)
Setting DynamicUser=yes implies PrivateTmp=yes. This option sets up /tmp and /var/tmp for the service in a way that it gets its own, disconnected version of these directories, that are not shared by other services, and whose life-cycle is bound to the service’s own life-cycle. Thus if the service goes down, the user is removed and all its temporary files and directories with it. (BTW: as above, consider setting this option for your regular services that do not use DynamicUser= too, it’s a great way to lock things down security-wise.)
Setting DynamicUser=yes implies RemoveIPC=yes. This option ensures that when the service goes down all SysV and POSIX IPC objects (shared memory, message queues, semaphores) owned by the service’s user are removed. Thus, the life-cycle of the IPC objects is bound to the life-cycle of the dynamic user and service, too. (BTW: yes, here too, consider using this in your regular services, too!)
With these four settings in effect, services with dynamic users are nicely sand-boxed. They cannot create files or directories, except in /tmp and /var/tmp, where they will be removed automatically when the service shuts down, as will any IPC objects created. Sticky ownership of files/directories and IPC objects is hence dealt with effectively.
The RuntimeDirectory= option may be used to open up a bit the sandbox to external programs. If you set it to a directory name of your choice, it will be created below /run when the service is started, and removed in its entirety when it is terminated. The ownership of the directory is assigned to the service’s dynamic user. This way, a dynamic user service can expose API interfaces (AF_UNIX sockets, …) to other services at a well-defined place and again bind the life-cycle of it to the service’s own run-time. Example: set RuntimeDirectory=foobar in your service, and watch how a directory /run/foobar appears at the moment you start the service, and disappears the moment you stop it again. (BTW: Much like the other settings discussed above, RuntimeDirectory= may be used outside of the DynamicUser= context too, and is a nice way to run any service with a properly owned, life-cycle-managed run-time directory.)
Of course, a service running in such an environment (although already very useful for many cases!), has a major limitation: it cannot leave persistent data around it can reuse on a later run. As pretty much the whole OS directory tree is read-only to it, there’s simply no place it could put the data that survives from one service invocation to the next.
With systemd 235 this limitation is removed: there are now three new settings: StateDirectory=, LogsDirectory= and CacheDirectory=. In many ways they operate like RuntimeDirectory=, but create sub-directories below /var/lib, /var/log and /var/cache, respectively. There’s one major difference beyond that however: directories created that way are persistent, they will survive the run-time cycle of a service, and thus may be used to store data that is supposed to stay around between invocations of the service.
Of course, the obvious question to ask now is: how do these three settings deal with the sticky file ownership problem?
For that we lifted a concept from container managers. Container managers have a very similar problem: each container and the host typically end up using a very similar set of numeric UIDs, and unless user name-spacing is deployed this means that host users might be able to access the data of specific containers that also have a user by the same numeric UID assigned, even though it actually refers to a very different identity in a different context. (Actually, it’s even worse than just getting access, due to the existence of setuid file bits, access might translate to privilege elevation.) The way container managers protect the container images from the host (and from each other to some level) is by placing the container trees below a boundary directory, with very restrictive access modes and ownership (0700 and root:root or so). A host user hence cannot take advantage of the files/directories of a container user of the same UID inside of a local container tree, simply because the boundary directory makes it impossible to even reference files in it. After all on UNIX, in order to get access to a specific path you need access to every single component of it.
How is that applied to dynamic user services? Let’s say StateDirectory=foobar is set for a service that has DynamicUser= turned off. The instant the service is started, /var/lib/foobar is created as state directory, owned by the service’s user and remains in existence when the service is stopped. If the same service now is run with DynamicUser= turned on, the implementation is slightly altered. Instead of a directory /var/lib/foobar a symbolic link by the same path is created (owned by root), pointing to /var/lib/private/foobar (the latter being owned by the service’s dynamic user). The /var/lib/private directory is created as boundary directory: it’s owned by root:root, and has a restrictive access mode of 0700. Both the symlink and the service’s state directory will survive the service’s life-cycle, but the state directory will remain, and continues to be owned by the now disposed dynamic UID — however it is protected from other host users (and other services which might get the same dynamic UID assigned due to UID recycling) by the boundary directory.
The obvious question to ask now is: but if the boundary directory prohibits access to the directory from unprivileged processes, how can the service itself which runs under its own dynamic UID access it anyway? This is achieved by invoking the service process in a slightly modified mount name-space: it will see most of the file hierarchy the same way as everything else on the system (modulo /tmp and /var/tmp as mentioned above), except for /var/lib/private, which is over-mounted with a read-only tmpfs file system instance, with a slightly more liberal access mode permitting the service read access. Inside of this tmpfs file system instance another mount is placed: a bind mount to the host’s real /var/lib/private/foobar directory, onto the same name. Putting this together these means that superficially everything looks the same and is available at the same place on the host and from inside the service, but two important changes have been made: the /var/lib/private boundary directory lost its restrictive character inside the service, and has been emptied of the state directories of any other service, thus making the protection complete. Note that the symlink /var/lib/foobar hides the fact that the boundary directory is used (making it little more than an implementation detail), as the directory is available this way under the same name as it would be if DynamicUser= was not used. Long story short: for the daemon and from the view from the host the indirection through /var/lib/private is mostly transparent.
This logic of course raises another question: what happens to the state directory if a dynamic user service is started with a state directory configured, gets UID X assigned on this first invocation, then terminates and is restarted and now gets UID Y assigned on the second invocation, with X ≠ Y? On the second invocation the directory — and all the files and directories below it — will still be owned by the original UID X so how could the second instance running as Y access it? Our way out is simple: systemd will recursively change the ownership of the directory and everything contained within it to UID Y before invoking the service’s executable.
Of course, such recursive ownership changing (chown()ing) of whole directory trees can become expensive (though according to my experiences, IRL and for most services it’s much cheaper than you might think), hence in order to optimize behavior in this regard, the allocation of dynamic UIDs has been tweaked in two ways to avoid the necessity to do this expensive operation in most cases: firstly, when a dynamic UID is allocated for a service an allocation loop is employed that starts out with a UID hashed from the service’s name. This means a service by the same name is likely to always use the same numeric UID. That means that a stable service name translates into a stable dynamic UID, and that means recursive file ownership adjustments can be skipped (of course, after validation). Secondly, if the configured state directory already exists, and is owned by a suitable currently unused dynamic UID, it’s preferably used above everything else, thus maximizing the chance we can avoid the chown()ing. (That all said, ultimately we have to face it, the currently available UID space of 4K+ is very small still, and conflicts are pretty likely sooner or later, thus a chown()ing has to be expected every now and then when this feature is used extensively).
Note that CacheDirectory= and LogsDirectory= work very similar to StateDirectory=. The only difference is that they manage directories below the /var/cache and /var/logs directories, and their boundary directory hence is /var/cache/private and /var/log/private, respectively.
So, after all this introduction, let’s have a look how this all can be put together. Here’s a trivial example:
# cat > /etc/systemd/system/dynamic-user-test.service <<EOF[Service]ExecStart=/usr/bin/sleep 4711DynamicUser=yes
# systemctl daemon-reload# systemctl start dynamic-user-test# systemctl status dynamic-user-test
Loaded: loaded (/etc/systemd/system/dynamic-user-test.service; static; vendor preset: disabled)
Active: active (running) since Fri 2017-10-06 13:12:25 CEST; 3s ago
Main PID: 2967(sleep)
Tasks: 1(limit: 4915)
└─2967 /usr/bin/sleep 4711
Okt 0613:12:25 sigma systemd: Started dynamic-user-test.service.
# ps -e -o pid,comm,user | grep 29672967 sleep dynamic-user-test
# id dynamic-user-testuid=64642(dynamic-user-test)gid=64642(dynamic-user-test)groups=64642(dynamic-user-test)# systemctl stop dynamic-user-test# id dynamic-user-test
id: ‘dynamic-user-test’: no such user
In this example, we create a unit file with DynamicUser= turned on, start it, check if it’s running correctly, have a look at the service process’ user (which is named like the service; systemd does this automatically if the service name is suitable as user name, and you didn’t configure any user name to use explicitly), stop the service and verify that the user ceased to exist too.
That’s already pretty cool. Let’s step it up a notch, by doing the same in an interactive transient service (for those who don’t know systemd well: a transient service is a service that is defined and started dynamically at run-time, for example via the systemd-run command from the shell. Think: run a service without having to write a unit file first):
# systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh
Running as unit: run-u15750.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4$ ls -al /var/lib/private/
drwxr-xr-x. 3 root root 606. Okt 13:21 .
drwxr-xr-x. 1 root root 8526. Okt 13:21 ..
drwxr-xr-x. 1 run-u15750 run-u15750 86. Okt 13:22 wuff
sh-4.4$ ls -ld /var/lib/wuff
lrwxrwxrwx. 1 root root 126. Okt 13:21 /var/lib/wuff -> private/wuff
sh-4.4$ ls -ld /var/lib/wuff/
drwxr-xr-x. 1 run-u15750 run-u15750 06. Okt 13:21 /var/lib/wuff/
sh-4.4$ echo hello > /var/lib/wuff/test
sh-4.4$ exitexit# id run-u15750
id: ‘run-u15750’: no such user
# ls -al /var/lib/private
drwx------. 1 root root 666. Okt 13:21 .
drwxr-xr-x. 1 root root 8526. Okt 13:21 ..
drwxr-xr-x. 1631226312286. Okt 13:22 wuff
# ls -ld /var/lib/wuff
lrwxrwxrwx. 1 root root 126. Okt 13:21 /var/lib/wuff -> private/wuff
# ls -ld /var/lib/wuff/
drwxr-xr-x. 1631226312286. Okt 13:22 /var/lib/wuff/
# cat /var/lib/wuff/test
The above invokes an interactive shell as transient service run-u15750.service (systemd-run picked that name automatically, since we didn’t specify anything explicitly) with a dynamic user whose name is derived automatically from the service name. Because StateDirectory=wuff is used, a persistent state directory for the service is made available as /var/lib/wuff. In the interactive shell running inside the service, the ls commands show the /var/lib/private boundary directory and its contents, as well as the symlink that is placed for the service. Finally, before exiting the shell, a file is created in the state directory. Back in the original command shell we check if the user is still allocated: it is not, of course, since the service ceased to exist when we exited the shell and with it the dynamic user associated with it. From the host we check the state directory of the service, with similar commands as we did from inside of it. We see that things are set up pretty much the same way in both cases, except for two things: first of all the user/group of the files is now shown as raw numeric UIDs instead of the user/group names derived from the unit name. That’s because the user ceased to exist at this point, and “ls” shows the raw UID for files owned by users that don’t exist. Secondly, the access mode of the boundary directory is different: when we look at it from outside of the service it is not readable by anyone but root, when we looked from inside we saw it it being world readable.
Now, let’s see how things look if we start another transient service, reusing the state directory from the first invocation:
# systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh
Running as unit: run-u16087.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4$ cat /var/lib/wuff/test
sh-4.4$ ls -al /var/lib/wuff/
drwxr-xr-x. 1 run-u16087 run-u16087 86. Okt 13:22 .
drwxr-xr-x. 3 root root 606. Okt 15:42 ..
-rw-r--r--. 1 run-u16087 run-u16087 66. Okt 13:22 test
Here, systemd-run picked a different auto-generated unit name, but the used dynamic UID is still the same, as it was read from the pre-existing state directory, and was otherwise unused. As we can see the test file we generated earlier is accessible and still contains the data we left in there. Do note that the user name is different this time (as it is derived from the unit name, which is different), but the UID it is assigned to is the same one as on the first invocation. We can thus see that the mentioned optimization of the UID allocation logic (i.e. that we start the allocation loop from the UID owner of any existing state directory) took effect, so that no recursive chown()ing was required.
And that’s the end of our example, which hopefully illustrated a bit how this concept and implementation works.
Now that we had a look at how to enable this logic for a unit and how it is implemented, let’s discuss where this actually could be useful in real life.
One major benefit of dynamic user IDs is that running a privilege-separated service leaves no artifacts in the system. A system user is allocated and made use of, but it is discarded automatically in a safe and secure way after use, in a fashion that is safe for later recycling. Thus, quickly invoking a short-lived service for processing some job can be protected properly through a user ID without having to pre-allocate it and without this draining the available UID pool any longer than necessary.
In many cases, starting a service no longer requires package-specific preparation. Or in other words, quite often useradd/mkdir/chown/chmod invocations in “post-inst” package scripts, as well as sysusers.d and tmpfiles.d drop-ins become unnecessary, as the DynamicUser= and StateDirectory=/CacheDirectory=/LogsDirectory= logic can do the necessary work automatically, on-demand and with a well-defined life-cycle.
By combining dynamic user IDs with the transient unit concept, new creative ways of sand-boxing are made available. For example, let’s say you don’t trust the correct implementation of the sort command. You can now lock it into a simple, robust, dynamic UID sandbox with a simple systemd-run and still integrate it into a shell pipeline like any other command. Here’s an example, showcasing a shell pipeline whose middle element runs as a dynamically on-the-fly allocated UID, that is released when the pipelines ends.
By combining dynamic user IDs with the systemd templating logic it is now possible to do much more fine-grained and fully automatic UID management. For example, let’s say you have a template unit file /etc/systemd/system/[email protected]:
And you are done. (Invoke this as many times as you like, each time replacing customerxyz by some customer identifier, you get the idea.)
By combining dynamic user IDs with socket activation you may easily implement a system where each incoming connection is served by a process instance running as a different, fresh, newly allocated UID within its own sandbox. Here’s an example waldo.socket:
With the two unit files above, systemd will listen on TCP/IP port 2048, and for each incoming connection invoke a fresh instance of [email protected], each time utilizing a different, new, dynamically allocated UID, neatly isolated from any other instance.
Dynamic user IDs combine very well with state-less systems, i.e. systems that come up with an unpopulated /etc and /var. A service using dynamic user IDs and the StateDirectory=, CacheDirectory=, LogsDirectory= and RuntimeDirectory= concepts will implicitly allocate the users and directories it needs for running, right at the moment where it needs it.
Dynamic users are a very generic concept, hence a multitude of other uses are thinkable; the list above is just supposed to trigger your imagination.
What does this mean for you as a packager?
I am pretty sure that a large number of services shipped with today’s distributions could benefit from using DynamicUser= and StateDirectory= (and related settings). It often allows removal of post-inst packaging scripts altogether, as well as any sysusers.d and tmpfiles.d drop-ins by unifying the needed declarations in the unit file itself. Hence, as a packager please consider switching your unit files over. That said, there are a number of conditions where DynamicUser= and StateDirectory= (and friends) cannot or should not be used. To name a few:
Service that need to write to files outside of /run/<package>, /var/lib/<package>, /var/cache/<package>, /var/log/<package>, /var/tmp, /tmp, /dev/shm are generally incompatible with this scheme. This rules out daemons that upgrade the system as one example, as that involves writing to /usr.
Services that maintain a herd of processes with different user IDs. Some SMTP services are like this. If your service has such a super-server design, UID management needs to be done by the super-server itself, which rules out systemd doing its dynamic UID magic for it.
Services which run as root (obviously…) or are otherwise privileged.
Services that need to live in the same mount name-space as the host system (for example, because they want to establish mount points visible system-wide). As mentioned DynamicUser= implies ProtectSystem=, PrivateTmp= and related options, which all require the service to run in its own mount name-space.
Your focus is older distributions, i.e. distributions that do not have systemd 232 (for DynamicUser=) or systemd 235 (for StateDirectory= and friends) yet.
If your distribution’s packaging guides don’t allow it. Consult your packaging guides, and possibly start a discussion on your distribution’s mailing list about this.
A couple of additional, random notes about the implementation and use of these features:
Do note that allocating or deallocating a dynamic user leaves /etc/passwd untouched. A dynamic user is added into the user database through the glibc NSS module nss-systemd, and this information never hits the disk.
On traditional UNIX systems it was the job of the daemon process itself to drop privileges, while the DynamicUser= concept is designed around the service manager (i.e. systemd) being responsible for that. That said, since v235 there’s a way to marry DynamicUser= and such services which want to drop privileges on their own. For that, turn on DynamicUser= and set User= to the user name the service wants to setuid() to. This has the effect that systemd will allocate the dynamic user under the specified name when the service is started. Then, prefix the command line you specify in ExecStart= with a single ! character. If you do, the user is allocated for the service, but the daemon binary is is invoked as root instead of the allocated user, under the assumption that the daemon changes its UID on its own the right way. Not that after registration the user will show up instantly in the user database, and is hence resolvable like any other by the daemon process. Example: ExecStart=!/usr/bin/mydaemond
You may wonder why systemd uses the UID range 61184–65519 for its dynamic user allocations (side note: in hexadecimal this reads as 0xEF00–0xFFEF). That’s because distributions (specifically Fedora) tend to allocate regular users from below the 60000 range, and we don’t want to step into that. We also want to stay away from 65535 and a bit around it, as some of these UIDs have special meanings (65535 is often used as special value for “invalid” or “no” UID, as it is identical to the 16bit value -1; 65534 is generally mapped to the “nobody” user, and is where some kernel subsystems map unmappable UIDs). Finally, we want to stay within the 16bit range. In a user name-spacing world each container tends to have much less than the full 32bit UID range available that Linux kernels theoretically provide. Everybody apparently can agree that a container should at least cover the 16bit range though — already to include a nobody user. (And quite frankly, I am pretty sure assigning 64K UIDs per container is nicely systematic, as the the higher 16bit of the 32bit UID values this way become a container ID, while the lower 16bit become the logical UID within each container, if you still follow what I am babbling here…). And before you ask: no this range cannot be changed right now, it’s compiled in. We might change that eventually however.
You might wonder what happens if you already used UIDs from the 61184–65519 range on your system for other purposes. systemd should handle that mostly fine, as long as that usage is properly registered in the user database: when allocating a dynamic user we pick a UID, see if it is currently used somehow, and if yes pick a different one, until we find a free one. Whether a UID is used right now or not is checked through NSS calls. Moreover the IPC object lists are checked to see if there are any objects owned by the UID we are about to pick. This means systemd will avoid using UIDs you have assigned otherwise. Note however that this of course makes the pool of available UIDs smaller, and in the worst cases this means that allocating a dynamic user might fail because there simply are no unused UIDs in the range.
If not specified otherwise the name for a dynamically allocated user is derived from the service name. Not everything that’s valid in a service name is valid in a user-name however, and in some cases a randomized name is used instead to deal with this. Often it makes sense to pick the user names to register explicitly. For that use User= and choose whatever you like.
If you pick a user name with User= and combine it with DynamicUser= and the user already exists statically it will be used for the service and the dynamic user logic is automatically disabled. This permits automatic up- and downgrades between static and dynamic UIDs. For example, it provides a nice way to move a system from static to dynamic UIDs in a compatible way: as long as you select the same User= value before and after switching DynamicUser= on, the service will continue to use the statically allocated user if it exists, and only operates in the dynamic mode if it does not. This is useful for other cases as well, for example to adapt a service that normally would use a dynamic user to concepts that require statically assigned UIDs, for example to marry classic UID-based file system quota with such services.
systemd always allocates a pair of dynamic UID and GID at the same time, with the same numeric ID.
If the Linux kernel had a “shiftfs” or similar functionality, i.e. a way to mount an existing directory to a second place, but map the exposed UIDs/GIDs in some way configurable at mount time, this would be excellent for the implementation of StateDirectory= in conjunction with DynamicUser=. It would make the recursive chown()ing step unnecessary, as the host version of the state directory could simply be mounted into a the service’s mount name-space, with a shift applied that maps the directory’s owner to the services’ UID/GID. But I don’t have high hopes in this regard, as all work being done in this area appears to be bound to user name-spacing — which is a concept not used here (and I guess one could say user name-spacing is probably more a source of problems than a solution to one, but you are welcome to disagree on that).
Security updates have been issued by CentOS (augeas, samba, and samba4), Debian (apache2, bluez, emacs23, and newsbeuter), Fedora (kernel and mingw-LibRaw), openSUSE (apache2 and libzip), Oracle (kernel), SUSE (kernel, spice, and xen), and Ubuntu (emacs24, emacs25, and samba).
Use the Join button above to receive notification of new posts in this series.
In my previous posts, I talked about coming up with an idea, determining the solution, and getting your first customers. But you’re building a company, not a product. Let’s talk about what the first year should look like.
The primary goals for that first year are to: 1) set up the company; 2) build, launch, and learn; and 3) survive.
Setting Up the Company
The company you’re building is more than the product itself, and you’re not going to do it alone. You don’t want to spend too much time on this since getting customers is key, but if you don’t set up the basics, there are all sorts of issues down the line.
Find Your Co-Founders & Determine Roles
You may already have the idea, but who do you need to execute it? At Backblaze, we needed people to build the web experience, the client backup application, and the server/storage side. We also needed someone to handle the business/marketing aspects, and we felt that the design and user experience were critical. As a result, we started with five co-founders: three engineers, a designer, and me for the business and marketing.
Of course not every role needs to be filled by a co-founder. You can hire employees for positions as well. But think through the strategic skills you’ll need to launch and consider co-founders with those skill sets.
Too many people think they can just “work together” on everything. Don’t. Determine roles as quickly as possible so that it’s clear who is responsible for what work and which decisions. We were lucky in that we had worked together and thus knew what each person would do, but even so we assigned titles early on to clarify roles.
Takeaway: Fill critical roles and explicitly split roles and responsibilities.
Get Your Legal Basics In Place
When we’re excited about building a product, legal basics are often the last thing we want to deal with. You don’t need to go overboard, but it’s critical to get certain things done.
Determine ownership split. What is the percentage breakdown of the company that each of the founders will own? It can be a tough discussion, but it only becomes more difficult later when there is more value and people have put more time into it. At Backblaze we split the equity equally five ways. This is uncommon. The benefit of this is that all the founders feel valued and “in it together.” The benefit of the more common split where someone has a dominant share is that person is typically empowered to be the ultimate decision-maker. Slicing Pie provides some guidance on how to think about splitting equity. Regardless of which way you want you go, don’t put it off.
Incorporate. Hard to be a company if you’re not. There are various formats, but if you plan to raise angel/venture funding, a Delaware-based C-corp is standard.
Deal With Stock. At a minimum, issue stock to the founders, have each one buy their shares, and file an 83(b). Buying your shares at this stage might be $100. Filing the 83(b) election marks the date at which you purchased your shares, and shows that you bought them for what they were worth. This one piece of paper paper can make the difference between paying long-term capital gains rates (~20%) or income tax rates (~40%).
Assign Intellectual Property. Ask everyone to sign a Proprietary Information and Inventions Assignment (“PIIA”). This document says that what they do at the company is owned by the company. Early on we had a friend who came by and brainstormed ideas. We thought of it as interesting banter. He later said he owned part of our storage design. While we worked it out together, a PIIA makes ownership clear.
The ownership split can be worked out by the founders directly. For the other items, I would involve lawyers. Some law firms will set up the basics and defer payment until you raise money or the business can pay for services out of operations. Gunderson Dettmer did that for us (ask for Bennett Yee). Cooley will do this on a casey-by-case basis as well.
Takeaway: Don’t let the excitement of building a company distract you from filing the basic legal documents required to protect and grow your company.
Get Health Insurance
This item may seem out of place, but not having health insurance can easily bankrupt you personally, and that certainly won’t bode well for your company. While you can buy individual health insurance, it will often be less expensive to buy it as a company. Also, it will make recruiting employees more difficult if you do not offer healthcare. When we contacted brokers they asked us to send the W-2 of each employee that wanted coverage, but the founders weren’t taking a salary at first. To work around this, make the founders ‘officers’ of the company, and the healthcare brokers can then insure them. (Of course, you need to be ok with your co-founders being officers, but hopefully, that is logical anyway.)
Takeaway: Don’t take your co-founders’ physical and financial health for granted. Health insurance can serve as both individual protection and a recruiting tool for future employees.
Building, Launching & Learning
Getting the company set up gives you the foundation, but ultimately a company with no product and no customers isn’t very interesting.
Ideally, you have one person on the team focusing on all of the items above and everyone else can be heads-down building product. There is a lot to say about building product, but for this post, I’ll just say that your goal is to get something out the door that is good enough to start collecting feedback. It doesn’t have to have every feature you dream of and doesn’t have to support 1 billion users on day one.
If you’re building a car or rocket, that may take some time. But with the availability of open-source software and cloud services, most startups should launch inside of a year.
Launching forces a scoping of the feature set to what’s critical, rallies the company around a goal, starts building awareness of your company and solution, and pushes forward the learning process. Backblaze launched in public beta on June 2, 2008, eight months after the founders all started working on it full-time.
Takeaway: Focus on the most important features and launch.
Learn & Iterate
As much as we think we know about the customers and their needs, the launch process and beyond opens up all sorts of insights. This early period is critical to collect feedback and iterate, especially while both the product and company are still quite malleable. We initially planned on building peer-to-peer and local backup immediately on the heels of our online offering, but after launching found minimal demand for those features. On the other hand, there was tremendous demand from companies and resellers.
Takeaway: Use the critical post-launch period to collect feedback and iterate.
“Live to fight another day.” If the company doesn’t survive, it’s hard to change the world. Let’s talk about some of the survival components.
Consider What You As A Founding Team Want & How You Work
Are you doing this because you hope to get rich? See yourself on the cover of Fortune? Make your own decisions? Work from home all the time? Founder fighting is the number one reason companies fail; the founders need to be on the same page as much as possible.
At Backblaze we agreed very early on that we wanted three things:
Build products we were proud of
This has driven various decisions over the years and has evolved into being part of the culture. For example, while Backblaze is absolutely a company with a profit motive, we do not compromise the product to make more money. Other directions are not bad; they’re just different.
Pretend you’re getting married to each other. Do some introspection and talk about your vision of the future a lot. Do you expect everyone to work 20 or 100 hours every week? In the office or remote? How do you like to work? What pet peeves do you have?
When getting married each person brings the “life they’ve known,” often influenced by the life their parents lived. Together they need to decide which aspects of their previous lives they want to keep, toss, or change. As founders coming together, you have the same opportunity for your new company.
Takeaway: In order for a company to survive, the founders must agree on what they want the company to be. Have the discussions early.
Determine How You Will Fund Your Business
Raising venture capital is often seen as the only path, and considered the most important thing to start doing on day one. However, there are a variety of options for funding your business, including using money from savings, part-time work, friends & family money, loans, angels, and customers. Consider the right option for you, your founding team, and your business.
Whichever option you choose for funding your business, chances are high that you will not be flush with cash on day one. In certain situations, you actually don’t want to conserve cash because you’ve raised $100m and now you want to run as fast as you can to capture a market — cash is plentiful and time is not. However, with the exception of founder struggles, running out of cash is the most common way companies go under. There are many ways to conserve cash — limit hiring of employees and consultants, use lawyers and accountants sparingly, don’t spend on advertising, work from a home office, etc. The most important way is to simply ensure that you and your team are cash conscious, challenging decisions that commit you to spending cash.
Backblaze spent a total of $94,122 to get to public beta launch. That included building the backup application, our own server infrastructure, the website with account/billing/restore functionality, the marketing involved in getting to launch, and all the steps above in setting up the company, paying for healthcare, etc. The five founders took no salary during this time (which, of course, would have cost dramatically more), so most of this money went to computers, servers, hard drives, and other infrastructure.
Takeaway: Minimize cash burn — it extends your runway and gives you options.
Slowly Flesh Out Your Team
We started with five co-founders, and thus a fairly fleshed-out team. A year in, we only added one person, a Mac architect. Three months later we shipped a beta of our Mac version, which has resulted in more than 50% of our revenue.
Minimizing hiring is key to cash conservation, and hiring ahead of getting market feedback is risky since you may realize that the talent you need will change. However, once you start getting feedback, think about the key people that you need to move your company forward. But be rigorous in determining whether they’re critical. We didn’t hire our first customer support person until all five founders were spending 20% of their time on it.
Takeaway: Don’t hire in anticipation of market growth; hire to fuel the growth.
Keep Your Spirits Up
Startups are roller coasters of emotion. There have been some serious articles about founders suffering from depression and worse. The idea phase is exhilarating, then there is the slog of building. The launch is a blast, but the week after there are crickets.
On June 2, 2008, we launched in public beta with great press and hordes of customers. But a few months later we were signing up only about 10 new customers per month. That’s $50 new monthly recurring revenue (MRR) after a year of work and no salary.
On August 25, 2008, we brought on our Mac architect. Two months later, on October 26, 2008, Apple launched Time Machine — completely free and built-in backup for all Macs.
There were plenty of times when our prospects looked bleak. In the rearview mirror it’s easy to say, “well sure, but now you have lots of customers,” or “yes, but Time Machine doesn’t do cloud backup.” But at the time neither of these were a given.
Takeaway: Getting up each day and believing that as a team you’ll figure it out will let you get to the point where you can look in the rearview mirror and say, “It looked bleak back then.”
Succeeding in Your First Year
I titled the post “Surviving Your First Year,” but if you manage to, 1) set up the company; 2) build, launch, and learn; and 3) survive, you will have done more than survive: you’ll have truly succeeded in your first year.
Security updates have been issued by Arch Linux (tomcat7), Debian (kernel and perl), Fedora (libwmf and mpg123), Mageia (bluez, ffmpeg, gstreamer0.10-plugins-good, gstreamer1.0-plugins-good, libwmf, tomcat, and tor), openSUSE (emacs, fossil, freexl, php5, and xen), Red Hat (augeas, rh-mysql56-mysql, samba, and samba4), Scientific Linux (augeas, samba, and samba4), Slackware (samba), SUSE (emacs and kernel), and Ubuntu (qemu).
Security updates have been issued by CentOS (emacs), Debian (apache2, gdk-pixbuf, and pyjwt), Fedora (autotrace, converseen, dmtx-utils, drawtiming, emacs, gtatool, imageinfo, ImageMagick, inkscape, jasper, k3d, kxstitch, libwpd, mingw-libzip, perl-Image-SubImageFind, pfstools, php-pecl-imagick, psiconv, q, rawtherapee, ripright, rss-glx, rubygem-rmagick, synfig, synfigstudio, techne, vdr-scraper2vdr, vips, and WindowMaker), Oracle (emacs and kernel), Red Hat (emacs and kernel), Scientific Linux (emacs), SUSE (emacs), and Ubuntu (apache2).
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.