Amazon Aurora MySQL (Aurora MySQL) is a managed relational database engine, wire-compatible with MySQL 5.6. Most of the drivers, connectors, and tools that you currently use with MySQL can be used with Aurora MySQL with little or no change.
Aurora MySQL database (DB) clusters provide advanced features such as:
One primary instance that supports read and write operations and up to 15 Aurora Replicas that support read-only operations, and can be automatically promoted to the primary role if the current primary instance fails
A cluster endpoint that automatically follows the primary instance in case of failover
A reader endpoint that includes all Aurora Replicas and is automatically updated when Aurora Replicas are added or removed
Internal server connection pooling and thread multiplexing for improved scalability
Near-instantaneous database restarts and crash recovery
Access to near-real-time cluster metadata that allows application developers to build “smart drivers,” connecting directly to individual instances based on their read-write or read-only role
Client-side components that are not well configured (applications, drivers, connectors, proxies) may be unable to react to recovery actions and DB cluster topology changes, or the reaction may be delayed, contributing to unexpected downtime and performance issues. In order to prevent that and make the most of Aurora MySQL features, database administrators (DBAs) and application developers are encouraged to implement best practices.
You can read more about these best practices in the recently-published Amazon Aurora MySQL DBA Handbook – Connection Management whitepaper.
About the Author
Szymon Komendera is a database engineer on the Amazon Aurora team.
We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.
Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.
Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.
AWS:RequestedRegion condition key
The new global condition key, , supports all actions across all AWS services. You can use any string operator and specify any AWS region for its value.
Allows you to specify the region to which the IAM principal (user or role) can make API calls
I’ll now demonstrate the use of the new global condition key.
Example: Policy with region-level control
Let’s say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test Amazon Lambda, an event-driven platform, to retrieve data from the MySQL DB instance in RDS for future use.
My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.
The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.
You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.
Want more AWS Security news? Follow us on Twitter.
Kernel developers go to some lengths to mark read-only data so that it can be protected by the system’s memory-management unit. Memory that cannot be changed cannot be altered by an attacker to corrupt the system. But the kernel’s mechanisms for managing read-only memory do not work for memory that must be initialized after the initial system bootstrap has completed. A patch set from Igor Stoppa seeks to change that situation by creating a new API just for late-initialized read-only data.
A customer has been successfully creating and running multiple Amazon Elasticsearch Service (Amazon ES) domains to support their business users’ search needs across products, orders, support documentation, and a growing suite of similar needs. The service has become heavily used across the organization. This led to some domains running at 100% capacity during peak times, while others began to run low on storage space. Because of this increased usage, the technical teams were in danger of missing their service level agreements. They contacted me for help.
This post shows how you can set up automated alarms to warn when domains need attention.
Amazon ES is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require. The service offers built-in integrations with a number of other components and AWS services, enabling customers to go from raw data to actionable insights quickly and securely.
One of these other integrated services is Amazon CloudWatch. CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.
CloudWatch collects metrics for Amazon ES. You can use these metrics to monitor the state of your Amazon ES domains, and set alarms to notify you about high utilization of system resources. For more information, see Amazon Elasticsearch Service Metrics and Dimensions.
While the metrics are automatically collected, the missing piece is how to set alarms on these metrics at appropriate levels for each of your domains. This post includes sample Python code to evaluate the current state of your Amazon ES environment, and to set up alarms according to AWS recommendations and best practices.
There are two components to the sample solution:
es-check-cwalarms.py: This Python script checks the CloudWatch alarms that have been set, for all Amazon ES domains in a given account and region.
The sample code can also be found in the amazon-es-check-cw-alarms GitHub repo. The scripts are easy to extend or combine, as described in the section “Extensions and Adaptations”.
Assessing the current state
The first script, es-check-cwalarms.py, is used to give an overview of the configurations and alarm settings for all the Amazon ES domains in the given region. The script takes the following parameters:
python es-checkcwalarms.py -h
usage: es-checkcwalarms.py [-h] [-e ESPREFIX] [-n NOTIFY] [-f FREE][-p PROFILE] [-r REGION]
Checks a set of recommended CloudWatch alarms for Amazon Elasticsearch Service domains (optionally, those beginning with a given prefix).
-h, – help show this help message and exit
-e ESPREFIX, – esprefix ESPREFIX Only check Amazon Elasticsearch Service domains that begin with this prefix.
-n NOTIFY, – notify NOTIFY List of CloudWatch alarm actions; e.g. ['arn:aws:sns:xxxx']
-f FREE, – free FREE Minimum free storage (MB) on which to alarm
-p PROFILE, – profile PROFILE IAM profile name to use
-r REGION, – region REGION AWS region for the domain. Default: us-east-1
The script first identifies all the domains in the given region (or, optionally, limits them to the subset that begins with a given prefix). It then starts running a set of checks against each one.
The script can be run from the command line or set up as a scheduled Lambda function. For example, for one customer, it was deemed appropriate to regularly run the script to check that alarms were correctly set for all domains. In addition, because configuration changes—cluster size increases to accommodate larger workloads being a common change—might require updates to alarms, this approach allowed the automatic identification of alarms no longer appropriately set as the domain configurations changed.
The output shown below is the output for one domain in my account.
Starting checks for Elasticsearch domain iotfleet , version is 53
Iotfleet Automated snapshot hour (UTC): 0
Iotfleet Instance configuration: 1 instances; type:m3.medium.elasticsearch
Iotfleet Instance storage definition is: 4 GB; free storage calced to: 819.2 MB
iotfleet Desired free storage set to (in MB): 819.2
iotfleet WARNING: Not using VPC Endpoint
iotfleet WARNING: Does not have Zone Awareness enabled
iotfleet WARNING: Instance count is ODD. Best practice is for an even number of data nodes and zone awareness.
iotfleet WARNING: Does not have Dedicated Masters.
iotfleet WARNING: Neither index nor search slow logs are enabled.
iotfleet WARNING: EBS not in use. Using instance storage only.
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.yellow-Alarm ClusterStatus.yellow
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.red-Alarm ClusterStatus.red
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-CPUUtilization-Alarm CPUUtilization
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-JVMMemoryPressure-Alarm JVMMemoryPressure
iotfleet WARNING: Missing alarm!! ('ClusterIndexWritesBlocked', 'Maximum', 60, 5, 'GreaterThanOrEqualToThreshold', 1.0)
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-AutomatedSnapshotFailure-Alarm AutomatedSnapshotFailure
iotfleet Alarm: Threshold does not match: Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm Should be: 819.2 ; is 3000.0
The output messages fall into the following categories:
System overview, Informational: The Amazon ES version and configuration, including instance type and number, storage, automated snapshot hour, etc.
Free storage: A calculation for the appropriate amount of free storage, based on the recommended 20% of total storage.
Warnings: best practices that are not being followed for this domain. (For more about this, read on.)
Alarms: An assessment of the CloudWatch alarms currently set for this domain, against a recommended set.
The script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. Using the array allows alarm parameters (such as free space) to be updated within the code based on current domain statistics and configurations.
For a given domain, the script checks if each alarm has been set. If the alarm is set, it checks whether the values match those in the array esAlarms. In the output above, you can see three different situations being reported:
Alarm ok; definition matches. The alarm set for the domain matches the settings in the array.
Alarm: Threshold does not match. An alarm exists, but the threshold value at which the alarm is triggered does not match.
WARNING: Missing alarm!! The recommended alarm is missing.
All in all, the list above shows that this domain does not have a configuration that adheres to best practices, nor does it have all the recommended alarms.
Setting up alarms
Now that you know that the domains in their current state are missing critical alarms, you can correct the situation.
To demonstrate the script, set up a new domain named “ver”, in us-west-2. Specify 1 node, and a 10-GB EBS disk. Also, create an SNS topic in us-west-2 with a name of “sendnotification”, which sends you an email.
Run the second script, es-create-cwalarms.py, from the command line. This script creates (or updates) the desired CloudWatch alarms for the specified Amazon ES domain, “ver”.
python es-create-cwalarms.py -r us-west-2 -e test -c ver -n "['arn:aws:sns:us-west-2:xxxxxxxxxx:sendnotification']"
EBS enabled: True type: gp2 size (GB): 10 No Iops 10240 total storage (MB)
Desired free storage set to (in MB): 2048.0
Successfully finished creating alarms!
As with the first script, this script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. This approach allows you to add or modify alarms based on your use case (more on that below).
After running the script, navigate to Alarms on the CloudWatch console. You can see the set of alarms set up on your domain.
Because the “ver” domain has only a single node, cluster status is yellow, and that alarm is in an “ALARM” state. It’s already sent a notification that the alarm has been triggered.
In most cases, the alarm triggers due to an increased workload. The likely action is to reconfigure the system to handle the increased workload, rather than reducing the incoming workload. Reconfiguring any backend store—a category of systems that includes Elasticsearch—is best performed when the system is quiescent or lightly loaded. Reconfigurations such as setting zone awareness or modifying the disk type cause Amazon ES to enter a “processing” state, potentially disrupting client access.
Other changes, such as increasing the number of data nodes, may cause Elasticsearch to begin moving shards, potentially impacting search performance on these shards while this is happening. These actions should be considered in the context of your production usage. For the same reason I also do not recommend running a script that resets all domains to match best practices.
Avoid the need to reconfigure during heavy workload by setting alarms at a level that allows a considered approach to making the needed changes. For example, if you identify that each weekly peak is increasing, you can reconfigure during a weekly quiet period.
While Elasticsearch can be reconfigured without being quiesced, it is not a best practice to automatically scale it up and down based on usage patterns. Unlike some other AWS services, I recommend against setting a CloudWatch action that automatically reconfigures the system when alarms are triggered.
There are other situations where the planned reconfiguration approach may not work, such as low or zero free disk space causing the domain to reject writes. If the business is dependent on the domain continuing to accept incoming writes and deleting data is not an option, the team may choose to reconfigure immediately.
Extensions and adaptations
You may wish to modify the best practices encoded in the scripts for your own environment or workloads. It’s always better to avoid situations where alerts are generated but routinely ignored. All alerts should trigger a review and one or more actions, either immediately or at a planned date. The following is a list of common situations where you may wish to set different alarms for different domains:
Dev/test vs. production You may have a different set of configuration rules and alarms for your dev environment configurations than for test. For example, you may require zone awareness and dedicated masters for your production environment, but not for your development domains. Or, you may not have any alarms set in dev. For test environments that mirror your potential peak load, test to ensure that the alarms are appropriately triggered.
Differing workloads or SLAs for different domains You may have one domain with a requirement for superfast search performance, and another domain with a heavy ingest load that tolerates slower search response. Your reaction to slow response for these two workloads is likely to be different, so perhaps the thresholds for these two domains should be set at a different level. In this case, you might add a “max CPU utilization” alarm at 100% for 1 minute for the fast search domain, while the other domain only triggers an alarm when the average has been higher than 60% for 5 minutes. You might also add a “free space” rule with a higher threshold to reflect the need for more space for the heavy ingest load if there is danger that it could fill the available disk quickly.
“Normal” alarms versus “emergency” alarms If, for example, free disk space drops to 25% of total capacity, an alarm is triggered that indicates action should be taken as soon as possible, such as cleaning up old indexes or reconfiguring at the next quiet period for this domain. However, if free space drops below a critical level (20% free space), action must be taken immediately in order to prevent Amazon ES from setting the domain to read-only. Similarly, if the “ClusterIndexWritesBlocked” alarm triggers, the domain has already stopped accepting writes, so immediate action is needed. In this case, you may wish to set “laddered” alarms, where one threshold causes an alarm to be triggered to review the current workload for a planned reconfiguration, but a different threshold raises a “DefCon 3” alarm that immediate action is required.
The sample scripts provided here are a starting point, intended for you to adapt to your own environment and needs.
Running the scripts one time can identify how far your current state is from your desired state, and create an initial set of alarms. Regularly re-running these scripts can capture changes in your environment over time and adjusting your alarms for changes in your environment and configurations. One customer has set them up to run nightly, and to automatically create and update alarms to match their preferred settings.
Removing unwanted alarms
Each CloudWatch alarm costs approximately $0.10 per month. You can remove unwanted alarms in the CloudWatch console, under Alarms. If you set up a “ver” domain above, remember to remove it to avoid continuing charges.
Setting CloudWatch alarms appropriately for your Amazon ES domains can help you avoid suboptimal performance and allow you to respond to workload growth or configuration issues well before they become urgent. This post gives you a starting point for doing so. The additional sleep you’ll get knowing you don’t need to be concerned about Elasticsearch domain performance will allow you to focus on building creative solutions for your business and solving problems for your customers.
Dr. Veronika Megler is a senior consultant at Amazon Web Services. She works with our customers to implement innovative big data, AI and ML projects, helping them accelerate their time-to-value when using AWS.
The OpenSSL project has announced a number of changes to how the project is developed. These include shutting down the openssl-dev mailing list in favor of discussing all patches on GitHub and the addition of a new, read-only (for the world) openssl-project list. “We are changing our release schedule so that unless there are extenuating circumstances, security releases will go out on a Tuesday, with the pre-notification being the previous Tuesday. We don’t see a need to have people ready to sacrifice their weekend every time a new CVE comes out.”
In this day and age, it is almost impossible to do without our mobile devices and the applications that help make our lives easier. As our dependency on our mobile phone grows, the mobile application market has exploded with millions of apps vying for our attention. For mobile developers, this means that we must ensure that we build applications that provide the quality, real-time experiences that app users desire. Therefore, it has become essential that mobile applications are developed to include features such as multi-user data synchronization, offline network support, and data discovery, just to name a few. According to several articles, I read recently about mobile development trends on publications like InfoQ, DZone, and the mobile development blog AlleviateTech, one of the key elements in of delivering the aforementioned capabilities is with cloud-driven mobile applications. It seems that this is especially true, as it related to mobile data synchronization and data storage.
That being the case, it is a perfect time for me to announce a new service for building innovative mobile applications that are driven by data-intensive services in the cloud; AWS AppSync. AWS AppSync is a fully managed serverless GraphQL service for real-time data queries, synchronization, communications and offline programming features. For those not familiar, let me briefly share some information about the open GraphQL specification. GraphQL is a responsive data query language and server-side runtime for querying data sources that allow for real-time data retrieval and dynamic query execution. You can use GraphQL to build a responsive API for use in when building client applications. GraphQL works at the application layer and provides a type system for defining schemas. These schemas serve as specifications to define how operations should be performed on the data and how the data should be structured when retrieved. Additionally, GraphQL has a declarative coding model which is supported by many client libraries and frameworks including React, React Native, iOS, and Android.
Now the power of the GraphQL open standard query language is being brought to you in a rich managed service with AWS AppSync. With AppSync developers can simplify the retrieval and manipulation of data across multiple data sources with ease, allowing them to quickly prototype, build and create robust, collaborative, multi-user applications. AppSync keeps data updated when devices are connected, but enables developers to build solutions that work offline by caching data locally and synchronizing local data when connections become available.
Let’s discuss some key concepts of AWS AppSync and how the service works.
AWS AppSync Client: service client that defines operations, wraps authorization details of requests, and manage offline logic.
Data Source: the data storage system or a trigger housing data
Identity: a set of credentials with permissions and identification context provided with requests to GraphQL proxy
GraphQL Proxy: the GraphQL engine component for processing and mapping requests, handling conflict resolution, and managing Fine Grained Access Control
Operation: one of three GraphQL operations supported in AppSync
Query: a read-only fetch call to the data
Mutation: a write of the data followed by a fetch,
Subscription: long-lived connections that receive data in response to events.
Action: a notification to connected subscribers from a GraphQL subscription.
Resolver: function using request and response mapping templates that converts and executes payload against data source
How It Works
A schema is created to define types and capabilities of the desired GraphQL API and tied to a Resolver function. The schema can be created to mirror existing data sources or AWS AppSync can create tables automatically based the schema definition. Developers can also use GraphQL features for data discovery without having knowledge of the backend data sources. After a schema definition is established, an AWS AppSync client can be configured with an operation request, like a Query operation. The client submits the operation request to GraphQL Proxy along with an identity context and credentials. The GraphQL Proxy passes this request to the Resolver which maps and executes the request payload against pre-configured AWS data services like an Amazon DynamoDB table, an AWS Lambda function, or a search capability using Amazon Elasticsearch. The Resolver executes calls to one or all of these services within a single network call minimizing CPU cycles and bandwidth needs and returns the response to the client. Additionally, the client application can change data requirements in code on demand and the AppSync GraphQL API will dynamically map requests for data accordingly, allowing prototyping and faster development.
In order to take a quick peek at the service, I’ll go to the AWS AppSync console. I’ll click the Create API button to get started.
When the Create new API screen opens, I’ll give my new API a name, TarasTestApp, and since I am just exploring the new service I will select the Sample schema option. You may notice from the informational dialog box on the screen that in using the sample schema, AWS AppSync will automatically create the DynamoDB tables and the IAM roles for me.It will also deploy the TarasTestApp API on my behalf. After review of the sample schema provided by the console, I’ll click the Create button to create my test API.
After the TaraTestApp API has been created and the associated AWS resources provisioned on my behalf, I can make updates to the schema, data source, or connect my data source(s) to a resolver. I also can integrate my GraphQL API into an iOS, Android, Web, or React Native application by cloning the sample repo from GitHub and downloading the accompanying GraphQL schema. These application samples are great to help get you started and they are pre-configured to function in offline scenarios.
If I select the Schema menu option on the console, I can update and view the TarasTestApp GraphQL API schema.
Additionally, if I select the Data Sources menu option in the console, I can see the existing data sources. Within this screen, I can update, delete, or add data sources if I so desire.
Next, I will select the Query menu option which takes me to the console tool for writing and testing queries. Since I chose the sample schema and the AWS AppSync service did most of the heavy lifting for me, I’ll try a query against my new GraphQL API.
I’ll use a mutation to add data for the event type in my schema. Since this is a mutation and it first writes data and then does a read of the data, I want the query to return values for name and where.
If I go to the DynamoDB table created for the event type in the schema, I will see that the values from my query have been successfully written into the table. Now that was a pretty simple task to write and retrieve data based on a GraphQL API schema from a data source, don’t you think.
TL;DR: you may now configure systemd to dynamically allocate a UNIX user ID for service processes when it starts them and release it when it stops them. It’s pretty secure, mixes well with transient services, socket activated services and service templating.
Today we released systemd 235. Among other improvements this greatly extends the dynamic user logic of systemd. Dynamic users are a powerful but little known concept, supported in its basic form since systemd 232. With this blog story I hope to make it a bit better known.
The UNIX user concept is the most basic and well-understood security concept in POSIX operating systems. It is UNIX/POSIX’ primary security concept, the one everybody can agree on, and most security concepts that came after it (such as process capabilities, SELinux and other MACs, user name-spaces, …) in some form or another build on it, extend it or at least interface with it. If you build a Linux kernel with all security features turned off, the user concept is pretty much the one you’ll still retain.
Originally, the user concept was introduced to make multi-user systems a reality, i.e. systems enabling multiple human users to share the same system at the same time, cleanly separating their resources and protecting them from each other. The majority of today’s UNIX systems don’t really use the user concept like that anymore though. Most of today’s systems probably have only one actual human user (or even less!), but their user databases (/etc/passwd) list a good number more entries than that. Today, the majority of UNIX users in most environments are system users, i.e. users that are not the technical representation of a human sitting in front of a PC anymore, but the security identity a system service — an executable program — runs as. Event though traditional, simultaneous multi-user systems slowly became less relevant, their ground-breaking basic concept became the cornerstone of UNIX security. The OS is nowadays partitioned into isolated services — and each service runs as its own system user, and thus within its own, minimal security context.
The people behind the Android OS realized the relevance of the UNIX user concept as the primary security concept on UNIX, and took its use even further: on Android not only system services take benefit of the UNIX user concept, but each UI app gets its own, individual user identity too — thus neatly separating app resources from each other, and protecting app processes from each other, too.
Back in the more traditional Linux world things are a bit less advanced in this area. Even though users are the quintessential UNIX security concept, allocation and management of system users is still a pretty limited, raw and static affair. In most cases, RPM or DEB package installation scripts allocate a fixed number of (usually one) system users when you install the package of a service that wants to take benefit of the user concept, and from that point on the system user remains allocated on the system and is never deallocated again, even if the package is later removed again. Most Linux distributions limit the number of system users to 1000 (which isn’t particularly a lot). Allocating a system user is hence expensive: the number of available users is limited, and there’s no defined way to dispose of them after use. If you make use of system users too liberally, you are very likely to run out of them sooner rather than later.
You may wonder why system users are generally not deallocated when the package that registered them is uninstalled from a system (at least on most distributions). The reason for that is one relevant property of the user concept (you might even want to call this a design flaw): user IDs are sticky to files (and other objects such as IPC objects). If a service running as a specific system user creates a file at some location, and is then terminated and its package and user removed, then the created file still belongs to the numeric ID (“UID”) the system user originally got assigned. When the next system user is allocated and — due to ID recycling — happens to get assigned the same numeric ID, then it will also gain access to the file, and that’s generally considered a problem, given that the file belonged to a potentially very different service once upon a time, and likely should not be readable or changeable by anything coming after it. Distributions hence tend to avoid UID recycling which means system users remain registered forever on a system after they have been allocated once.
The above is a description of the status quo ante. Let’s now focus on what systemd’s dynamic user concept brings to the table, to improve the situation.
Introducing Dynamic Users
With systemd dynamic users we hope to make make it easier and cheaper to allocate system users on-the-fly, thus substantially increasing the possible uses of this core UNIX security concept.
If you write a systemd service unit file, you may enable the dynamic user logic for it by setting the DynamicUser= option in its [Service] section to yes. If you do a system user is dynamically allocated the instant the service binary is invoked, and released again when the service terminates. The user is automatically allocated from the UID range 61184–65519, by looking for a so far unused UID.
Now you may wonder, how does this concept deal with the sticky user issue discussed above? In order to counter the problem, two strategies easily come to mind:
Prohibit the service from creating any files/directories or IPC objects
Automatically removing the files/directories or IPC objects the service created when it shuts down.
In systemd we implemented both strategies, but for different parts of the execution environment. Specifically:
Setting DynamicUser=yes implies ProtectSystem=strict and ProtectHome=read-only. These sand-boxing options turn off write access to pretty much the whole OS directory tree, with a few relevant exceptions, such as the API file systems /proc, /sys and so on, as well as /tmp and /var/tmp. (BTW: setting these two options on your regular services that do not use DynamicUser= is a good idea too, as it drastically reduces the exposure of the system to exploited services.)
Setting DynamicUser=yes implies PrivateTmp=yes. This option sets up /tmp and /var/tmp for the service in a way that it gets its own, disconnected version of these directories, that are not shared by other services, and whose life-cycle is bound to the service’s own life-cycle. Thus if the service goes down, the user is removed and all its temporary files and directories with it. (BTW: as above, consider setting this option for your regular services that do not use DynamicUser= too, it’s a great way to lock things down security-wise.)
Setting DynamicUser=yes implies RemoveIPC=yes. This option ensures that when the service goes down all SysV and POSIX IPC objects (shared memory, message queues, semaphores) owned by the service’s user are removed. Thus, the life-cycle of the IPC objects is bound to the life-cycle of the dynamic user and service, too. (BTW: yes, here too, consider using this in your regular services, too!)
With these four settings in effect, services with dynamic users are nicely sand-boxed. They cannot create files or directories, except in /tmp and /var/tmp, where they will be removed automatically when the service shuts down, as will any IPC objects created. Sticky ownership of files/directories and IPC objects is hence dealt with effectively.
The RuntimeDirectory= option may be used to open up a bit the sandbox to external programs. If you set it to a directory name of your choice, it will be created below /run when the service is started, and removed in its entirety when it is terminated. The ownership of the directory is assigned to the service’s dynamic user. This way, a dynamic user service can expose API interfaces (AF_UNIX sockets, …) to other services at a well-defined place and again bind the life-cycle of it to the service’s own run-time. Example: set RuntimeDirectory=foobar in your service, and watch how a directory /run/foobar appears at the moment you start the service, and disappears the moment you stop it again. (BTW: Much like the other settings discussed above, RuntimeDirectory= may be used outside of the DynamicUser= context too, and is a nice way to run any service with a properly owned, life-cycle-managed run-time directory.)
Of course, a service running in such an environment (although already very useful for many cases!), has a major limitation: it cannot leave persistent data around it can reuse on a later run. As pretty much the whole OS directory tree is read-only to it, there’s simply no place it could put the data that survives from one service invocation to the next.
With systemd 235 this limitation is removed: there are now three new settings: StateDirectory=, LogsDirectory= and CacheDirectory=. In many ways they operate like RuntimeDirectory=, but create sub-directories below /var/lib, /var/log and /var/cache, respectively. There’s one major difference beyond that however: directories created that way are persistent, they will survive the run-time cycle of a service, and thus may be used to store data that is supposed to stay around between invocations of the service.
Of course, the obvious question to ask now is: how do these three settings deal with the sticky file ownership problem?
For that we lifted a concept from container managers. Container managers have a very similar problem: each container and the host typically end up using a very similar set of numeric UIDs, and unless user name-spacing is deployed this means that host users might be able to access the data of specific containers that also have a user by the same numeric UID assigned, even though it actually refers to a very different identity in a different context. (Actually, it’s even worse than just getting access, due to the existence of setuid file bits, access might translate to privilege elevation.) The way container managers protect the container images from the host (and from each other to some level) is by placing the container trees below a boundary directory, with very restrictive access modes and ownership (0700 and root:root or so). A host user hence cannot take advantage of the files/directories of a container user of the same UID inside of a local container tree, simply because the boundary directory makes it impossible to even reference files in it. After all on UNIX, in order to get access to a specific path you need access to every single component of it.
How is that applied to dynamic user services? Let’s say StateDirectory=foobar is set for a service that has DynamicUser= turned off. The instant the service is started, /var/lib/foobar is created as state directory, owned by the service’s user and remains in existence when the service is stopped. If the same service now is run with DynamicUser= turned on, the implementation is slightly altered. Instead of a directory /var/lib/foobar a symbolic link by the same path is created (owned by root), pointing to /var/lib/private/foobar (the latter being owned by the service’s dynamic user). The /var/lib/private directory is created as boundary directory: it’s owned by root:root, and has a restrictive access mode of 0700. Both the symlink and the service’s state directory will survive the service’s life-cycle, but the state directory will remain, and continues to be owned by the now disposed dynamic UID — however it is protected from other host users (and other services which might get the same dynamic UID assigned due to UID recycling) by the boundary directory.
The obvious question to ask now is: but if the boundary directory prohibits access to the directory from unprivileged processes, how can the service itself which runs under its own dynamic UID access it anyway? This is achieved by invoking the service process in a slightly modified mount name-space: it will see most of the file hierarchy the same way as everything else on the system (modulo /tmp and /var/tmp as mentioned above), except for /var/lib/private, which is over-mounted with a read-only tmpfs file system instance, with a slightly more liberal access mode permitting the service read access. Inside of this tmpfs file system instance another mount is placed: a bind mount to the host’s real /var/lib/private/foobar directory, onto the same name. Putting this together these means that superficially everything looks the same and is available at the same place on the host and from inside the service, but two important changes have been made: the /var/lib/private boundary directory lost its restrictive character inside the service, and has been emptied of the state directories of any other service, thus making the protection complete. Note that the symlink /var/lib/foobar hides the fact that the boundary directory is used (making it little more than an implementation detail), as the directory is available this way under the same name as it would be if DynamicUser= was not used. Long story short: for the daemon and from the view from the host the indirection through /var/lib/private is mostly transparent.
This logic of course raises another question: what happens to the state directory if a dynamic user service is started with a state directory configured, gets UID X assigned on this first invocation, then terminates and is restarted and now gets UID Y assigned on the second invocation, with X ≠ Y? On the second invocation the directory — and all the files and directories below it — will still be owned by the original UID X so how could the second instance running as Y access it? Our way out is simple: systemd will recursively change the ownership of the directory and everything contained within it to UID Y before invoking the service’s executable.
Of course, such recursive ownership changing (chown()ing) of whole directory trees can become expensive (though according to my experiences, IRL and for most services it’s much cheaper than you might think), hence in order to optimize behavior in this regard, the allocation of dynamic UIDs has been tweaked in two ways to avoid the necessity to do this expensive operation in most cases: firstly, when a dynamic UID is allocated for a service an allocation loop is employed that starts out with a UID hashed from the service’s name. This means a service by the same name is likely to always use the same numeric UID. That means that a stable service name translates into a stable dynamic UID, and that means recursive file ownership adjustments can be skipped (of course, after validation). Secondly, if the configured state directory already exists, and is owned by a suitable currently unused dynamic UID, it’s preferably used above everything else, thus maximizing the chance we can avoid the chown()ing. (That all said, ultimately we have to face it, the currently available UID space of 4K+ is very small still, and conflicts are pretty likely sooner or later, thus a chown()ing has to be expected every now and then when this feature is used extensively).
Note that CacheDirectory= and LogsDirectory= work very similar to StateDirectory=. The only difference is that they manage directories below the /var/cache and /var/logs directories, and their boundary directory hence is /var/cache/private and /var/log/private, respectively.
So, after all this introduction, let’s have a look how this all can be put together. Here’s a trivial example:
# cat > /etc/systemd/system/dynamic-user-test.service <<EOF[Service]ExecStart=/usr/bin/sleep 4711DynamicUser=yes
# systemctl daemon-reload# systemctl start dynamic-user-test# systemctl status dynamic-user-test
Loaded: loaded (/etc/systemd/system/dynamic-user-test.service; static; vendor preset: disabled)
Active: active (running) since Fri 2017-10-06 13:12:25 CEST; 3s ago
Main PID: 2967(sleep)
Tasks: 1(limit: 4915)
└─2967 /usr/bin/sleep 4711
Okt 0613:12:25 sigma systemd: Started dynamic-user-test.service.
# ps -e -o pid,comm,user | grep 29672967 sleep dynamic-user-test
# id dynamic-user-testuid=64642(dynamic-user-test)gid=64642(dynamic-user-test)groups=64642(dynamic-user-test)# systemctl stop dynamic-user-test# id dynamic-user-test
id: ‘dynamic-user-test’: no such user
In this example, we create a unit file with DynamicUser= turned on, start it, check if it’s running correctly, have a look at the service process’ user (which is named like the service; systemd does this automatically if the service name is suitable as user name, and you didn’t configure any user name to use explicitly), stop the service and verify that the user ceased to exist too.
That’s already pretty cool. Let’s step it up a notch, by doing the same in an interactive transient service (for those who don’t know systemd well: a transient service is a service that is defined and started dynamically at run-time, for example via the systemd-run command from the shell. Think: run a service without having to write a unit file first):
# systemd-run – pty – property=DynamicUser=yes – property=StateDirectory=wuff /bin/sh
Running as unit: run-u15750.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4$ ls -al /var/lib/private/
drwxr-xr-x. 3 root root 606. Okt 13:21 .
drwxr-xr-x. 1 root root 8526. Okt 13:21 ..
drwxr-xr-x. 1 run-u15750 run-u15750 86. Okt 13:22 wuff
sh-4.4$ ls -ld /var/lib/wuff
lrwxrwxrwx. 1 root root 126. Okt 13:21 /var/lib/wuff -> private/wuff
sh-4.4$ ls -ld /var/lib/wuff/
drwxr-xr-x. 1 run-u15750 run-u15750 06. Okt 13:21 /var/lib/wuff/
sh-4.4$ echo hello > /var/lib/wuff/test
sh-4.4$ exitexit# id run-u15750
id: ‘run-u15750’: no such user
# ls -al /var/lib/private
drwx------. 1 root root 666. Okt 13:21 .
drwxr-xr-x. 1 root root 8526. Okt 13:21 ..
drwxr-xr-x. 1631226312286. Okt 13:22 wuff
# ls -ld /var/lib/wuff
lrwxrwxrwx. 1 root root 126. Okt 13:21 /var/lib/wuff -> private/wuff
# ls -ld /var/lib/wuff/
drwxr-xr-x. 1631226312286. Okt 13:22 /var/lib/wuff/
# cat /var/lib/wuff/test
The above invokes an interactive shell as transient service run-u15750.service (systemd-run picked that name automatically, since we didn’t specify anything explicitly) with a dynamic user whose name is derived automatically from the service name. Because StateDirectory=wuff is used, a persistent state directory for the service is made available as /var/lib/wuff. In the interactive shell running inside the service, the ls commands show the /var/lib/private boundary directory and its contents, as well as the symlink that is placed for the service. Finally, before exiting the shell, a file is created in the state directory. Back in the original command shell we check if the user is still allocated: it is not, of course, since the service ceased to exist when we exited the shell and with it the dynamic user associated with it. From the host we check the state directory of the service, with similar commands as we did from inside of it. We see that things are set up pretty much the same way in both cases, except for two things: first of all the user/group of the files is now shown as raw numeric UIDs instead of the user/group names derived from the unit name. That’s because the user ceased to exist at this point, and “ls” shows the raw UID for files owned by users that don’t exist. Secondly, the access mode of the boundary directory is different: when we look at it from outside of the service it is not readable by anyone but root, when we looked from inside we saw it it being world readable.
Now, let’s see how things look if we start another transient service, reusing the state directory from the first invocation:
# systemd-run – pty – property=DynamicUser=yes – property=StateDirectory=wuff /bin/sh
Running as unit: run-u16087.service
Press ^] three times within 1s to disconnect TTY.
sh-4.4$ cat /var/lib/wuff/test
sh-4.4$ ls -al /var/lib/wuff/
drwxr-xr-x. 1 run-u16087 run-u16087 86. Okt 13:22 .
drwxr-xr-x. 3 root root 606. Okt 15:42 ..
-rw-r--r--. 1 run-u16087 run-u16087 66. Okt 13:22 test
Here, systemd-run picked a different auto-generated unit name, but the used dynamic UID is still the same, as it was read from the pre-existing state directory, and was otherwise unused. As we can see the test file we generated earlier is accessible and still contains the data we left in there. Do note that the user name is different this time (as it is derived from the unit name, which is different), but the UID it is assigned to is the same one as on the first invocation. We can thus see that the mentioned optimization of the UID allocation logic (i.e. that we start the allocation loop from the UID owner of any existing state directory) took effect, so that no recursive chown()ing was required.
And that’s the end of our example, which hopefully illustrated a bit how this concept and implementation works.
Now that we had a look at how to enable this logic for a unit and how it is implemented, let’s discuss where this actually could be useful in real life.
One major benefit of dynamic user IDs is that running a privilege-separated service leaves no artifacts in the system. A system user is allocated and made use of, but it is discarded automatically in a safe and secure way after use, in a fashion that is safe for later recycling. Thus, quickly invoking a short-lived service for processing some job can be protected properly through a user ID without having to pre-allocate it and without this draining the available UID pool any longer than necessary.
In many cases, starting a service no longer requires package-specific preparation. Or in other words, quite often useradd/mkdir/chown/chmod invocations in “post-inst” package scripts, as well as sysusers.d and tmpfiles.d drop-ins become unnecessary, as the DynamicUser= and StateDirectory=/CacheDirectory=/LogsDirectory= logic can do the necessary work automatically, on-demand and with a well-defined life-cycle.
By combining dynamic user IDs with the transient unit concept, new creative ways of sand-boxing are made available. For example, let’s say you don’t trust the correct implementation of the sort command. You can now lock it into a simple, robust, dynamic UID sandbox with a simple systemd-run and still integrate it into a shell pipeline like any other command. Here’s an example, showcasing a shell pipeline whose middle element runs as a dynamically on-the-fly allocated UID, that is released when the pipelines ends.
By combining dynamic user IDs with the systemd templating logic it is now possible to do much more fine-grained and fully automatic UID management. For example, let’s say you have a template unit file /etc/systemd/system/[email protected]:
And you are done. (Invoke this as many times as you like, each time replacing customerxyz by some customer identifier, you get the idea.)
By combining dynamic user IDs with socket activation you may easily implement a system where each incoming connection is served by a process instance running as a different, fresh, newly allocated UID within its own sandbox. Here’s an example waldo.socket:
With the two unit files above, systemd will listen on TCP/IP port 2048, and for each incoming connection invoke a fresh instance of [email protected], each time utilizing a different, new, dynamically allocated UID, neatly isolated from any other instance.
Dynamic user IDs combine very well with state-less systems, i.e. systems that come up with an unpopulated /etc and /var. A service using dynamic user IDs and the StateDirectory=, CacheDirectory=, LogsDirectory= and RuntimeDirectory= concepts will implicitly allocate the users and directories it needs for running, right at the moment where it needs it.
Dynamic users are a very generic concept, hence a multitude of other uses are thinkable; the list above is just supposed to trigger your imagination.
What does this mean for you as a packager?
I am pretty sure that a large number of services shipped with today’s distributions could benefit from using DynamicUser= and StateDirectory= (and related settings). It often allows removal of post-inst packaging scripts altogether, as well as any sysusers.d and tmpfiles.d drop-ins by unifying the needed declarations in the unit file itself. Hence, as a packager please consider switching your unit files over. That said, there are a number of conditions where DynamicUser= and StateDirectory= (and friends) cannot or should not be used. To name a few:
Service that need to write to files outside of /run/<package>, /var/lib/<package>, /var/cache/<package>, /var/log/<package>, /var/tmp, /tmp, /dev/shm are generally incompatible with this scheme. This rules out daemons that upgrade the system as one example, as that involves writing to /usr.
Services that maintain a herd of processes with different user IDs. Some SMTP services are like this. If your service has such a super-server design, UID management needs to be done by the super-server itself, which rules out systemd doing its dynamic UID magic for it.
Services which run as root (obviously…) or are otherwise privileged.
Services that need to live in the same mount name-space as the host system (for example, because they want to establish mount points visible system-wide). As mentioned DynamicUser= implies ProtectSystem=, PrivateTmp= and related options, which all require the service to run in its own mount name-space.
Your focus is older distributions, i.e. distributions that do not have systemd 232 (for DynamicUser=) or systemd 235 (for StateDirectory= and friends) yet.
If your distribution’s packaging guides don’t allow it. Consult your packaging guides, and possibly start a discussion on your distribution’s mailing list about this.
A couple of additional, random notes about the implementation and use of these features:
Do note that allocating or deallocating a dynamic user leaves /etc/passwd untouched. A dynamic user is added into the user database through the glibc NSS module nss-systemd, and this information never hits the disk.
On traditional UNIX systems it was the job of the daemon process itself to drop privileges, while the DynamicUser= concept is designed around the service manager (i.e. systemd) being responsible for that. That said, since v235 there’s a way to marry DynamicUser= and such services which want to drop privileges on their own. For that, turn on DynamicUser= and set User= to the user name the service wants to setuid() to. This has the effect that systemd will allocate the dynamic user under the specified name when the service is started. Then, prefix the command line you specify in ExecStart= with a single ! character. If you do, the user is allocated for the service, but the daemon binary is is invoked as root instead of the allocated user, under the assumption that the daemon changes its UID on its own the right way. Not that after registration the user will show up instantly in the user database, and is hence resolvable like any other by the daemon process. Example: ExecStart=!/usr/bin/mydaemond
You may wonder why systemd uses the UID range 61184–65519 for its dynamic user allocations (side note: in hexadecimal this reads as 0xEF00–0xFFEF). That’s because distributions (specifically Fedora) tend to allocate regular users from below the 60000 range, and we don’t want to step into that. We also want to stay away from 65535 and a bit around it, as some of these UIDs have special meanings (65535 is often used as special value for “invalid” or “no” UID, as it is identical to the 16bit value -1; 65534 is generally mapped to the “nobody” user, and is where some kernel subsystems map unmappable UIDs). Finally, we want to stay within the 16bit range. In a user name-spacing world each container tends to have much less than the full 32bit UID range available that Linux kernels theoretically provide. Everybody apparently can agree that a container should at least cover the 16bit range though — already to include a nobody user. (And quite frankly, I am pretty sure assigning 64K UIDs per container is nicely systematic, as the the higher 16bit of the 32bit UID values this way become a container ID, while the lower 16bit become the logical UID within each container, if you still follow what I am babbling here…). And before you ask: no this range cannot be changed right now, it’s compiled in. We might change that eventually however.
You might wonder what happens if you already used UIDs from the 61184–65519 range on your system for other purposes. systemd should handle that mostly fine, as long as that usage is properly registered in the user database: when allocating a dynamic user we pick a UID, see if it is currently used somehow, and if yes pick a different one, until we find a free one. Whether a UID is used right now or not is checked through NSS calls. Moreover the IPC object lists are checked to see if there are any objects owned by the UID we are about to pick. This means systemd will avoid using UIDs you have assigned otherwise. Note however that this of course makes the pool of available UIDs smaller, and in the worst cases this means that allocating a dynamic user might fail because there simply are no unused UIDs in the range.
If not specified otherwise the name for a dynamically allocated user is derived from the service name. Not everything that’s valid in a service name is valid in a user-name however, and in some cases a randomized name is used instead to deal with this. Often it makes sense to pick the user names to register explicitly. For that use User= and choose whatever you like.
If you pick a user name with User= and combine it with DynamicUser= and the user already exists statically it will be used for the service and the dynamic user logic is automatically disabled. This permits automatic up- and downgrades between static and dynamic UIDs. For example, it provides a nice way to move a system from static to dynamic UIDs in a compatible way: as long as you select the same User= value before and after switching DynamicUser= on, the service will continue to use the statically allocated user if it exists, and only operates in the dynamic mode if it does not. This is useful for other cases as well, for example to adapt a service that normally would use a dynamic user to concepts that require statically assigned UIDs, for example to marry classic UID-based file system quota with such services.
systemd always allocates a pair of dynamic UID and GID at the same time, with the same numeric ID.
If the Linux kernel had a “shiftfs” or similar functionality, i.e. a way to mount an existing directory to a second place, but map the exposed UIDs/GIDs in some way configurable at mount time, this would be excellent for the implementation of StateDirectory= in conjunction with DynamicUser=. It would make the recursive chown()ing step unnecessary, as the host version of the state directory could simply be mounted into a the service’s mount name-space, with a shift applied that maps the directory’s owner to the services’ UID/GID. But I don’t have high hopes in this regard, as all work being done in this area appears to be bound to user name-spacing — which is a concept not used here (and I guess one could say user name-spacing is probably more a source of problems than a solution to one, but you are welcome to disagree on that).
For passive projects such as point-of-sale displays, video loopers, and your upcoming Halloween builds, Adafruit have come up with a read-only solution for powering down your Raspberry Pi without endangering your SD card.
Pulling the plug
At home, at a coding club, or at a Jam, you rarely need to pull the plug on your Raspberry Pi without going through the correct shutdown procedure. To ensure a long life for your SD card and its contents, you should always turn off you Pi by selecting the shutdown option from the menu. This way the Pi saves any temporary files to the card before relinquishing power.
By pulling the plug while your OS is still running, you might corrupt these files, which could result in the Pi failing to boot up again. The only fix? Wipe the SD card clean and start over, waving goodbye to all files you didn’t back up.
But what if it’s not as easy as selecting shutdown, because your Raspberry Pi is embedded deep inside the belly of a project? Maybe you’ve hot-glued your Zero W into a pumpkin which is now screwed to the roof of your porch, or your store has a bank of Pi-powered monitors playing ads and the power is set to shut off every evening. Without the ability to shut down your Pi via the menu, you risk the SD card’s contents every time you power down your project.
Just in time of the plethora of Halloween projects we’re looking forward to this month, the clever folk at Adafruit have designed a solution for this issue. They’ve shared a script which forces the Raspberry Pi to run in read-only mode, so that powering it down via a plug pull will not corrupt the SD card.
The script makes the Pi save temporary files to the RAM instead of the SD card. Of course, this means that no files or new software can be written to the card. However, if that’s not necessary for your Pi project, you might be happy to make the trade-off. Note that you can only use Adafruit’s script on Raspbian Lite.
Find more about the read-only Raspberry Pi solution, including the script and optional GPIO-halt utility, on the Adafruit Learn page. And be aware that making your Pi read-only is irreversible, so be sure to back up the contents of your SD card before you implement the script.
It’s October, and we’re now allowed to get excited about Halloween and all of the wonderful projects you plan on making for the big night.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.