Security updates for Wednesday

Post Syndicated from original https://lwn.net/Articles/882724/rss

Security updates have been issued by CentOS (httpd), Debian (libxfont, lrzsz, nss, openjdk-17, policykit-1, webkit2gtk, and wpewebkit), Mageia (polkit), openSUSE (expat, json-c, kernel, polkit, qemu, rust1.55, rust1.57, thunderbird, unbound, and webkit2gtk3), Oracle (httpd:2.4, java-11-openjdk, and polkit), Red Hat (httpd:2.4, OpenShift Container Platform 3.11.570, polkit, and Red Hat OpenStack Platform 16.1 (etcd)), Scientific Linux (polkit), Slackware (polkit), SUSE (aide, expat, firefox, json-c, kernel, polkit, qemu, rust, rust1.55, rust1.57, thunderbird, unbound, and webkit2gtk3), and Ubuntu (policykit-1 and xorg-server).

How Ransomware Is Changing US Federal Policy

Post Syndicated from Harley Geiger original https://blog.rapid7.com/2022/01/26/how-ransomware-is-changing-us-federal-policy/

How Ransomware Is Changing US Federal Policy

In past decades, attackers breaching systems and stealing sensitive information prompted a wave of regulations focused on consumer privacy and breach notification. The current surge in ransomware attacks is prompting a new wave of action from policymakers. Unlike the more abstract harms threatened by breaches of personal information, ransomware will grind systems to a halt, suspending business and government operations and potentially threatening health and safety. One indication of the shift in awareness of this form of cybercrime is that President Biden addressed the ransomware threat multiple times in 2021.

The increased stakes of the ransomware threat are pushing regulators to take a harder look at whether regulatory requirements for cybersecurity safeguards are effective or if new requirements are needed to help combat the threat. The federal agencies are also stepping up their coordination on information sharing and incident reporting, and the Administration is growing its collaboration with international partners and the private sector. Let’s look at a few recent and ongoing initiatives.

Cybersecurity requirements for critical infrastructure

In March 2021, Secretary of Homeland Security Mayorkas announced a series of initiatives to strengthen cybersecurity for critical infrastructure, citing ransomware as a national security threat driving the effort. Less than two months later, the Colonial Pipeline ransomware event disrupted the East Coast fuel supply.

Not long after the Colonial attack, the Transportation Security Administration (TSA) exercised its authority to impose security regulations on the pipeline sector. Through two separate rules, TSA required pipeline operators to establish incident response and recovery plans, implement mitigation measures to protect against ransomware attacks, and undergo annual cybersecurity audits and architecture reviews, among other things.

In December 2021, TSA also issued new security regulations for the aviation, freight rail, and passenger rail sectors. The regulations require (among other things) reporting ransomware incidents to CISA and maintaining an incident response plan to detect, mitigate, and recover from ransomware attacks.

Ransomware is a key motivating factor in the sudden tightening of cybersecurity requirements. Previously, the cybersecurity regulations for pipelines were voluntary, with an accommodative relationship between pipeline operators and their regulators. Policymakers are increasingly voicing concern that other critical infrastructure sectors are in a similar position. With basic societal needs at risk when ransomware successfully disrupts critical infrastructure operations, some lawmakers are signaling openness to creating additional cybersecurity regulations for critical sectors.

OFAC sanctions

The federal government is also using its sanctions authority to head off ransomware payments. According to a recent FinCEN report, the average amount of reported ransomware transactions was approximately $100 million per month in 2021. These payments encourage more ransom-based attacks and fund other criminal activities.

The Office of Foreign Assets Control (OFAC) issued guidance warning that paying ransoms to sanctioned persons and organizations is in violation of sanctions regulations. Liability for these violations, OFAC notes, applies even if the person did not know that the ransomware payment was sent to a sanctioned entity.

Critics of this approach warn that applying sanctions to specific attacker groups is ineffective, as the groups can simply rebrand or partner with other criminal elements to take payments. They add that sanctions on payments do little but further victimize the organizations and individuals being attacked, removing their options for recovery or forcing them underground. Ransomware is already grossly under-reported, and critics warn that sanctions will likely discourage transparency even further.

More recently, OFAC also issued virtual currency guidance — aimed at currency companies, miners, exchanges, and users — emphasizing that the facilitation of ransomware payments to sanctioned entities is illegal. The guidance also describes best practices for assessing the risk of violating sanctions during transactions. In addition, OFAC imposed sanctions on a Russia-based cryptocurrency exchange for allegedly facilitating financial transactions for ransomware actors — the first sanctions of this kind.

OFAC followed up with an advisory on sanctions compliance for the virtual currency industry and imposed sanctions on a cryptocurrency firm that had failed to do due diligence to prevent the facilitation of payments to ransomware gangs.

Ransomware reporting

Requirements to report ransomware payments and ransomware-related incidents to federal authorities are another area to watch. Incident reporting requirements are in place for federal agencies and contractors via a Biden Administration Executive Order, but Congress is taking steps to expand these requirements to other private-sector entities.

Both the House of Representatives and the Senate have advanced legislation that would require businesses to report ransomware payments within 24 hours. The report would need to include the method of payment, instructions for making the payment, and other details to help federal investigators follow the payment flows and identify ransomware trends over time. The legislation would also require owners and operators of critical infrastructure to report substantial cybersecurity incidents (including a disruptive ransomware attack) within 72 hours. Interestingly, the legislation’s definition of “ransomware” encompasses all extortion-based attacks (such as the threat of DDoS), not just malware that locks system operations until a ransom is paid.

Although the House and Senate legislation cleared several hurdles, it did not pass Congress in 2021. However, we expect a renewed push for incident reporting, or other legislation to address ransomware, in 2022 and beyond.

A more collaborative, whole-of-government approach

The Biden Administration characterized ransomware as an economic and national security concern relatively early on and has detailed numerous federal efforts to counter it. We have also seen a marked increase in both international government and law enforcement cooperation, and public-private collaboration to identify, prosecute, and disrupt ransomware criminals, and address their safe harbors. In addition to the above, recent efforts have included:

  • In April 2021, the Department of Justice (DOJ) created a Ransomware and Digital Extortion Task Force, and in June elevated ransomware to be a priority on par with terrorism.
  • In June 2021, the US government attended the G7 Summit and discussed ransomware, making a commitment “to work together to urgently address the escalating shared threat from criminal ransomware networks.” They went on to “call on all states to urgently identify and disrupt ransomware criminal networks operating from within their borders, and hold those networks accountable for their actions.”
  • Also in June 2021, ransomware was discussed during the EU-US Justice and Home Affairs Ministerial Meeting, with commitments made to work together to combat “ransomware including through law enforcement action, raising public awareness on how to protect networks as well as the risk of paying the criminals responsible, and to encourage those states that turn a blind eye to this crime to arrest and extradite or effectively prosecute criminals on their territory.”
  • In August 2021, the Cybersecurity and Infrastructure Security Agency (CISA) announced the formation of the Joint Cyber Defense Collaborative (JCDC) to “integrate unique cyber capabilities across multiple federal agencies, many state and local governments, and countless private sector entities.”
  • In August 2021, the White House announced the voluntary Industrial Control System Cybersecurity Initiative to strengthen the resilience of critical infrastructure against ransomware.
  • In September 2021, NIST issued a ransomware risk management profile for its Cybersecurity Framework.
  • In October 2021, the White House hosted a Counter Ransomware Initiative Meeting, bringing together governments from 30 nations around the world “to discuss the escalating global security threat from ransomware” and identify potential solutions.
  • Also in October 2021, a group of international law enforcement agencies and private sector experts collaborated to force ransomware group REvil offline.
  • In November 2021, the US Department of Justice announced the arrest of three ransomware actors, charges against a fourth, and the “seizure of $6.1 million in funds traceable to alleged ransom payments.” It attributed these successes to “the culmination of close collaboration with our international, US government, and especially our private-sector partners.”
  • Collaboration by multiple federal agencies to produce the StopRansomware site, which provides basic resources on what ransomware is, how to reduce risks, and how to report an incident or request assistance.
  • Ongoing work of senior policymakers such as Deputy Attorney General Lisa Monaco, as well as federal agencies such as CISA and the FBI, to keep up a steady flow of timely alerts about the threat of ransomware and the need for public and private-sector collaboration to fight it.

Ransomware brings security center-stage

For years, it was arguable that most policymakers did not “get” the need for cybersecurity. Now the landscape has changed significantly, with ransomware and nation-state competition driving the renewed sense of urgency. Given the seriousness, persistence, and widespread nature of the ransomware threat, Rapid7 supports new measures to detect and mitigate these attacks. These trends do not seem likely to abate soon, and we expect regulatory activity and information sharing on cybersecurity to be driven by ransomware for some time to come.



Landscape of API Traffic

Post Syndicated from Daniele Molteni original https://blog.cloudflare.com/landscape-of-api-traffic/

Landscape of API Traffic


In recent years we have witnessed an explosion of Internet-connected applications. Whether it is a new mobile app to find your soulmate, the latest wearable to monitor your vitals, or an industrial solution to detect corrosion, our life is becoming packed with connected systems.

How is the Internet changing because of this shift? This blog provides an overview of how Internet traffic is evolving as Application Programming Interfaces (APIs) have taken centre stage among communication technologies. With help from the Cloudflare Radar team, we have harnessed the data from our global network to provide this snapshot of global APIs in 2021.

The huge growth in API traffic comes at a time when Cloudflare has been introducing new technologies that protect applications from nascent threats and vulnerabilities. The release of API Shield with API Discovery, Schema Validation, mTLS and API Abuse Detection has provided customers with a set of tools designed to protect their applications and data based on how APIs work and their challenges.

We are also witnessing increased adoption of new protocols. Among encryption protocols, for example, TLS v1.3 has become the most used protocol for APIs on Cloudflare while, for transport protocols, we saw an uptake of QUIC and gRPC (Cloudflare support announced in 2018 and 2020 respectively).

In the following sections we will quantify the growth of APIs and identify key industries affected by this shift. We will also look at the data to better understand the source and type of traffic we see on our network including how much malicious traffic our security systems block.

Why is API use exploding?

By working closely with our customers and observing the broader trends and data across our network in application security, we have identified three main trends behind API adoption: how applications are built is changing, API-first businesses are thriving, and finally machine-to-machine and human-to-machine communication is evolving.

During the last decade, APIs became popular because they allowed developers to separate backend and frontend, thus creating applications with a better user experience. The Jamstack architecture is the most recent trend highlighting this movement, where technologies such as JavaScript, APIs and markup are being used to create responsive and high-performance applications. The growth of microservices and serverless architectures is another driver behind the use of efficient HTTP-powered application interfaces.

APIs are also enabling companies to innovate their business models. Across many industries there is a trend of modularizing complex processes by integrating self-contained workflows and operations. The product has become the service delivered via APIs, allowing companies to scale and monetize their new capabilities. Financial Services is a prime example where a monolithic industry with vertically integrated service providers is giving way to a more fragmented landscape. The new Open Banking standard (PSD2) is an example of how small companies can provide modular financial services that can be easily integrated into larger applications. Companies like TrueLayer have productized APIs, allowing e-commerce organizations to onboard new sellers to a marketplace within seconds or to deliver more efficient payment options for their customers. A similar shift is happening in the logistics industry as well, where Shippo allows the same e-commerce companies to integrate with services to initiate deliveries, print labels, track goods and streamline the returns process. And of course, everything is powered by APIs.

Finally, the increase in connected devices such as wearables, sensors and robots is driving more API traffic, but another aspect of this is the way manual and repetitive tasks are being automated. Infrastructure-as-Code is an example of relying on APIs to replace the manual processes that have been used to manage Internet infrastructure in the past. Cloudflare is itself a product of this trend, as our solutions allow customers to use services like Terraform to configure how their infrastructure should work with our products.
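As a rough illustration of the API-driven management described above, here is a minimal sketch (in Python, using the requests library) that lists zones through Cloudflare's public v4 API, the raw layer that tools like Terraform build on. The API token is a placeholder read from the environment, and error handling is kept to a bare minimum.

```python
# Minimal sketch of programmatic infrastructure management via Cloudflare's
# public v4 API. The token is a placeholder read from the environment.
import os
import requests

API_BASE = "https://api.cloudflare.com/client/v4"
TOKEN = os.environ["CLOUDFLARE_API_TOKEN"]  # placeholder credential


def list_zones():
    """Return the zones visible to this API token."""
    resp = requests.get(
        f"{API_BASE}/zones",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["result"]


if __name__ == "__main__":
    for zone in list_zones():
        print(zone["name"], zone["status"])
```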

Labelling traffic

The data presented in the following paragraphs is based on the total traffic proxied by Cloudflare and traffic is classified according to the Content-Type header generated in the response phase. Only requests returning a 200 response were included in the analysis except for the analysis in the ‘Security’ section where other error codes were included. Traffic generated by identified bots is not included.

When looking at trends, we compare data from the first week of February 2021 to the first week of December 2021. We chose these dates to compare how traffic changed over the year while excluding January, which is affected by the holiday season.

Specifically, API traffic is labelled based on responses with a Content-Type of application/json, application/xml, or text/xml, while Web accounts for text/html, application/x-javascript, application/javascript, text/css, and text/javascript. Requests categorised as Text are text/plain; Binary are application/octet-stream; Media includes all image types, video and audio.

Finally, Other catches everything that doesn’t clearly fall into the labels above, which includes empty and unknown. Part of this traffic might be API and the categorisation might be missing due to the client or server not adding a Content-Type header.
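To make the labelling rules above concrete, here is a small sketch that maps a response Content-Type header to the categories used in this post. It simply mirrors the rules described in the text; it is not Cloudflare's actual classification pipeline.

```python
# Map a response Content-Type header to the traffic categories used in this
# post, following the labelling rules described above.
API_TYPES = {"application/json", "application/xml", "text/xml"}
WEB_TYPES = {
    "text/html", "application/x-javascript", "application/javascript",
    "text/css", "text/javascript",
}


def label_traffic(content_type):
    if not content_type:
        return "Other"  # missing or empty header
    # Strip parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in API_TYPES:
        return "API"
    if media_type in WEB_TYPES:
        return "Web"
    if media_type == "text/plain":
        return "Text"
    if media_type == "application/octet-stream":
        return "Binary"
    if media_type.split("/")[0] in {"image", "video", "audio"}:
        return "Media"
    return "Other"


assert label_traffic("application/json; charset=utf-8") == "API"
assert label_traffic("text/html") == "Web"
assert label_traffic(None) == "Other"
```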

API use in 2021

We begin by examining the current state of API traffic at our global network and the types of content served. During the first week of December 2021, API calls represented 54% of total requests, up from 52% during the first week of February 2021.


When looking at individual data types, API was by far the fastest-growing data type (+21%) while Web only grew by 10%. Media (such as images and videos) grew just shy of 15%, while Binary was the only category that shrank in aggregate, by 6%.


In summary, APIs have been one of the drivers of the traffic growth experienced by the Cloudflare network in 2021. APIs account for more than half of the total traffic generated by end users and connected devices, and they’re growing twice as fast as traditional web traffic.

New industries are contributing to this increase

We analysed where this growth comes from in terms of industry and application types. When looking at the total volume of API traffic, unsurprisingly the general Internet and Software industry accounts for almost 40% of total API traffic in 2021. The second-largest industry in terms of size is Cryptocurrency (7% of API traffic) followed by Banking and Retail (6% and 5% of API traffic respectively).

The following chart orders industries according to their API traffic growth. Banking, Retail and Financial Services have experienced the largest year-on-year growth with 70%, 51% and 50% increases since February 2021, respectively.


The growth of Banking and Financial Services traffic is aligned with the trends we have observed anecdotally in the sector. The industry has seen the entrance of a number of new platforms that aggregate accounts from different providers, streamline transactions, or allow investing directly from apps, all of which rely heavily on APIs. The new “challenger banks” movement is an example where newer startups are offering captivating mobile services based on APIs while putting pressure on larger institutions to modernise their infrastructure and applications.

A closer look at the API characteristics

Generally speaking, a RESTful API request is a call to invoke a function. It includes the address of a specific resource (the endpoint) and the action you want to perform on that resource (method). A payload might be present to carry additional data and HTTP headers might be populated to add information about the origin of the call, what software is requesting data, requisite authentication credentials, etc. The method (or verb) expresses the action you want to perform, such as retrieve information (GET) or update information (POST).
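As a concrete illustration of that anatomy, the sketch below issues a GET and a POST with Python's requests library. The host, path, and token are hypothetical; the point is only where the endpoint, method, headers, and payload appear.

```python
# Illustration of the request anatomy described above: endpoint, method,
# headers, and payload. The host, path, and token are hypothetical.
import requests

BASE = "https://api.example.com/v1"      # hypothetical endpoint
HEADERS = {
    "Authorization": "Bearer <token>",   # authentication credential
    "User-Agent": "example-client/1.0",  # identifies the calling software
}

# GET: retrieve a resource; parameters travel in the URL and query string.
order = requests.get(f"{BASE}/orders/42", headers=HEADERS, timeout=10).json()

# POST: send data to a resource; the payload travels in the request body.
created = requests.post(
    f"{BASE}/orders",
    headers=HEADERS,
    json={"item": "widget", "quantity": 3},  # JSON payload
    timeout=10,
)
print(created.status_code, created.json())
```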

It’s useful to understand the composition and origin of API traffic, such as the most commonly used methods, the most common protocol used to encode the payload, or what service generates traffic (like Web, mobile apps, or IoT). This information will help us identify the macro source of vulnerabilities and design and deploy the best tools to protect traffic.

Methods

The vast majority of API traffic is the result of POST or GET requests (98% of all requests). POST itself accounts for 53.4% of all requests and GET for 44.4%. Generally speaking, GET tends to transfer sensitive data in the HTTP request headers, the query string and the response body, while POST typically transfers data in the request headers and body. While many security tools apply to both of these types of calls, this distinction can be useful when deploying tools such as API Schema Validation (request and response) or Data Loss Prevention/Sensitive Data Detection (response), both launched by Cloudflare in March 2021.
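As a conceptual illustration of request schema validation, the sketch below checks a POST body against a JSON Schema using the open-source jsonschema library. The schema and payloads are invented, and this is not Cloudflare's API Schema Validation implementation, only the general idea behind it.

```python
# Conceptual sketch of validating an API payload against a JSON Schema.
from jsonschema import ValidationError, validate

ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "item": {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1},
    },
    "required": ["item", "quantity"],
    "additionalProperties": False,
}


def is_valid_order(payload):
    """Return True if the POST body matches the expected schema."""
    try:
        validate(instance=payload, schema=ORDER_SCHEMA)
        return True
    except ValidationError:
        return False


print(is_valid_order({"item": "widget", "quantity": 3}))   # True
print(is_valid_order({"item": "widget", "qty": "three"}))  # False
```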


Payload encoding review

API payloads encode data using different rules and languages that are commonly referred to as transport protocols. When looking at the breakdown between two of the most common formats, JSON accounts for by far the largest number of requests (~97%), while XML has a smaller share of requests but still carries the heaviest traffic. In the following figure, JSON and XML are compared in terms of response sizes: XML is the most verbose format and the one handling the largest payloads, while JSON is more compact and results in smaller payloads.
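To make the verbosity difference concrete, this small sketch serialises the same (made-up) record as JSON and as XML and compares the payload sizes.

```python
# Encode the same record as JSON and as XML and compare payload sizes.
# The record is made up; the point is only the relative verbosity.
import json
import xml.etree.ElementTree as ET

record = {"id": 42, "item": "widget", "quantity": 3, "status": "shipped"}

json_payload = json.dumps(record).encode()

root = ET.Element("order")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_payload = ET.tostring(root, xml_declaration=True, encoding="utf-8")

print(len(json_payload), "bytes as JSON")  # the more compact encoding
print(len(xml_payload), "bytes as XML")    # tags make the payload larger
```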


Since we started supporting gRPC (September 2020), we have seen a steady increase in gRPC traffic, and many customers we speak with are in the planning stages of migrating from JSON to gRPC, or designing translation layers at the edge from external JSON callers to internal gRPC services.

Source of API traffic

We can look at the HTTP request headers to better understand the origin and intended use of the API. The User-Agent header allows us to identify what type of client made the call, and we can divide it into three broader groups: “browser”, “non-browser” and “unknown” (which indicates that the User-Agent header was not set).

About 38% of API calls are made by browsers as part of a web application built on top of backend APIs. Here, the browser loads an HTML page and populates dynamic fields by generating AJAX API calls against the backend service. This paradigm has become the de-facto standard as it provides an effective way to build dynamic yet flexible Web applications.

The next 56% comes from non-browsers, including mobile apps and IoT devices with a long tail of different types (wearables, connected sport equipment, gaming platforms and more). Finally, approximately 6% are “unknown”; since well-behaved browsers and tools like curl send a User-Agent header by default, much of this unknown traffic can be attributed to programmatic or automated tools, some of which could be malicious.
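The split above can be approximated with a very rough heuristic like the one below; real classification (including bot detection) is far more involved, and the browser markers here are only examples.

```python
# Very rough sketch of splitting traffic by User-Agent into the three groups
# used above. Real classification (and bot detection) is far more involved.
def classify_user_agent(user_agent):
    if not user_agent:
        return "unknown"      # header not set at all
    ua = user_agent.lower()
    browser_markers = ("mozilla/", "chrome/", "safari/", "firefox/", "edg/")
    if any(marker in ua for marker in browser_markers):
        return "browser"
    return "non-browser"      # mobile SDKs, IoT firmware, curl, scripts...


print(classify_user_agent("Mozilla/5.0 (X11; Linux x86_64) Firefox/96.0"))  # browser
print(classify_user_agent("okhttp/4.9.0"))                                  # non-browser
print(classify_user_agent(None))                                            # unknown
```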


Encryption

A key aspect of securing APIs against snooping and tampering is encrypting the session. Clients use SSL/TLS to authenticate the server they are connecting with, for example, by making sure it is truly their cryptocurrency vendor. The benefit of transport layer encryption is that after handshaking, all application protocol bytes are encrypted, providing both confidentiality and integrity assurances.

Cloudflare launched the latest version of TLS (v1.3) in September 2016, and it was enabled by default on some properties in May 2018. When looking at API traffic today, TLS v1.3 is the most adopted protocol, with 55.9% of traffic using it. The vulnerable v1.0 and v1.1 were deprecated in March 2021 and their use has virtually disappeared.

Transport security protocol    December 2021
TLS 1.3                        55.9%
TLS 1.2                        32.7%
QUIC                           8.4%
None                           2.8%
TLS 1.0                        0.3%
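On the client side, refusing to negotiate anything older than TLS 1.3 takes only a few lines with Python's standard ssl module; the sketch below does exactly that (the host is just an example).

```python
# Refuse to negotiate anything older than TLS 1.3 on the client side,
# using only the standard library. The host below is just an example.
import socket
import ssl

HOST = "cloudflare.com"

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_3  # reject TLS 1.2 and below

with socket.create_connection((HOST, 443), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print("negotiated:", tls.version())       # e.g. "TLSv1.3"
```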

The protocol that is growing fastest is QUIC. While QUIC can be used to carry many types of application protocols, Cloudflare has so far focused on HTTP/3, the mapping of HTTP over IETF QUIC. We started supporting draft versions of QUIC in 2018, and when QUIC version 1 was published as RFC 9000 in May 2021, we enabled it for everyone the next day. QUIC uses the TLS 1.3 handshake but has its own mechanism for protecting and securing packets. Looking at HTTP-based API traffic, we see HTTP/3 going from less than 3% in early February 2021 to more than 8% in December 2021. This growth broadly aligns with the publication of RFC 9000 and with HTTP/3 support being stabilized and enabled in a range of client implementations over that period.

Mutual TLS (mTLS), which is often used for mobile or IoT devices, accounts for 0.3% of total API traffic. Since we released the first version of mTLS in 2017, we've seen a growing number of inquiries from users across all Cloudflare plans, and we have recently made it easier for customers to start using mTLS with Cloudflare API Shield. Customers can now use the Cloudflare dashboard to issue and manage certificates with one click, avoiding the complexity of having to manage a Private Key Infrastructure and root certificates themselves.
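On the client side, presenting a certificate for mutual TLS looks roughly like this with the requests library; the URL, certificate paths, and CA bundle are placeholders for a real deployment.

```python
# Sketch of a client presenting its own certificate for mutual TLS (mTLS).
# The URL and file paths are placeholders.
import requests

response = requests.get(
    "https://api.example.com/v1/telemetry",  # hypothetical mTLS-protected endpoint
    cert=("client.crt", "client.key"),       # client certificate + private key
    verify="origin-ca.pem",                  # CA bundle used to verify the server
    timeout=10,
)
print(response.status_code)
```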

Finally, unencrypted traffic can provide a great opportunity for attackers to access plain communications. The total unencrypted API traffic dropped from 4.6% of total requests in early 2021 to 2.6% in December 2021. This represents a significant step forward in establishing basic security for all API connections.

Security

Given the huge amount of traffic that Cloudflare handles every second, we can look for trends in blocked traffic and identify common patterns in threats or attacks.

When looking at the Cloudflare security systems, an HTML request is twice as likely to be blocked as an API request. Successful response codes (200, 201, 301 and 302) account for 91% of HTML and 97% of API requests, while 4XX error codes (like 400, 403, 404) are generated for 2.8% of API calls as opposed to 7% of HTML requests. Calls returning 5XX codes (such as Internal Server Error, Bad Gateway, Service Unavailable) are almost nonexistent for APIs (less than 0.2% of calls), while they account for almost 2% of HTML requests.

The relatively larger volume of unmitigated API requests can be explained by the automated nature of APIs: rendering a single page, for example, may generate many API calls where it would otherwise require a single HTML request. Malicious or malformed requests are therefore diluted in a larger volume of calls generated by well-behaved automated systems.


We can further analyse the frequency of specific error codes to get a sense of what the most frequent malformed (and possibly malicious) requests are. In the following figure, we plot the share of a particular error code when compared to all 4XXs.


We can identify three groups of issues, all roughly equally likely (excluding the more obvious “404 Not Found” case): “400 Bad Request” (malformed or invalid requests), “429 Too Many Requests” (rate limiting), and the combination of authentication and authorization issues (“403 Forbidden” and “401 Unauthorized”). Those codes are followed by a long tail of other errors, including “422 Unprocessable Entity”, “409 Conflict”, and “402 Payment Required”.

This analysis confirms that the most common attacks rely on sending non-compliant requests, brute force efforts (24% of generated 4XXs are related to rate limiting), and accessing resources with invalid authentication or permission.
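An analysis along these lines can be sketched in a few lines of Python: count client-error responses from an access log and report each status code's share of all 4XXs. The sample log entries below are invented.

```python
# Count 4XX responses from a request log and report each code's share.
# The sample log entries are invented stand-ins for real access-log records.
from collections import Counter

log = [  # (path, status) pairs
    ("/api/orders", 404), ("/api/orders", 400), ("/api/login", 401),
    ("/api/orders", 429), ("/api/admin", 403), ("/api/orders", 404),
]

client_errors = Counter(status for _, status in log if 400 <= status < 500)
total = sum(client_errors.values())

for status, count in client_errors.most_common():
    print(f"{status}: {count / total:.0%} of 4XX responses")
```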

We can further analyse why calls were blocked (especially those returning 4XX codes) by looking at what triggered the Cloudflare WAF. The OWASP and Cloudflare Managed Rulesets are tools that scan incoming traffic for fingerprints of known vulnerabilities (such as SQLi, XSS, etc.), and they can provide context on what attack was detected.

A portion of the blocked traffic triggered a managed rule for which we can identify the threat category. Although a malicious request can match multiple categories, the WAF assigns it to the first threat that is identified. User-Agent anomaly is the most common reason traffic is blocked. This is usually triggered by a missing or malformed User-Agent header, capturing requests that do not provide enough credible information about what type of client sent them. The next most common threat is cross-site scripting. After these two categories, there is a long tail of other anomalies that were identified.


Conclusions

More than one out of two requests we process is an API call, and industries such as Banking, Retail and Financial Services are leading in terms of adoption and growth.

Furthermore, API calls are growing twice as fast as HTML traffic, making them an ideal candidate for new security solutions aimed at protecting customer data.

Goodbye, Pearl

Post Syndicated from Eevee original https://eev.ee/blog/2022/01/25/goodbye-pearl/

Pearl laying on carpet, bathed in a sunbeam that highlights her peach fuzz

A Chronicling of the Lyfe and Times of one Miss Pearl Twig Woods, who has Passed at a Young Age from Troubles of the Heart. She is survived by Anise, her Arch Nemesis; Cheeseball, her Adoptive Ruffian; and Napoleon, her Star-Crossed Suitor for Whom she Longed from Afar.

Pearl is… difficult to describe. She had such a strong, vibrant personality.

She was lovely, that’s for certain. She loved everyone she met. And while various people — friends, vets, etc. — have met our cats and always liked them, I don’t believe anyone has met Pearl and not adored her. Anise will check out your stuff and perhaps jump on you; Cheeseball will do antics for you and rub on your leg; but Pearl would accept you into her life and be very directly, personally affectionate with you specifically. She made you feel special.

At the same time, she was very fussy, very particular, and had a very strong sense of… her place in the world, I suppose. If she liked something, she would be having it. If she didn’t like something, she would make that exceptionally clear. She was never mean, but she would be very vocal about her boundaries.

It wasn’t uncommon to wake up to Pearl repeatedly headbutting me right in the face, pressing her head up under my chin, or giving me a nuzzle with the entire length of her body, purring all the while. If she was happy to see you, she made an entire production out of it. It wasn’t just us; guests who slept on the couch also got the Pearl wake-up call.

It was also not uncommon to wake up because Pearl had decided that she needed my pillow, and somehow this very small cat took up the entire thing. I couldn’t move her; trying to displace her from a comfortable spot would generally earn you a sad, offended meow, after which you felt guilty for even having entertained the notion in the first place.

One of her particular quirks was to often “bury” her food when she was done with it, or at least paw fruitlessly at nearby carpet. On its own, this is endearing but not unusual — burying leftovers is a common cat instinct, even if we’ve not seen it in our other cats. What made it a uniquely Pearl trait was that she would also perform this ritual if offered something she didn’t want at all. I laughed every time; it was such an audacious way to indicate utter disinterest. Take it away, please. Put it in a hole, if you would.

She got, more or less, everything she wanted. If she claimed a spot, everything about her expression and body language indicated it was clearly hers, even if that spot was your body. (Naturally, if you moved too much or even sneezed suddenly, she would tell you off for that too.) If she wanted to ride on your shoulders, that’s what would be happening. If it was time to feed her and she was too comfortable in her cat tree, well, we’d just have to hold the food up for her. She had a way of looking very pleased with herself that was impossible to argue with.

I first met Pearl in 2014, shortly after we moved to Las Vegas. She was tiny, even for a kitten, and apparently the runt of her litter. I don’t remember what specifically compelled Ash to adopt another cat, except that they love cats, but what a selection.

I cannot stress enough how small she was. You know those solid wood desks that have a column of drawers built into them on one side? You know how they often have a little decorative shape carved out at the bottom with molded edges? Pearl could crawl into that space. I couldn’t believe it the first time I saw it; the gap is so short that I’d never even thought to categorize it as a space, let alone one a cat might enter, but she slunk into it like it was nothing. I was so worried we’d have to move the desk somehow to get her out, but she usually turned around and came right out again. I still remember the very last time she did it — I could tell she was having to shimmy a bit to fit in there, and she must have noticed too, because I never saw her even try it again.

The other cats had somewhat mixed feelings. Napoleon didn’t like her at all and hissed directly in her face, but… after that, I don’t remember any bad reactions from him at all, so I guess he warmed up quick. Anise did not seem to understand what a kitten was, tried to play with her, and then acted very confused when that didn’t seem to work. And Twigs…

Oh, Twigs. Twigs was jealous. He had always been Ash’s cat, he had made himself Ash’s cat, and he very quickly inferred that Pearl was a threat to his position. Another cat! In Ash’s lap! Unthinkable!

On one particular night Ash had barred Twigs from the bedroom to sleep with just Pearl, but came downstairs to visit the kitchen. Twigs ran up to them, looked them dead in the eye, and let out a huge sad wail to convey his feelings about the depths of this betrayal.

They let him into the bedroom after that, but he opted to sit across the room and stare daggers at Pearl, moving a little closer every half hour until he was on the far corner of the bed. Just staring.

Ash eventually had to bribe him by putting some cottage cheese on Pearl’s head, after which he decided Pearl was okay. Also he found out that he could fit himself in Ash’s lap alongside Pearl, so that probably helped.

Oh, and she loved to be cozy. She loved to be cozy. Sphynxes are naturally drawn to warmth, of course, but Pearl elevated it to an artform. If I’m propped up in bed, Anise might stand next to me to look at the covers expectantly, or he might just lay down nearby. Pearl would stand right on top of me and pull at the covers with impressive force until I lifted them for her, let her lay on my chest, and tucked her in.

We’d often find Pearl very awkwardly tucked under the edge of a blanket somewhere, having attempted to insert herself beneath it with mixed success. We described this as Pearl doing it all by herself, and complimented her on how talented she was, and then fixed the blanket for her.

We have heater vents in the floor, and one of her favorite pastimes was to sit on one of those, often covering the entire thing, and be gently toasted from below. Sometimes Anise would see what a great idea that was and try to share it and they would end up squabbling.

If there was a sunbeam to be found, Pearl would find it. Much like with vents, she didn’t like to share sunbeams, even if they were half the width of the room. She found it first, you see.

Other places she discovered that were lovely and toasty included: in front of the fridge where the warm air vented out from the bottom; straddling the PS4 so the fan blew onto her tummy; next to or underneath my laptop; on top of my computer case which has a fan vent on top; in front of the heat dish we got while our furnace wasn’t working; and in a laundry hamper full of freshly-dried laundry.

She liked to go outside, too, during the summer. All of our cats are indoor-only, but once in a while we’ll take the more well-behaved ones (not Cheeseball) into the backyard to wander around on the porch and sniff things and enjoy the sun and look at a bird.

And I have never known a cat to be quite so comfortable. Perhaps Anise, on occasion, but he doesn’t have the raw talent Pearl was born with.

You could tell she was settling in if she tucked her paws in against her chest, something she always did quite deliberately and distinctly. But that was only the first stage of comfort. If you were lucky, she would stretch out one arm really far, perhaps to place her paw on you. As she dozed off she might lay flat on her side with her limbs outstretched, which meant we always had to check blankets carefully for a flattened Pearl before sitting down. And if you were really lucky, you might witness Pearl in a chaos configuration, upside-down with her paws wherever.

But even just sitting up with her eyes closed, she looked so content. Looking at photos makes me want to take a nap with her.

Sadly, Pearl had some health troubles from the start. She had a kink right at the base of her tail from the day we got her, suggesting it had been injured while at the breeder and not healed right, so she was never able to raise her tail all the way. She also came home with some sort of intestinal parasite that gave her a lot of… um, gastric distress, and while we were able to clear that up quickly, it seemed to recur soon afterwards.

We took her to the vet again, suspecting more parasites, but multiple tests turned up nothing. We tried a number of things — different food, sensitive-stomach food, wet food, more water, different treats — but could not seem to figure it out, and so Pearl just had stomachaches on and off for a while. Sometimes she would sit by the litterbox and grumble, and all I could do was try to reassure her.

It wasn’t until a few years later that Ash’s then-husband, with no explanation whatsoever, spontaneously decided to just feed her some plain chicken mixed with pumpkin purée. Just like that, she was fine. I felt like kind of an idiot for not trying that earlier, but after giving her veterinary sensitive-stomach food and seeing no change, I thought we’d ruled out food sensitivity.

We swiftly outlined a general idea of what Pearl could or could not tolerate. Chicken, pork, pumpkin: OK. Beef or any kind of organs: she immediately threw up. Fish: no good. And yet manufactured food containing only very simple things still gave her stomachaches, so our best guess was that she also couldn’t tolerate fucking xanthan gum or something, which is in pretty much all pet food, including the sensitive stuff.

Regardless, we had a diet she could stomach, so for the rest of her life we made her a custom diet of ground chicken, ground pork belly, pumpkin, and some nutrient powder that didn’t bother her (which took several attempts to find). That meant no more free-feeding the other cats, so we got a big dog cage to keep the kibble in, and we’d let the other cats in there while Pearl was eating her special princess food. Thus began a multi-year saga during which, every four hours, like clockwork, Anise would start bothering me to feed him.

Please do not tell me what I could have done to dissuade Anise or space out the schedule. I guarantee, he is vastly more dastardly and annoying than you are giving him credit for. The cats run this household, and I have long since made peace with that.

The closest to any real insight we got about Pearl was that perhaps her kitten parasites had left her with IBS — a very vague diagnosis of exclusion, and the best anyone could come up with. But Pearl was happy, so that was good enough. We eventually found new treats she could stomach, too.

Pearl had relatively intense relationships with the other cats, much like she did with people.

She adored Napoleon, our furred and largest cat, for some reason. She often trotted up to him, very eager to sniff him; or when he trotted towards the kibble cage in recent years, she would run alongside him, staring sideways at him. I don’t really understand what her feelings were, and Napoleon didn’t really return them, but he at least tolerated them. Curiously, I can’t remember many attempts on Pearl’s part to snuggle up to Napoleon; she mostly snuggled with the other sphynxes.

She and Twigs (her uncle, incidentally) spent a ton of time together, and Anise was often in the mix as well. They’d often end up in a pile under or within a blanket, or all wedged into the same cat bed, or piled on a chair that had a towel on it. Sometimes she’d grumble at Anise for being too much in her personal space, but somehow Twigs’s presence seemed to defuse everything. I can’t remember her ever grumbling at Twigs, in fact.

Cheeseball is the only cat we have who’s younger than Pearl. When he was a kitten, she kind of doted on him like a mom, frequently trying to groom his head. She kept doing this into his adolescence, even as he was swiftly growing bigger than her, which was endearing and also very funny.

We moved in 2018, and spent the summer with a former acquaintance’s parents, as they had a finished and furnished basement that was practically an apartment all on its own. Unfortunately, they had four cats of their own, for a total of nine crammed into a relatively small space. (One of the parents couldn’t be around cat hair in the medium term, due to reasons.)

One of the cats, Seamus, was a maine coon, and by all accounts kind of an asshole. He made a habit out of chasing Napoleon around, which Napoleon did not like at all, and which would result in Pearl chasing him to defend Napoleon, and then Anise chasing after Pearl because everyone is running around and he doesn’t quite understand why but he doesn’t want to be left out. We kept the cats separated as best we could, but we didn’t have much space to work with, and we were already trying to sequester Cheeseball, who we’d just adopted as a kitten. Everything was just kind of a mess.

Anyway this kinda stressed everyone out.

I bring it up because of one particular event. The only segregated parts of the basement were the bathroom and a somewhat awkwardly-shaped bedroom. The bedroom was exclusively for our cats. I don’t remember exactly what led up to this, but at some point Seamus made a beeline for the bedroom while Pearl was just inside the open door. I’m guessing Napoleon was in there too.

Pearl was absolutely not having this. She stood her ground and hissed hard enough to stop this absolutely massive cat in his tracks. She was so mad that she peed on the floor (which was, thankfully, vinyl). We got there to intervene about half a second later, but wow! She drew a line in the sand and under no circumstances was this bully going to cross it. We have always looked back fondly upon this “rage piss” incident.

I think Pearl was left a little rattled, though. Even at the time, she growled at the other maine coon there, who was an absolute sweetheart and rarely did more than sit nicely and ask to be pet. Once we were out of there, she seemed a little distrusting of Anise, often growling at him or biting his haunch merely for sitting nearby (which would entice a bewildered Anise into smacking her, justifying her reaction). I wish we hadn’t stayed there.

Cheeseball was also growing up and wanted to play with Pearl, because playing is how he engages with pretty much everything; alas, he was a bit too rowdy for Pearl. Twigs, infinitely patient, was there to absorb a lot of this.

But then Twigs died, and the cats’ relationships seemed to deteriorate. Cheeseball liked Pearl, but he always wanted to fight with her, which she didn’t like. Anise liked Pearl, but she seemed to resent him a lot of the time, and there was no Twigs to separate them. Pearl liked Napoleon, but Napoleon liked to be by himself.

It was okay, but tense.

Maybe I’m overstating this. Going back through photos of Pearl, I’ve found plenty from the post-Twigs era where she’s still hanging out with Anise peacefully. A number of their conflicts even started because she would approach Anise to sit by him, then growl at him. No wonder he was confused. Sometimes she would groom him and start growling, while licking his ear. Hello? What are you doing?? What do you want from him here.

Still, that must mean she still liked him. She just had some complicated feelings. It always made me a little sad when they couldn’t get along, though. I’d gotten Anise in the first place in part to give Twigs a friend, and Pearl and Twigs had always gotten along well, and now… well.

Having said all this about how great and lovely Pearl is, her presumptuousness also made her a huge pest in some very specific ways. For example, once we’d settled into the food routine that saved her from constant stomachaches, one of her favorite things to do was to go over to the kibble cage and try to find kibble that had escaped from it. If she could get away with it, she would stick her paw between the bars and pull kibble (or the entire bowl) out to eat.

It was slightly annoying, and also very funny. We called this pulling a heist. And then she’d have awful gas some hours later.

I also very distinctly remember getting takeout one time, which happened to include a breaded and fried slab of fish. I had the little takeout container on the table in front of me, and I think I was fiddling with the wrapper on their plastic fork or something, when Pearl came along, sniffed it… and then bit the fish and pulled the whole filet out of the container. Right in front of me! Points for boldness, I guess. She wasn’t quite so audacious any other time, but she must’ve really liked the smell of that fish.

And while she was generally pretty picky about what she would consider a toy, she did, on occasion, like to bite the arms of my glasses. Once I was laying next to her and petting her while she purred, and she stuck a paw in between my glasses and my face, pulled them off, and tried to bite them — purring all the while.

My favorite Pearl trick was what we dubbed “mouse alert”. If Pearl was looking for someone — often anyone at all, but sometimes a particular person who was absent or in a room with a closed door — she would find one of her toy mice and carry it around doing a very loud, muffled meow. If she saw you she would then drop the mouse and trot over, making happy high-pitched meows instead.

Sometimes she’d start out with regular meows, which we could hear from the other side of the house, but then they’d abruptly turn deeper and longer, and we knew she’d picked a mouse up. It was so charming and so funny. Every so often we’d find a pile of mice outside a door and we knew that Pearl had been trying to open it. She later expanded her roster to include Big Mouse — a plush almost half her size who became her favorite — and a plush of a single HIV virus that she must’ve stolen from my desk.

She didn’t play with the mice, either. I have video of her playing with a mouse when she was fairly young, but it’s not one of the mice we have now. She seemed to regard them as precious, her comforting belongings that she could almost always lure us out of hiding with. “Come look at my mouse!” Sometimes she’d carry them around quietly, just to have one or two nearby in a comfortable spot.

I tried for her whole life to get a recording of this, which proved nearly impossible, because she’d stop if she knew anyone was nearby! I got a clear recording only once, a week before she died; I was in our dark bedroom, filming into Ash’s office, and I don’t think she realized I was there. There’s a link at the bottom.

Her other favorite possession was string. Pearl loved to play string. She would ask for it by name. No, really. If she wanted to play string, she would find (or bring) a string and sit on it hoping someone noticed, and if that didn’t work, I’m pretty sure she had a specific meow for asking you to please follow her to string and then play with it.

Playing string with her was a slightly frustrating affair, but perhaps I just didn’t understand the rules. They seemed to be: I should wiggle the string; then Pearl grabs the string; then Pearl keeps the string. That doesn’t end the game, though. I should keep trying, in vain, to get the string back, while Pearl simply keeps winning.

A great thing to do was dangle it above her, at which point she’d stand up to try to get it and chomp at it, audibly. I loved her little chomp sound. I can’t even do it myself; I feel like I’d hurt my teeth.

After she was through adolescence, string was the only thing she really wanted to play with. She might’ve chased a laser pointer a couple of times, but string was the one thing she would ask for. Occasionally I’d try to play with Anise with a string, but Pearl had a fucking sixth sense for when string was happening, and she would appear from nowhere and go absolutely nuts over it while Anise sat back and watched.

In March 2021, I took Pearl to an ER vet over very rapid breathing. They told me she’d had fluid in her lungs and diagnosed her with congestive heart failure. That’s when your heart can’t pump hard enough; part of Pearl’s heart wall had thinned and weakened, and one chamber was enlarged. She had to be hospitalized overnight. I drove home thinking I’d never see her again.

They couldn’t identify a cause. She was given a prognosis of “not fantastic” and prescribed a growing mountain of medication, which Ash dutifully gave to her every twelve hours for months on end, even when Pearl refused it. Sometimes Pearl had to be bribed with treats in order to eat at all, though I later traced that to a batch of food with insufficient pumpkin for her liking.

We had to keep her stress level low, which meant keeping her completely separated from the other cats (or at the very least Cheeseball) as much as possible. That meant Ash vanished into a closed room for most of every day to work while keeping an eye on Pearl — who was, after all, Ash’s cat. That also left me with three other cats constantly vying for my attention.

For several months we often couldn’t even sleep in the same room — Pearl and Anise couldn’t be left together, and Anise makes a racket all night if he’s shut out. Early on, our roommate would often take Pearl overnight (even despite being allergic to cats), but as time went on, Ash felt a stronger impulse to be around her as much as possible. Eventually we found we could have both Anise and Pearl overnight as long as we put a sweater on Anise and had sufficient extra blankets on the bed, but honestly it felt like a constant logistical nightmare.

Even with all this, we still had several more ER visits, several more hospitalizations.

Still, Pearl seemed to be doing okay. She was happy, she engaged with us, she purred, she snuggled, she nuzzled, she played. She was fine, and stable, until she wasn’t.

It was January 11, and it was the first ER visit for rapid breathing in a while. We handed her over, they hospitalized her, and we left, assuming we’d pick her up in the morning and she’d be fine, as had always happened.

We weren’t home for long before they called us. Pearl wasn’t recovering this time, and wouldn’t make it through the night.

We raced back. We saw Pearl, struggling to breathe, even on oxygen. We pet her and told her it would be okay. She cried out for help. Ash held her.

And then we let her go.

I love and miss so many little things. She had such beautiful eyes, like Twigs did, though she squinted a lot so it always felt like a special treat when I could see them clearly. Her whole face scrunched when she meowed. She had a marble pattern, so I guess she would’ve been a calico. I didn’t even notice it when we first got her, and then one day it jumped right out at me and I felt briefly like our kitten had been replaced with a different one. She had a funny little clump of four hairs that stuck out from her hip. She had marbling on her pawpads, too.

I love her wide vocabulary of very cute little meows, in contrast with Twigs’s more raucous ones. She reserved them for special occasions, opting to chirr most of the time.

I love how, when she was surprised by something, she would simply jump straight up in the air an inch, then come down. No other movement. It was like she was tweened. I never tried to spook her on purpose to see this, but she was a little prone to being spooked.

I love how, when she’d knead at a soft blanket, she did just a few quick little motions and then she was done. It was so dainty. I always called it kitty paws, to distinguish from cat paws.

I love how she’d do a straight upwards stretch that somehow made her ears flick inside out briefly.

I love the very deliberate way she tucked her paws, and how she would gently hold onto someone’s shoulders while getting a taxi ride. Everything she did came across as so purposeful.

I love how Ash had found that rubbing their face on Pearl’s side as a kitten would get her to purr, and that kept working for her whole life, and it’s basically what she ended up doing to people in return.

I love how she had a funny obsession with water. I can’t really explain it, and I don’t know what she found so interesting. If I took a swig from my water bottle with Pearl nearby, she would climb on whatever was necessary to sniff at the nozzle. If I opened a soda with Pearl nearby, she’d stick her nose right in the opening, then recoil when the bubbles fizzed her. She didn’t enjoy baths or anything, she just liked… water. From afar. Like with Napoleon, perhaps.

I love how she nuzzled so hard that she hit maximum nuzzle, and so she would also sort of gently swipe the air with her paw as well, for extra nuzzling power.

I love her funny “bug off” sweater, illustrated with a ladybug, which seemed to capture her personality well: don’t be rude to me, but expressed in a very cute manner.

I love how she adopted the sort of extended windowsill in our bathroom as her own, and would lay there on sunny days and roll around on a towel.

I love that she was pampered right to the end. Over the course of recent weeks, Ash would keep giving me updates on Pearl’s development of a new routine, where she would sit in a Treat Spot she had designated, possibly meow once or twice, and wait very nicely until Ash gave her a treat. And then Ash would eventually capitulate, helpless before the polite ministrations of this very tiny cat, and give her a treat. It seemed that the number of treats Pearl was managing to get per day was gradually increasing, and so I asked every time: why not simply not give her a treat? But I knew the answer.

If you cried, there were decent odds that Pearl would come and comfort you, come chirp at you and nuzzle until you felt better.

When we first moved here, Ash’s ex-husband had driven the truck containing all our stuff, and he slept here one night before leaving for good. The day after he’d left, we heard Pearl doing mouse alert in the room he’d slept in, and I just broke down sobbing at the kitchen table, thinking about how Pearl liked him despite everything and was just trying to find him, and we had no way to tell her he wasn’t coming back or explain any of it to her. To her, one of her favorite people had just disappeared, and that was so sad.

But Pearl heard me, came over, jumped on the kitchen table, and purred and headbutted me like crazy. The idea that I was sad for her and she still wanted to comfort me made me cry harder.

She would also headbutt and nuzzle Ash specifically on the mouth when they sang, or do the same to me if I whistled competently. I suppose she liked music, but only from us.

Most of all, I love… how much she doted on Ash. She slept alongside them (me only a few times), she followed them around, she waited outside doors for them. They were her favorite person. I feel so bad for them, to have lost both Twigs and Pearl back to back.

It’s been… two weeks now. Just over, because it took me another day to finish this post.

I don’t know if it’s fully clicked yet. I didn’t see Pearl much during the day, since she’d be tucked away in Ash’s office slash our bedroom. I saw her mostly at night and first thing in the morning. So while I’m out here, at my desk, it’s like nothing has changed. It only sinks in when I go upstairs and see the door left open, see a bed with no Pearl tucked in it somewhere.

It’s kind of dumbfounding just how much of this house and our lives had warped around Pearl, around this one tiny cat who loved everyone. So many things have disappeared or seem superfluous now. I was already free-feeding the other cats again since Pearl wasn’t allowed to roam the house unsupervised, but now we don’t need the kibble cage at all. Half our doors had been kept closed to make a few different places for Pearl to stay, but now none of that is necessary. Litterboxes had ended up scattered throughout the house so Pearl would always have access to one; now they’re back to being in a few central locations.

Ash doesn’t have to wake up at a specific time every day to give Pearl medicine. Pearl won’t wake us up to feed her. We don’t have to make her food, ever again.

And there are so many things that were only for Pearl. This wasn’t the case for anyone else. Styx only had communal cat sweaters; his favorite toy was loose change on my desk. Twigs, too, only had sweaters that Anise and Pearl inherited; his one dedicated toy was a single very tiny mouse he sometimes played with.

But Pearl? Half the sweaters we have only fit Pearl. Her mice were very much hers. Even her string was very much hers. We have a mortar and pestle that were specifically for grinding up her medication, oral syringes only Pearl used. She had possessions of her very own, things she’s left behind.

We knew this was coming, of course. Without the intervention of modern medicine, she would have died last March, and the outlook for heart failure in a cat isn’t great. I’ve already grieved for her several times over the past year. I didn’t see her much during the summer, but I’d been trying to spend more deliberate time with her in recent months, and I’m glad I did. I regret nothing. I earned her purrs, I played string with her exactly the right amount, I woke up to her stealing my pillow. I got the full Pearl experience.

And so did she. Ash took her outside extra over the summer, let her see a bit of the outside world (even if it was only our yard). We let her roam the house when we could, banishing Cheeseball to a room by himself if necessary, though she usually ended up sitting on a vent or my lap (or trying to heist some kibble). She got lots of treats, lots of love, lots of blankets, and even a vent all to herself. What more could she ask for?

She was living on borrowed time, but we borrowed every second we could. I don’t know what else we could’ve done. And we were there for her right up until the end. We didn’t have that opportunity with Twigs; he died in the back room, surrounded by strangers.

In the end, her heart was literally too big.

This sucks.

Pearl deserved better. She was dealt a bad hand from the beginning, but she was still friendly and kind, and then this happened. She was so young, too — her eighth birthday would’ve been next month. She, like Twigs, should’ve had twice as long.

Things won’t be difficult for her any more, I guess. I don’t know how much that comforts me.

Everything else moves on. Pearl continued until the night of January 11, 2022, but can go no further. We’re forced to leave her there, retaining only memories, while time carries us gently forward, ever further away.

So here is my landmark, my stake in the ground. Pearl was here. May this mark out the shape of who she was and leave that impression upon the world for much longer.

The finality of death resolves so many questions. I often wished I could improve Pearl’s tense relationships with Anise and Cheeseball, but now there’s no problem to solve. The interactions they had are all the interactions they will ever have. The tension is gone, now. The worries about how long Pearl’s heart would last are gone too.

The cat dynamic has shifted, again. Cheeseball and Napoleon have been much more affectionate towards Ash, and Napoleon has suddenly become a lap cat. I suppose the rest of the cats missed Ash while they were siloed away with Pearl for so long. Maybe they’re grieving? Cats are so open with their emotions, but sometimes they’re still inscrutable.

Pearl’s urn is on the dresser in our bedroom, right next to Twigs. Hers is bigger than his, somehow. But that’s Pearl for you — she always knew how to take up space.

No, this is too dire an ending. Pearl was dealt a bad hand, but she always tried to be nice despite that. She got to see a lot of places and make a lot of friends, both people and cats and even one dog. Even when she had complex and skeptical feelings about Anise, she kept trying to be friends with him. She faltered at times, but she always did her best to uphold her principles of loveliness, strong boundaries, and please give me a treat.

That’s a lot for a tiny cat. I admire her for it, and I will not forget it.

Pearl sitting contently next to Ash at their desk

Thank you for reading about Pearl. I hope you’ll remember her too. We loved her very much, and she put a lot of love back into the world. If you would like to experience more Pearl, here are some videos of her. I have some more to sift through, so this list may grow in the coming days.

And here are some games she has starred in. Or, rather, her fursona Purrl has starred in them.

With or without Ninova, the BSP is doomed to keep shrinking

Post Syndicated from Venelina Popova original https://toest.bg/sus-ili-bez-ninova-bsp-e-obrechena-da-se-stopyava/

Last Saturday’s session of the 50th Congress of the BSP was hardly held in a jubilee spirit: the decor was sham, and the speeches in the huge hall of the National Palace of Culture (NDK) were pompous, clichéd, and meaningless. The attempts to present the party’s failure at the last elections as a success were humiliatingly foolish. A delegate from Haskovo called the BSP an experienced grandmaster who had successfully finished a game of chess by taking part in government. Alexander Simov, one of Ninova’s so-called praetorians, blurted out that the BSP had not been ideologically hollowed out but continued to be a vivid, left-wing, and memorable party. Which of this is true?

Ninova’s only opponent from within the party, Krum Zarkov, described what happened as a farce. Before the forum, the president of the Party of European Socialists, Sergei Stanishev, called the results of the last elections a “catastrophe” and said that if Kornelia Ninova’s resignation was not reconfirmed by the delegates, it would look grotesque in the eyes of voters and would be taken as a disregard for their vote. MEP Elena Yoncheva, one of Ninova’s fiercest critics, said on the BNR programme “Nedelya 150”: “The Congress was shameful and staged; it threw the BSP into the category of marginal parties. The Congress had no connection to reality. At the Congress we received no answer to the question of why we lost the trust of 800,000 people.”

In the end, Kornelia Ninova kept her power in the party, and with it her position as deputy prime minister. With a different leader, the socialists could have withdrawn their confidence in her and put forward a new face for deputy prime minister; reasons can always be found. Now Ninova can calmly continue to surround herself with yes-men in the ministry as well, and even hand out posts like indulgences to repentant party enemies. She can continue to punish the unyielding, even expel them from the party, as she did with Kiril Dobrev, the Plovdiv oligarch Georgi Gergov, and other comrades of hers.

Analysts, Kornelia Ninova’s ideological opponents, and her entourage in the BSP will continue for some time (partly out of inertia) to comment on the event and its possible consequences for the century-old party. Worn-out politicians such as Mihail Mikov, who was remembered for nothing during his brief stay at the top of the BSP, have started talking about splitting it and about the need to create a new, authentic left-wing party that Bulgaria supposedly needs.

But this is not the first “time of parting” among the Bulgarian socialists.

Eight years ago, emblematic figures on the left, such as former president Georgi Parvanov, Rumen Petkov, Evgeniy Zhekov, and others, first created an ideological platform within the BSP, which, after their expulsion from the party, they turned into a new political project under the name “Alternative for Bulgarian Revival” (ABV). The new party was proclaimed a fighter against backroom politics, against lobbying, and against the subversion of legitimate democratic mechanisms in political and governing decisions and in personnel appointments. In reality, ABV emerged as a result of the conflict between Sergei Stanishev and Georgi Parvanov, when the former was prime minister of the triple-coalition cabinet and the president was its architect.

In 2014, ABV even took part in governing the country, after it guaranteed the parliamentary majority during Boyko Borissov’s second coalition cabinet. Ivaylo Kalfin of the party then became the fourth deputy prime minister in the government of GERB and the Reformist Bloc. In the spring of 2016, Kalfin resigned, and ABV announced it was withdrawing its support for the parliamentary majority and the cabinet. This “quite coincidentally” coincided with the election of Kornelia Ninova as BSP leader on May 8 of that year, when she won the leadership battle against Mihail Mikov by six votes. She had already signaled that she was inclined to extend a hand to the ABV splitters, if the lost sheep returned to the flock.

There is no point in guessing whether there will now be a new split in the party ship, one that this time could sink it for good. If it happens, we will all inevitably find out. Although a survey among the socialists would probably show that they see their party as immortal…

The more important question that dominates this storyline is

whether the leader bears all the blame and responsibility for the dramatic rout of the BSP at the last parliamentary elections in November 2021.

To answer this question in a reasonably meaningful way, we need to look closely at the chart of trust in the left during the years of the Transition and trace its peaks and troughs. Only then can we draw conclusions.

Chart

The first big collapse in trust in the BSP came after Andrey Lukanov’s second cabinet. It governed for only a few months, effectively drove the state to bankruptcy, and stopped servicing its foreign debt. After the so-called Lukanov winter, the 1991 elections were won by the SDS (Union of Democratic Forces), and the socialist party lost more than a million voters. But power was in practice still in the hands of the former communists, and very soon State Security, acting through the DPS (Movement for Rights and Freedoms), brought down Filip Dimitrov’s government.

During the rule of the “Berov” cabinet, formed with the mandate of the DPS, the foundations of the parallel state and of organized crime were laid; the strong-arm groups and financial pyramids with which the robbing of Bulgarian citizens began were created. Subsequently, the BSP, led by Zhan Videnov, won the 1994 parliamentary elections and received nearly 2.26 million votes. But his government lasted half a term and fell. The Council of Ministers of that time turned out to have a record number of State Security agents: in 2001 the Dossier Commission reported that evidence of collaboration with State Security and the Intelligence Directorate of the General Staff had been found for 22 of its members, with indisputable proof for 11 of them. The prime minister himself, nicknamed “Danube”, had been the keeper of a State Security safe house in Plovdiv.

After the so-called Videnov winter, at the 1997 elections, when Georgi Parvanov was already the BSP’s leader, the party’s electoral result was dramatic.

Its votes were now nearly two and a half times fewer than at the previous election.

Frightened by the successful term of the ODS and Ivan Kostov’s government, certain circles organized the return of Simeon Saxe-Coburg-Gotha to Bulgaria. With the national movement created in his name (NDSV), he won the 2001 elections, became the world’s first monarch to serve as prime minister in a parliamentary republic, and formed the most opaque coalition together with the DPS and “Novoto Vreme”. Saxe-Coburg-Gotha also halted the opening of the State Security dossiers, and the former agents enjoyed particular esteem during his term. In 2001 the BSP suffered a serious loss: for the first time in the years of the Transition, the party fell to a result of just over 783,000 votes. Its leader by then was Sergei Stanishev, and unexpectedly that same year Georgi Parvanov won the presidential election.

In 2005 the BSP received the trust of more than a million voters for the last time, and with nearly 1.13 million votes became the leading political force in the country. After the rule of the triple coalition and Sergei Stanishev’s cabinet, the BSP managed to raise its election results twice more: at the early elections in 2013 (after the resignation of the first “Borissov” cabinet) and in 2017, when the left, by then led by Kornelia Ninova, won 80 seats in parliament. But it never again managed to clear the bar of one million votes.

Over the past three decades, the left has crashed at the polls several times. In 2014, when protesters against the #КОЙ model brought down the BSP–DPS government headed by Prime Minister Plamen Oresharski, the socialists fell to a result of about half a million votes. And at the last three parliamentary elections, in 2021, the socialist party’s results gradually collapsed, reaching 267,817 votes in the November vote.

Ninova undoubtedly bears some responsibility for the left’s rout, but it would be superficial to explain it solely by her leadership of the BSP.

Kornelia Ninova may well be in love with power and fully capable of steamrolling the people who try to wrest it from her hands, as her opponents describe her. People in her entourage look like apparatchiks from the socialist era, and it was strange to see compromised former ministers such as Emilia Maslarova and Rumen Gechev at the coalition negotiations for a cabinet in the capacity of experts. In the election lists, in their ordering, and in the quarrels over who should head them, it became clear that Ninova does not act on principle but gives precedence to the claqueurs and to those who curry favour with her. The BSP’s programme “Vision for Bulgaria”, for its part, turned out to be a dull, idea-free document for governing the country, devoid of any vision. And so on…

But isn’t the BSP supposed not to be a pyramidal, leader-worshipping party? Why did its leadership sleep through Ninova’s leadership mistakes? Out of conformism, laziness, indifference? Let those who know the party from the inside, the intrigues, the battles for supremacy, the lobbies, answer that.

For outside observers, however, it is clear that the BSP will continue to decline, because:

The BSP did not renounce its communist past and legacy, but accepted the entire history of the totalitarian BKP (Bulgarian Communist Party) as its own. As a sign of this, Kornelia Ninova and her comrades continue to bow ritually at Todor Zhivkov’s grave in Pravets.

The BSP did not condemn the crimes of the former regime. It did not apologize for the victims of the People’s Court, for the abuse and killings in the camps, for the destruction of our nation’s elite after the coup of September 9, 1944. It did not sincerely ask forgiveness from Bulgarian Muslims for the so-called Revival Process. (Sergei Stanishev’s apology and the kiss with Mestan on Orlov Most in 2014 did not send clear messages to the minority.)

To this day, the BSP has not cut its ties with the former secret services of the totalitarian state.

The BSP continues to call the Soviet occupation of Bulgaria “liberation” and remains a “fifth column” of Russia to this day.

But most importantly, the BSP is neither a left-wing nor a social party.

The proof of this is not only the millionaires and oligarchs within it, but also its policies in the tax, pension, and social spheres. And every one of its governments has either crashed (Lukanov, Videnov) or governed opaquely and heavily dependent on the deep state (Stanishev, Oresharski).

The party will continue to lose electoral support because it has nothing to offer its voters, and also because, as we have seen, it does not allow generational change in its leadership. The BSP, like the other parties created from the matrix of the Transition, will either disappear or continue to exist as marginal organizations without influence in politics. And that is inevitable.

Cover photo: Kornelia Ninova delivers a political report to the BSP Congress on January 22, 2022. Still frame from a BSTV video clip

Source

Deploy and Manage Gitlab Runners on Amazon EC2

Post Syndicated from Sylvia Qi original https://aws.amazon.com/blogs/devops/deploy-and-manage-gitlab-runners-on-amazon-ec2/

Gitlab CI is a tool utilized by many enterprises to automate their continuous integration, continuous delivery, and deployment (CI/CD) processes. A Gitlab CI/CD pipeline consists of two major components: a .gitlab-ci.yml file describing a pipeline’s jobs, and a Gitlab Runner, an application that executes the pipeline jobs.

Setting up the Gitlab Runner is a time-consuming process. It involves provisioning the necessary infrastructure, installing the necessary software to run pipeline workloads, and configuring the runner. For enterprises running hundreds of pipelines across multiple environments, it is essential to automate the Gitlab Runner deployment process so that it can be deployed quickly in a repeatable, consistent manner.

This post will guide you through utilizing Infrastructure-as-Code (IaC) to automate Gitlab Runner deployment and administrative tasks on Amazon EC2. With IaC, you can quickly and consistently deploy the entire Gitlab Runner architecture by running a script. You can track and manage changes efficiently. And, you can enforce guardrails and best practices via code. The solution presented here also offers autoscaling so that you save costs by terminating resources when not in use. You will learn:

  • How to deploy Gitlab Runner quickly and consistently across multiple AWS accounts.
  • How to enforce guardrails and best practices on the Gitlab Runner through IaC.
  • How to autoscale Gitlab Runner based on workloads to ensure best performance and save costs.

This post comes from a DevOps engineer perspective, and assumes that the engineer is familiar with the practices and tools of IaC and CI/CD.

Overview of the solution

The following diagram displays the solution architecture. We use AWS CloudFormation to describe the infrastructure that is hosting the Gitlab Runner. The main steps are as follows:

  1. The user runs a deploy script in order to deploy the CloudFormation template. The template is parameterized, and the parameters are defined in a properties file. The properties file specifies the infrastructure configuration, as well as the environment in which to deploy the template.
  2. The deploy script calls the CloudFormation CreateStack API to create a Gitlab Runner stack in the specified environment (a rough CLI sketch of this call follows this list).
  3. During stack creation, an EC2 autoscaling group is created with the desired number of EC2 instances. Each instance is launched via a launch template, which is created with values from the properties file. An IAM role is created and attached to the EC2 instance. The role contains permissions required for the Gitlab Runner to execute pipeline jobs. A lifecycle hook is attached to the autoscaling group on instance termination events. This ensures graceful instance termination.
  4. During instance launch, CloudFormation uses a cfn-init helper script to install and configure the Gitlab Runner:
    1. cfn-init installs the Gitlab Runner software on the EC2 instance.
    2. cfn-init configures the Gitlab Runner as a docker executor using a pre-defined docker image in the Gitlab Container Registry. The docker executor implementation lets the Gitlab Runner run each build in a separate and isolated container. The docker image contains the software required to run the pipeline workloads, thereby eliminating the need to install these packages during each build.
    3. cfn-init registers the Gitlab Runner to Gitlab projects specified in the properties file, so that these projects can utilize the Gitlab Runner to run pipelines.
  5. The user may repeat the same steps to deploy Gitlab Runner into another environment.
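
As a rough sketch of what the deploy script in step 2 boils down to (this is not the repository’s actual deploy-runner.sh; the stack name, parameter keys, and the assumption that the properties file is shell-readable are placeholders for illustration):

# Hypothetical sketch only -- the real logic lives in the repo's deploy-runner.sh.
# Assumes the properties file contains shell-readable KEY=VALUE pairs.
source sample-runner.properties
aws cloudformation create-stack \
    --stack-name "amazon-ec2-gitlab-runner-demo" \
    --template-body file://gitlab-runner.yaml \
    --parameters ParameterKey=InstanceType,ParameterValue="${InstanceType}" \
                 ParameterKey=GitlabServerURL,ParameterValue="${GitlabServerURL}" \
    --capabilities CAPABILITY_NAMED_IAM \
    --region us-east-1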

Architecture diagram previously explained in post.

Walkthrough

This walkthrough will demonstrate how to deploy the Gitlab Runner, and how easy it is to conduct Gitlab Runner administrative tasks via this architecture. We will walk through the following tasks:

  • Build a docker executor image for the Gitlab Runner.
  • Deploy the Gitlab Runner stack.
  • Update the Gitlab Runner.
  • Terminate the Gitlab Runner.
  • Add/Remove Gitlab projects from the Gitlab Runner.
  • Autoscale the Gitlab Runner based on workloads.

The code in this post is available at https://github.com/aws-samples/amazon-ec2-gitlab-runner.git

Prerequisites

For this walkthrough, you need the following:

  • A Gitlab account (all tiers including Gitlab Free self-managed, Gitlab Free SaaS, and higher tiers). This demo uses the gitlab.com free tier.
  • A Gitlab Container Registry.
  • Git client to clone the source code provided.
  • An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
  • The latest version of the AWS CLI. For more information, see Installing, updating, and uninstalling the AWS CLI.
  • Docker is installed and running on the localhost/laptop.
  • Nodejs and npm installed on the localhost/laptop.
  • A VPC with two private subnets that is connected to the internet via a NAT gateway to allow outbound traffic.
  • The following IAM service-linked role created in the AWS account: AWSServiceRoleForAutoScaling
  • An Amazon S3 bucket for storing Lambda deployment packages.
  • Familiarity with Git, Gitlab CI/CD, Docker, EC2, CloudFormation and Amazon CloudWatch.

Build a docker executor image for the Gitlab Runner

The Gitlab Runner in this solution is implemented as docker executor. The Docker executor connects to Docker Engine and runs each build in a separate and isolated container via a predefined docker image. The first step in deploying the Gitlab Runner is building a docker executor image. We provided a simple Dockerfile in order to build this image. You may customize the Dockerfile to install your own requirements.

To build a docker image using the sample Dockerfile:

  1. Create a directory where we will store our demo code. From your terminal run:
mkdir demo-repos && cd demo-repos
  2. Clone the source code repository found in the following location:
git clone https://github.com/aws-samples/amazon-ec2-gitlab-runner.git
  3. Create a new project on your Gitlab server. Name the project any name you like.
  4. Clone your newly created repo to your laptop. Ignore the warning about cloning an empty repository.
git clone <your-repo-url>
  5. Copy the demo repo files into your newly created repo on your laptop, and push it to your Gitlab repository. You may customize the Dockerfile before pushing it to Gitlab.
cp -r amazon-ec2-gitlab-runner/* <your-repo-dir>
cd <your-repo-dir>
git add .
git commit -m "Initial commit"
git push
  6. On the Gitlab console, go to your repository’s Package & Registries -> Container Registry. Follow the instructions provided on the Container Registry page in order to build and push a docker image to your repository’s container registry; a typical sequence is sketched below.
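
The exact commands are shown on your project’s Container Registry page; a typical sequence looks roughly like the following (the registry path is a placeholder):

# Placeholder registry path -- use the path shown on your project's Container Registry page.
docker login registry.gitlab.com
docker build -t registry.gitlab.com/<your-group>/<your-project>/runner-executor:latest .
docker push registry.gitlab.com/<your-group>/<your-project>/runner-executor:latest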

Deploy the Gitlab Runner stack

Once the docker executor image has been pushed to the Gitlab Container Registry, we can deploy the Gitlab Runner. The Gitlab Runner infrastructure is described in the CloudFormation template gitlab-runner.yaml. Its configuration is stored in a properties file called sample-runner.properties. A launch template is created from the values in the properties file and is then used to launch instances. This architecture lets you deploy the Gitlab Runner to as many environments as you like by utilizing the configurations provided in the appropriate properties files.

During the provisioning process, the template uses a cfn-init helper script to run a series of commands that install and configure the Gitlab Runner.

          commands:
            01InstallDocker:
              command: sudo yum -y install docker
            02StartDocker:
              command: sudo service docker start
            03DownloadGitlabRunner:
              command: sudo wget -O /usr/bin/gitlab-runner https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64
            04ChmodGitlabRunner:
              command: sudo chmod a+x /usr/bin/gitlab-runner
            05AddUser:
              command: sudo useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash
            06InstallGitlabRunner:
              command: sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
            07SetRegion:
              command: !Sub 'aws configure set default.region ${AWS::Region}'
            08ConfigureDockerExecutor:
              command: !Sub 
                - |
                  for GitlabGroupToken in `aws ssm get-parameters --names /${AWS::StackName}/ci-tokens --query 'Parameters[0].Value' | sed -e "s/\"//g" | sed "s/,/ /g"`;do
                      sudo gitlab-runner register \
                      --non-interactive \
                      --url "${GitlabServerURL}" \
                      --registration-token $GitlabGroupToken \
                      --executor "docker" \
                      --docker-image "${DockerImagePath}" \
                      --description "Gitlab Runner with Docker Executor" \
                      --locked="${isLOCKED}" --access-level "${ACCESS}" \
                      --docker-volumes "/var/run/docker.sock:/var/run/docker.sock" \
                      --tag-list "${RunnerEnvironment}-${RunnerVersion}-docker"
                  done
                - isLOCKED: !FindInMap [GitlabRunnerRegisterOptionsMap, !Ref RunnerEnvironment, isLOCKED]
                  ACCESS: !FindInMap [GitlabRunnerRegisterOptionsMap, !Ref RunnerEnvironment, ACCESS]                              
            09StartGitlabRunner:
              command: sudo gitlab-runner start

The helper script ensures that the Gitlab Runner setup is consistent and repeatable for each deployment. If a configuration change is required, users simply update the configuration steps and redeploy the stack. Furthermore, all changes are tracked in Git, which allows for versioning of the Gitlab Runner.

To deploy the Gitlab Runner stack:

  1. Obtain the runner registration tokens of the Gitlab projects that you want registered to the Gitlab Runner. Obtain the token by selecting the project’s Settings > CI/CD and expanding the Runners section.
  2. Update the sample-runner.properties file parameters according to your own environment. Refer to the gitlab-runner.yaml file for a description of these parameters. Rename the file if you like. You may also create an additional properties file for deploying into other environments.
  3. Run the deploy script to deploy the runner:
cd <your-repo-dir>
./deploy-runner.sh <properties-file> <region> <aws-profile> <stack-name> 

<properties-file> is the name of the properties file.

<region> is the region where you want to deploy the stack.

<aws-profile> is the name of the CLI profile you set up in the prerequisites section.

<stack-name> is the name you chose for the CloudFormation stack.

For example:

./deploy-runner.sh sample-runner.properties us-east-1 dev amazon-ec2-gitlab-runner-demo

After the stack is deployed successfully, you will see the Gitlab Runner autoscaling group created in the EC2 console:


Under your Gitlab project Settings > CI/CD > Runners > Available specific runners, you will see the fully configured Gitlab Runner. The green circle indicates that the Gitlab Runner is ready for use.


Updating the Gitlab Runner

There are times when you would want to update the Gitlab Runner. For example, updating the instance VolumeSize in order to resolve a disk space issue, or updating the AMI ID when a new AMI becomes available.

Utilizing the properties file and launch template makes it easy to update the Gitlab Runner. Simply update the Gitlab Runner configuration parameters in the properties file. Then, run the deploy script to update the Gitlab Runner stack. To ensure that the changes take effect immediately (e.g., existing instances are replaced by new instances with the new configuration), we utilize an AutoscalingRollingUpdate update policy to automatically update the instances in the autoscaling group.

    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: !Ref MinInstancesInService
        MaxBatchSize: !Ref MaxBatchSize
        PauseTime: "PT5M"
        WaitOnResourceSignals: true
        SuspendProcesses:
          - HealthCheck
          - ReplaceUnhealthy
          - AZRebalance
          - AlarmNotification
          - ScheduledActions

The policy tells CloudFormation that when changes are detected in the launch template, it should update the instances in batches of MaxBatchSize, while keeping the number of instances specified in MinInstancesInService in service during the update.

Below is an example of updating the Gitlab Runner instance type.

To update the instance type of the runner instance:

  1. Update the “InstanceType” parameter in the properties file.

InstanceType=t2.medium

  2. Run the deploy-runner.sh script to update the CloudFormation stack:
cd <your-repo-dir>
./deploy-runner.sh <properties-file> <region> <aws-profile> <stack-name> 

In the CloudFormation console, you will see that the launch template is updated first, then a rolling update is initiated. The instance type update requires a replacement of the original instance, so a temporary instance was launched and put in service. Then, the temporary instance was terminated when the new instance was launched successfully.


After the update is complete, you will see that on the Gitlab project’s console, the old Gitlab Runner, ez_5x8Rv, is replaced by the new Gitlab Runner, N1_UQ7yc.


Terminate the Gitlab Runner

There are times when an autoscaling group instance must be terminated. For example, during an autoscaling scale-in event, or when the instance is being replaced by a new instance during a stack update, as seen previously. When terminating an instance, you must ensure that the Gitlab Runner finishes executing any running jobs before the instance is terminated, otherwise your environment could be left in an inconsistent state. Also, we want to ensure that the terminated Gitlab Runner is removed from the Gitlab project. We utilize an autoscaling lifecycle hook to achieve these goals.

The lifecycle hook works like this: A CloudWatch event rule actively listens for the EC2 Instance-terminate events. When one is detected, the event rule triggers a Lambda function. The Lambda function calls SSM Run Command to run a series of commands on the EC2 instances, via a SSM Document. The commands include stopping the Gitlab Runner gracefully when all running jobs are finished, de-registering the runner from Gitlab projects, and signaling the autoscaling group to terminate the instance.

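Conceptually, the cleanup performed through SSM Run Command amounts to commands along these lines (this is a hedged sketch, not the sample repository’s actual SSM document; the hook, group, and instance identifiers are placeholders):

# Hedged sketch only -- the real commands live in the sample repo's SSM document.
# Stop the runner (waiting for running jobs per your runner configuration) and unregister it.
sudo gitlab-runner stop
sudo gitlab-runner unregister --all-runners

# Signal the Auto Scaling group that cleanup is done so the instance can terminate.
aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "<terminate-hook-name>" \
    --auto-scaling-group-name "<runner-asg-name>" \
    --lifecycle-action-result CONTINUE \
    --instance-id "<instance-id>" \
    --region "<region>"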

There are also times when you want to terminate an instance manually. For example, when an instance is suspected to not be functioning properly. To terminate an instance from the Gitlab Runner autoscaling group, use the following command:

aws autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id="${InstanceId}" \
    --no-should-decrement-desired-capacity \
    --region="${region}" \
    --profile="${profile}"

The above command terminates the instance. The lifecycle hook ensures that the cleanup steps are conducted properly, and the autoscaling group launches another new instance to replace the old one.

Note that if you terminate the instance by using the "ec2 terminate-instances" command, then the autoscaling lifecycle hook actions will not be triggered.

Add/Remove Gitlab projects from the Gitlab Runner

As new projects are added to your enterprise, you may want to register them with the Gitlab Runner, so that those projects can utilize the Gitlab Runner to run pipelines. On the other hand, you would want to remove the Gitlab Runner from a project if that project no longer wants to utilize the Gitlab Runner, or if it no longer qualifies to utilize the Gitlab Runner. For example, if a project is no longer allowed to deploy to an environment configured by the Gitlab Runner. Our architecture offers a simple way to add and remove projects from the Gitlab Runner: update the RunnerRegistrationTokens parameter in the properties file, and then rerun the deploy script to update the Gitlab Runner stack.

To add new projects to the Gitlab Runner:

  1. Update the RunnerRegistrationTokens parameter in the properties file. For example:
RunnerRegistrationTokens=ps8RjBSruy1sdRdP2nZX,XbtZNv4yxysbYhqvjEkC
  2. Update the Gitlab Runner stack. This updates the SSM parameter which stores the tokens.
cd <your-repo-dir>
./deploy-runner.sh <properties-file> <region> <aws-profile> <stack-name> 
  3. Relaunch the instances in the Gitlab Runner autoscaling group. The new instances will use the new RunnerRegistrationTokens value. Run the following command to relaunch the instances:
./cycle-runner.sh <runner-autoscaling-group-name> <region> <optional-aws-profile>

To remove projects from the Gitlab Runner, follow the steps described above, with just one difference. Instead of adding new tokens to the RunnerRegistrationTokens parameter, remove the token(s) of the project that you want to dissociate from the runner.

Autoscale the runner based on custom performance metrics

Each Gitlab Runner can be configured to handle a fixed number of concurrent jobs. Once this capacity is reached for every runner, any new jobs will be in a Queued/Waiting status until the current jobs complete, which would be a poor experience for our team. Setting the number of concurrent jobs too high on our runners would also result in a poor experience, because all jobs leverage the same CPU, memory, and storage in order to conduct the builds.

In this solution, we utilize a scheduled Lambda function that runs every minute to inspect the number of jobs running on every runner, leveraging the Prometheus metrics endpoint that the runners expose. If we approach the concurrent build limit of the group, we increase the Autoscaling Group size so that it can take on more work. As the number of concurrent jobs decreases, the scheduled Lambda function scales the Autoscaling Group back in, to minimize cost. The scale-up operation ignores the Autoscaling Group’s cooldown period, which helps ensure that our team is not waiting on a new instance, whereas the scale-down operation obeys the group’s cooldown period.
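
As an illustrative sketch of that scaling decision (the real logic lives in the repository’s Lambda function; the metric name, port, thresholds, and group name below are assumptions, not values taken from the repo):

# Illustrative sketch only -- the actual implementation is the scheduled Lambda function.
# Sum the jobs currently running on a runner from its Prometheus metrics endpoint.
RUNNING_JOBS=$(curl -s http://<runner-private-ip>:9252/metrics \
    | grep '^gitlab_runner_jobs{' | awk '{s+=$2} END {print s+0}')

CURRENT_CAPACITY=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "<runner-asg-name>" \
    --query 'AutoScalingGroups[0].DesiredCapacity' --output text)

# Scale out immediately (ignoring the cooldown) near the concurrency limit;
# scale in while honoring the cooldown when load drops.
if [ "$RUNNING_JOBS" -ge 8 ]; then
    aws autoscaling set-desired-capacity \
        --auto-scaling-group-name "<runner-asg-name>" \
        --desired-capacity $((CURRENT_CAPACITY + 1)) --no-honor-cooldown
elif [ "$RUNNING_JOBS" -le 2 ] && [ "$CURRENT_CAPACITY" -gt 1 ]; then
    aws autoscaling set-desired-capacity \
        --auto-scaling-group-name "<runner-asg-name>" \
        --desired-capacity $((CURRENT_CAPACITY - 1)) --honor-cooldown
fi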

Here is the logical sequence diagram for the work:

Sequence diagram

For operational monitoring, the Lambda function also publishes custom CloudWatch Metrics for the count of active jobs, along with the target and actual capacities of the Autoscaling group. We can utilize this information to validate that the system is working properly and determine if we need to modify any of our autoscaling parameters.

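For reference, publishing a custom metric like this comes down to a single call; the namespace, metric, and dimension names below are placeholders and not necessarily those used by the sample repository:

# Placeholder namespace, metric, and dimension names.
aws cloudwatch put-metric-data \
    --namespace "GitlabRunner/Autoscaling" \
    --metric-name "RunningJobs" \
    --dimensions AutoScalingGroupName="<runner-asg-name>" \
    --value "$RUNNING_JOBS" \
    --unit Count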

Congratulations! You have completed the walkthrough. Take some time to review the resources you have deployed, and practice the various runner administrative tasks that we have covered in this post.

Troubleshooting

Problem: I deployed the CloudFormation template, but no runner is listed in my repository.

Possible Cause: Errors have been encountered during cfn-init, causing runner registration to fail. Connect to your runner EC2 instance, and check /var/log/cfn-*.log files.

Cleaning up

To avoid incurring future charges, delete every resource provisioned in this demo by deleting the CloudFormation stack created in the “Deploy the Gitlab Runner stack” section.

Conclusion

This article demonstrated how to utilize IaC to efficiently conduct various administrative tasks associated with a Gitlab Runner. We deployed Gitlab Runner consistently and quickly across multiple accounts. We utilized IaC to enforce guardrails and best practices, such as tracking Gitlab Runner configuration changes, terminating the Gitlab Runner gracefully, and autoscaling the Gitlab Runner to ensure best performance and minimum cost. We walked through the deploying, updating, autoscaling, and terminating of the Gitlab Runner. We also saw how easy it was to clean up the entire Gitlab Runner architecture by simply deleting a CloudFormation stack.

About the authors

Sylvia Qi

Sylvia is a Senior DevOps Architect focusing on architecting and automating DevOps processes, helping customers through their DevOps transformation journey. In her spare time, she enjoys biking, swimming, yoga, and photography.

Sebastian Carreras

Sebastian is a Senior Cloud Application Architect with AWS Professional Services. He leverages his breadth of experience to deliver bespoke solutions to satisfy the visions of his customer. In his free time, he really enjoys doing laundry. Really.

Minimizing Dependencies in a Disaster Recovery Plan

Post Syndicated from Randy DeFauw original https://aws.amazon.com/blogs/architecture/minimizing-dependencies-in-a-disaster-recovery-plan/

The Availability and Beyond whitepaper discusses the concept of static stability for improving resilience. What does static stability mean with regard to a multi-Region disaster recovery (DR) plan? What if the very tools that we rely on for failover are themselves impacted by a DR event?

In this post, you’ll learn how to reduce dependencies in your DR plan and manually control failover even if critical AWS services are disrupted. As a bonus, you’ll see how to use service control policies (SCPs) to help simulate a Regional outage, so that you can test failover scenarios more realistically.

Failover plan dependencies and considerations

Let’s dig into the DR scenario in more detail. Using Amazon Route 53 for Regional failover routing is a common pattern for DR events. In the simplest case, we’ve deployed an application in a primary Region and a backup Region. We have a Route 53 DNS record set with records for both Regions, and all traffic goes to the primary Region. In an event that triggers our DR plan, we manually or automatically switch the DNS records to direct all traffic to the backup Region.

Relying on an automated health check to control Regional failover can be tricky. A health check might not be perfectly reliable if a Region is experiencing some type of degradation. Often, we prefer to initiate our DR plan manually, and then let automation carry out the failover.

What are the dependencies that we’ve baked into this failover plan? First, Route 53, our DNS service, has to be available. It must continue to serve DNS queries, and we have to be able to change DNS records manually. Second, if we do not have a full set of resources already deployed in the backup Region, we must be able to deploy resources into it.

Both dependencies might violate static stability, because we are relying on resources in our DR plan that might be affected by the outage we’re seeing. Ideally, we don’t want to depend on other services running so we can failover and continue to serve our own traffic. How do we reduce additional dependencies?

Static stability

Let’s look at our first dependency on Route 53 – control planes and data planes. Briefly, a control plane is used to configure resources, and the data plane delivers services (see Understanding Availability Needs for a more complete definition.)

The Route 53 data plane, which responds to DNS queries, is highly resilient across Regions. We can safely rely on it during the failure of any single Region. But let’s assume that for some reason we are not able to call on the Route 53 control plane.

Amazon Route 53 Application Recovery Controller (Route 53 ARC) was built to handle this scenario. It provisions a Route 53 health check that we can control manually with a Route 53 ARC routing control, and updating that routing control is a data plane operation. The Route 53 ARC data plane is highly resilient, using a cluster of five Regional endpoints, and you can update the routing control state as long as three of the five Regional endpoints are available.
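
As a minimal sketch, flipping a routing control is a single data plane call made against one of the cluster’s Regional endpoints (the ARN and endpoint below are placeholders; check the Route 53 ARC documentation for the exact values your cluster exposes):

# Placeholder ARN and endpoint -- use the values from your own Route 53 ARC cluster.
aws route53-recovery-cluster update-routing-control-state \
    --routing-control-arn "arn:aws:route53-recovery-control::<account-id>:controlpanel/<panel-id>/routingcontrol/<control-id>" \
    --routing-control-state On \
    --region us-west-2 \
    --endpoint-url "https://<cluster-endpoint-for-us-west-2>"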

Figure 1. Simple Regional failover scenario using Route 53 Application Recovery Controller


The second dependency, being able to deploy resources into the second Region, is not a concern if we run a fully scaled-out set of resources. We must make sure that our deployment mechanism doesn’t rely only on the primary Region. Most AWS services have Regional control planes, so this isn’t an issue.

The AWS Identity and Access Management (IAM) data plane is highly available in each Region, so you can authorize the creation of new resources as long as you’ve already defined the roles. Note: If you use federated authentication through an identity provider, you should test that the IdP does not itself have a dependency on another Region.

Testing your disaster recovery plan

Once we’ve identified our dependencies, we need to decide how to simulate a disaster scenario. Two mechanisms you can use for this are network access control lists (NACLs) and SCPs. The first lets us restrict network traffic to our service endpoints. The second lets us define policies that specify the maximum permissions for the target accounts; it also lets us simulate a Route 53 or IAM control plane outage by denying access to the service.
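
For example, a minimal SCP that simulates a control plane outage might deny Route 53 and IAM calls for accounts under a dedicated test OU (this is a hedged sketch, not the policy from the sample repository; the policy and OU IDs are placeholders):

# Hedged sketch -- not the sample repository's policy. IDs below are placeholders.
aws organizations create-policy \
    --name "simulate-control-plane-outage" \
    --type SERVICE_CONTROL_POLICY \
    --description "Deny Route 53 and IAM calls to simulate a Regional control plane outage" \
    --content '{
      "Version": "2012-10-17",
      "Statement": [
        { "Effect": "Deny", "Action": ["route53:*", "iam:*"], "Resource": "*" }
      ]
    }'

# Attach it only to a dedicated test OU, never to the organization root.
aws organizations attach-policy --policy-id "<p-xxxxxxxx>" --target-id "<ou-xxxx-xxxxxxxx>"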

For the end-to-end DR simulation, we’ve published an AWS samples repository on GitHub that you can deploy. It evaluates Route 53 ARC capabilities when both the Route 53 and IAM control planes aren’t accessible.

By deploying test applications across us-east-1 and us-west-1 AWS Regions, we can simulate a real-world scenario that determines the business continuity impact, failover timing, and procedures required for successful failover with unavailable control planes.

Figure 2. Simulating Regional failover using service control policies


Before you conduct the test outlined in our scenario, we strongly recommend that you create a dedicated AWS testing environment with an AWS Organizations setup. Make sure that you don’t attach SCPs to your organization’s root but instead create a dedicated organization unit (OU). You can use this pattern to test SCPs and ensure that you don’t inadvertently lock out users from key services.

Chaos engineering

Chaos engineering is the discipline of experimenting on a system to build confidence in its capability to withstand turbulent production conditions. Chaos engineering and its principles are important tools when you plan for disaster recovery. Even a simple distributed system may be too complex to operate reliably. It can be hard or impossible to plan for every failure scenario in non-trivial distributed systems, because of the number of failure permutations. Chaos experiments test these unknowns by injecting failures (for example, shutting down EC2 instances) or transient anomalies (for example, unusually high network latency.)
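
For instance, two simple fault injections of the kinds mentioned above, suitable only for a test environment (the instance ID and network interface name are placeholders):

# Simulate an instance failure by stopping a test EC2 instance (placeholder ID).
aws ec2 stop-instances --instance-ids i-0123456789abcdef0

# On a test instance, inject 200ms (+/- 50ms) of network latency with tc/netem (requires root).
sudo tc qdisc add dev eth0 root netem delay 200ms 50ms
# Remove the latency when the experiment is over.
sudo tc qdisc del dev eth0 root netem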

In the context of multi-Region DR, these techniques can help challenge assumptions and expose vulnerabilities. For example, what happens if a health check passes but the system itself is unhealthy, or vice versa? What will you do if your entire monitoring system is offline in your primary Region, or too slow to be useful? Are there control plane operations that you rely on that themselves depend on a single AWS Region’s health, such as Amazon Route 53? How does your workload respond when 25% of network packets are lost? Does your application set reasonable timeouts or does it hang indefinitely when it experiences large network latencies?

Questions like these can feel overwhelming, so start with a few, then test and iterate. You might learn that your system can run acceptably in a degraded mode. Alternatively, you might find out that you need to be able to failover quickly. Regardless of the results, the exercise of performing chaos experiments and challenging assumptions is critical when developing a robust multi-Region DR plan.

Conclusion

In this blog, you learned about reducing dependencies in your DR plan. We showed how you can use Amazon Route 53 Application Recovery Controller to reduce a dependency on the Route 53 control plane, and how to simulate a Regional failure using SCPs. As you evaluate your own DR plan, be sure to take advantage of chaos engineering practices. Formulate questions and test your static stability assumptions. And of course, you can incorporate these questions into a custom lens when you run a Well-Architected review using the AWS Well-Architected Tool.

[$] Supporting PGP keys and signatures in the kernel

Post Syndicated from original https://lwn.net/Articles/882426/rss

A few weeks back, we looked at a proposal to add an integrity-management feature to Fedora. One of the selling points was that the integrity checking could be done using the PGP signatures that are already embedded into the RPM package files that Fedora uses. But the kernel needs to be able to verify PGP signatures in order for the Fedora feature to work. That addition to the kernel has been proposed, but some in the kernel-development community seem less than completely enthusiastic about bringing PGP support into the kernel itself.

A new Polkit vulnerability

Post Syndicated from original https://lwn.net/Articles/882609/rss

Qualys has announced the disclosure of a local-root vulnerability in Polkit. They are calling it “PwnKit” and have even provided a proof-of-concept video.

Successful exploitation of this vulnerability allows any unprivileged user to gain root privileges on the vulnerable host. Qualys security researchers have been able to independently verify the vulnerability, develop an exploit, and obtain full root privileges on default installations of Ubuntu, Debian, Fedora, and CentOS. Other Linux distributions are likely vulnerable and probably exploitable. This vulnerability has been hiding in plain sight for 12+ years and affects all versions of pkexec since its first version in May 2009.

Updates from distributors are already rolling out.

New – Replication for Amazon Elastic File System (EFS)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-replication-for-amazon-elastic-file-system-efs/

Amazon Elastic File System (Amazon EFS) allows EC2 instances, AWS Lambda functions, and containers to share access to a fully-managed file system. First announced in 2015 and generally available in 2016, Amazon EFS delivers low-latency performance for a wide variety of workloads and can scale to thousands of concurrent clients or connections. Since the 2016 launch we have continued to listen and to innovate, and have added many new features and capabilities in response to your feedback. These include on-premises access via Direct Connect (2016), encryption of data at rest (2017), provisioned throughput and encryption of data in transit (2018), an infrequent access storage class (2019), IAM authorization & access points (2020), lower-cost one zone storage classes (2021), and more.

Introducing Replication
Today I am happy to announce that you can now use replication to automatically maintain copies of your EFS file systems for business continuity or to help you to meet compliance requirements as part of your disaster recovery strategy. You can set this up in minutes for new or existing EFS file systems, with replication either within a single AWS region or between two AWS regions in the same AWS partition.

Once configured, replication begins immediately. All replication traffic stays on the AWS global backbone, and most changes are replicated within a minute, with an overall Recovery Point Objective (RPO) of 15 minutes for most file systems. Replication does not consume any burst credits and it does not count against the provisioned throughput of the file system.

Configuring Replication
To configure replication, I open the Amazon EFS Console, view the file system that I want to replicate, and select the Replication tab:

I click Create replication, choose the desired destination region, and select the desired storage (Regional or One Zone). I can use the default KMS key for encryption or I can choose another one. I review my settings and click Create replication to proceed:

Replication begins right away and I can see the new, read-only file system immediately:

A new CloudWatch metric, TimeSinceLastSync, is published when the initial replication is complete, and periodically after that:

The replica is created in the selected region. I create any necessary mount targets and mount the replica on an EC2 instance:

EFS tracks modifications to the blocks (currently 4 MB) that are used to store files and metadata, and replicates the changes at a rate of up to 300 MB per second. Because replication is block-based, it is not crash-consistent; if you need crash-consistency you may want to take a look at AWS Backup.

After I have set up replication, I can change the lifecycle management, intelligent tiering, throughput mode, and automatic backup setting for the destination file system. The performance mode is chosen when the file system is created, and cannot be changed.

Initiating a Fail-Over
If I need to fail over to the replica, I simply delete the replication. I can do this from either side (source or destination), by clicking Delete and confirming my intent:

I enter delete, and click Delete replication to proceed:

The former read-only replica is now a writable file system that I can use as part of my recovery process. To fail back, I create a replica in the original location, wait for replication to finish, and delete the replication.

I can also use the command line and the EFS APIs to manage replication. For example:

create-replication-configuration / CreateReplicationConfiguration – Establish replication for an existing file system.

describe-replication-configurations / DescribeReplicationConfigurations – See the replication configuration for a source or destination file system, or for all replication configurations in an AWS account. The data returned for a destination file system also includes LastReplicatedTimestamp, the time of the last successful sync.

delete-replication-configuration / DeleteReplicationConfiguration – End replication for a file system.
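
For example, a minimal CLI sequence looks like this (the file system ID and Region are placeholders):

# Start replicating an existing file system to another Region.
aws efs create-replication-configuration \
    --source-file-system-id fs-0123456789abcdef0 \
    --destinations Region=us-west-2

# Check replication status, including LastReplicatedTimestamp.
aws efs describe-replication-configurations \
    --file-system-id fs-0123456789abcdef0

# Fail over by ending replication; the destination file system becomes writable.
aws efs delete-replication-configuration \
    --source-file-system-id fs-0123456789abcdef0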

Available Now
This new feature is available now and you can start using it today in the AWS US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), South America (São Paulo), and GovCloud Regions.

You pay the usual storage fees for the original and replica file systems and any applicable cross-region or intra-region data transfer charges.

Jeff;

Validate streaming data over Amazon MSK using schemas in cross-account AWS Glue Schema Registry

Post Syndicated from Vikas Bajaj original https://aws.amazon.com/blogs/big-data/validate-streaming-data-over-amazon-msk-using-schemas-in-cross-account-aws-glue-schema-registry/

Today’s businesses face an unprecedented growth in the volume of data. A growing portion of the data is generated in real time by IoT devices, websites, business applications, and various other sources. Businesses need to process and analyze this data as soon as it arrives to make business decisions in real time. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables building and running stream processing applications that use Apache Kafka to collect and process data in real time.

Stream processing applications using Apache Kafka don’t communicate with each other directly; they communicate via sending and receiving messages over Kafka topics. For stream processing applications to communicate efficiently and confidently, a message payload structure must be defined in terms of attributes and data types. This structure describes the schema applications use when sending and receiving messages. However, with a large number of producer and consumer applications, even a small change in schema (removing a field, adding a new field, or change in data type) may cause issues for downstream applications that are difficult to debug and fix.

Traditionally, teams have relied on change management processes (such as approvals and maintenance windows) or other informal mechanisms (documentation, emails, collaboration tools, and so on) to inform one another of data schema changes. However, these mechanisms don’t scale and are prone to mistakes. The AWS Glue Schema Registry allows you to centrally publish, discover, control, validate, and evolve schemas for stream processing applications. With the AWS Glue Schema Registry, you can manage and enforce schemas on data streaming applications using Apache Kafka, Amazon MSK, Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics for Apache Flink, and AWS Lambda.

This post demonstrates how Apache Kafka stream processing applications validate messages using an Apache Avro schema stored in the AWS Glue Schema registry residing in a central AWS account. We use the AWS Glue Schema Registry SerDe library and Avro SpecificRecord to validate messages in stream processing applications while sending and receiving messages from a Kafka topic on an Amazon MSK cluster. Although we use an Avro schema for this post, the same approach and concept applies to JSON schemas as well.

Use case

Let’s assume a fictitious rideshare company that offers unicorn rides. To draw actionable insights, they need to process a stream of unicorn ride request messages. They expect rides to be very popular and want to make sure their solution can scale. They’re also building a central data lake where all their streaming and operation data is stored for analysis. They’re customer obsessed, so they expect to add new fun features to future rides, like choosing the hair color of your unicorn, and will need to reflect these attributes in the ride request messages. To avoid issues in downstream applications due to future schema changes, they need a mechanism to validate messages with a schema hosted in a central schema registry. Having schemas in a central schema registry makes it easier for the application teams to publish, validate, evolve, and maintain schemas in a single place.

Solution overview

The company uses Amazon MSK to capture and distribute the unicorn ride request messages at scale. They define an Avro schema for unicorn ride requests because it provides rich data structures, supports direct mapping to JSON, and offers a compact, fast, binary data format. Because the schema was agreed upon in advance, they decided to use Avro SpecificRecord. SpecificRecord is an interface from the Avro library that allows an Avro record to be used as a POJO. This is done by generating a Java class (or classes) from the schema using the avro-maven-plugin. They use AWS Identity and Access Management (IAM) cross-account roles to allow producer and consumer applications from the other AWS account to safely and securely access schemas in the central Schema Registry account.

The AWS Glue Schema Registry is in Account B, whereas the MSK cluster and Kafka producer and consumer applications are in Account A. We use the following two IAM roles to enable cross-account access to the AWS Glue Schema Registry. Apache Kafka clients in Account A assume a role in Account B using an identity-based policy because the AWS Glue Schema Registry doesn’t support resource-based policies.

  • Account A IAM role – Allows producer and consumer applications to assume an IAM role in Account B.
  • Account B IAM role – Trusts all IAM principals from Account A and allows them to perform read actions on the AWS Glue Schema Registry in Account B. In a real use case scenario, IAM principals that can assume cross-account roles should be scoped more specifically.

The following architecture diagram illustrates the solution:

The solution works as follows:

  1. A Kafka producer running in Account A assumes the cross-account Schema Registry IAM role in Account B by calling the AWS Security Token Service (AWS STS) assumeRole API.
  2. The Kafka producer retrieves the unicorn ride request Avro schema version ID from the AWS Glue Schema Registry for the schema that’s embedded in the unicorn ride request POJO. Fetching the schema version ID is internally managed by the AWS Glue Schema Registry SerDe’s serializer. The serializer has to be configured as part of the Kafka producer configuration.
  3. If the schema exists in the AWS Glue Schema Registry, the serializer decorates the data record with the schema version ID and then serializes it before delivering it to the Kafka topic on the MSK cluster.
  4. The Kafka consumer running in Account A assumes the cross-account Schema Registry IAM role in Account B by calling the AWS STS assumeRole API.
  5. The Kafka consumer starts polling the Kafka topic on the MSK cluster for data records.
  6. The Kafka consumer retrieves the unicorn ride request Avro schema from the AWS Glue Schema Registry, matching the schema version ID that’s encoded in the unicorn ride request data record. Fetching the schema is internally managed by the AWS Glue Schema Registry SerDe’s deserializer. The deserializer has to be configured as part of the Kafka consumer configuration. If the schema exists in the AWS Glue Schema Registry, the deserializer deserializes the data record into the unicorn ride request POJO for the consumer to process it.

The AWS Glue Schema Registry SerDe library also supports optional compression configuration to save on data transfers. For more information about the Schema Registry, see How the Schema Registry works.
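
The SerDe library applies compression through the same producer properties shown later in this post. The following one-line sketch assumes the COMPRESSION_TYPE configuration key and ZLIB option exposed by the library; verify the exact constant names against the version of the AWS Glue Schema Registry SerDe library you use:

// Optional: compress serialized payloads before sending them to the Kafka topic
// (constant names are an assumption; confirm them in the SerDe library documentation)
props.put(AWSSchemaRegistryConstants.COMPRESSION_TYPE,
        AWSSchemaRegistryConstants.COMPRESSION.ZLIB.name());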

Unicorn ride request Avro schema

The following schema (UnicornRideRequest.avsc) defines a record representing a unicorn ride request, which contains ride request attributes along with the customer attributes and system-recommended unicorn attributes:

{
    "type": "record",
    "name": "UnicornRideRequest",
    "namespace": "demo.glue.schema.registry.avro",
    "fields": [
      {"name": "request_id", "type": "int", "doc": "customer request id"},
      {"name": "pickup_address","type": "string","doc": "customer pickup address"},
      {"name": "destination_address","type": "string","doc": "customer destination address"},
      {"name": "ride_fare","type": "float","doc": "ride fare amount (USD)"},
      {"name": "ride_duration","type": "int","doc": "ride duration in minutes"},
      {"name": "preferred_unicorn_color","type": {"type": "enum","name": "UnicornPreferredColor","symbols": ["WHITE","BLACK","RED","BLUE","GREY"]}, "default": "WHITE"},
      {
        "name": "recommended_unicorn",
        "type": {
          "type": "record",
          "name": "RecommendedUnicorn",
          "fields": [
            {"name": "unicorn_id","type": "int", "doc": "recommended unicorn id"},
            {"name": "color","type": {"type": "enum","name": "unicorn_color","symbols": ["WHITE","RED","BLUE"]}},
            {"name": "stars_rating", "type": ["null", "int"], "default": null, "doc": "unicorn star ratings based on customers feedback"}
          ]
        }
      },
      {
        "name": "customer",
        "type": {
          "type": "record",
          "name": "Customer",
          "fields": [
            {"name": "customer_account_no","type": "int", "doc": "customer account number"},
            {"name": "first_name","type": "string"},
            {"name": "middle_name","type": ["null","string"], "default": null},
            {"name": "last_name","type": "string"},
            {"name": "email_addresses","type": ["null", {"type":"array", "items":"string"}]},
            {"name": "customer_address","type": "string","doc": "customer address"},
            {"name": "mode_of_payment","type": {"type": "enum","name": "ModeOfPayment","symbols": ["CARD","CASH"]}, "default": "CARD"},
            {"name": "customer_rating", "type": ["null", "int"], "default": null}
          ]
        }
      }
    ]
  }

Prerequisites

To use this solution, you must have two AWS accounts:

  • Account A – For the MSK cluster, Kafka producer and consumer Amazon Elastic Compute Cloud (Amazon EC2) instances, and AWS Cloud9 environment
  • Account B – For the Schema Registry and schema

For this solution, we use the us-east-1 Region, but you can change this according to your requirements.

Next, we create the resources in each account using AWS CloudFormation templates.

Create resources in Account B

We create the following resources in Account B:

  • A schema registry
  • An Avro schema
  • An IAM role with the AWSGlueSchemaRegistryReadonlyAccess managed policy and an instance profile, which allows all Account A IAM principals to assume it
  • The UnicornRideRequest.avsc Avro schema shown earlier, which is used as a schema definition in the CloudFormation template

Make sure you have the appropriate permissions to create these resources.

  1. Log in to Account B.
  2. Launch the following CloudFormation stack.
  3. For Stack name, enter SchemaRegistryStack.
  4. For Schema Registry name, enter unicorn-ride-request-registry.
  5. For Avro Schema name, enter unicorn-ride-request-schema-avro.
  6. For the Kafka client’s AWS account ID, enter your Account A ID.
  7. For ExternalId, enter a unique random ID (for example, demo10A), which should be provided by the Kafka clients in Account A while assuming the IAM role in this account.

For more information about cross-account security, see The confused deputy problem.

  1. When the stack is complete, on the Outputs tab of the stack, copy the value for CrossAccountGlueSchemaRegistryRoleArn.

The Kafka producer and consumer applications created in Account A assume this role to access the Schema Registry and schema in Account B.

  1. To verify the resources were created, on the AWS Glue console, choose Schema registries in the navigation bar, and locate unicorn-ride-request-registry.
  2. Choose the registry unicorn-ride-request-registry and verify that it contains unicorn-ride-request-schema-avro in the Schemas section.
  3. Choose the schema to see its content.

The IAM role created by the SchemaRegistryStack stack allows all Account A IAM principals to assume it and perform read actions on the AWS Glue Schema Registry. Let’s look at the trust relationships of the IAM role.

  1. On the SchemaRegistryStack stack Outputs tab, copy the value for CrossAccountGlueSchemaRegistryRoleName.
  2. On the IAM console, search for this role.
  3. Choose Trust relationships and look at its trusted entities to confirm that Account A is listed.
  4. In the Conditions section, confirm that sts:ExternalId has the same unique random ID provided during stack creation.

Create resources in Account A

We create the following resources in Account A:

  • A VPC
  • EC2 instances for the Kafka producer and consumer
  • An AWS Cloud9 environment
  • An MSK cluster

As a prerequisite, create an EC2 keypair and download it to your machine so that you can SSH into the EC2 instances. Also create an MSK cluster configuration with default values. You need permissions to create the CloudFormation stack, EC2 instances, AWS Cloud9 environment, MSK cluster, MSK cluster configuration, and IAM role.

  1. Log in to Account A.
  2. Launch the following CloudFormation stack to launch the VPC, EC2 instances, and AWS Cloud9 environment.
  3. For Stack name, enter MSKClientStack.
  4. Provide the VPC and subnet CIDR ranges.
  5. For EC2 Keypair, choose an existing EC2 keypair.
  6. For the latest EC2 AMI ID, select the default option.
  7. For the cross-account IAM role ARN, use the value for CrossAccountGlueSchemaRegistryRoleArn (available on the Outputs tab of SchemaRegistryStack).
  8. Wait for the stack to create successfully.
  9. Launch the following CloudFormation stack to create the MSK cluster.
  10. For Stack name, enter MSKClusterStack.
  11. Use Amazon MSK version 2.7.1.
  12. For the MSK cluster configuration ARN, enter the ARN of the MSK cluster configuration that you created as part of the prerequisites.
  13. For the MSK cluster configuration revision number, enter 1 or change it according to your version.
  14. For the client CloudFormation stack name, enter MSKClientStack (the stack name that you created prior to this stack).

Configure the Kafka producer

To configure the Kafka producer accessing the Schema Registry in the central AWS account, complete the following steps:

  1. Log in to Account A.
  2. On the AWS Cloud9 console, choose the Cloud9EC2Bastion environment created by the MSKClientStack stack.
  3. On the File menu, choose Upload Local Files.
  4. Upload the EC2 keypair file that you used earlier while creating the stack.
  5. Open a new terminal and change the EC2 keypair permissions:
    chmod 0400 <keypair PEM file>

  6. SSH into the KafkaProducerInstance EC2 instance and set the Region as per your requirement:
    ssh -i <keypair PEM file> ec2-user@<KafkaProducerInstance Private IP address>
    aws configure set region <region>

  7. Set the environment variable MSK_CLUSTER_ARN pointing to the MSK cluster’s ARN:
    export MSK_CLUSTER_ARN=$(aws kafka list-clusters |  jq '.ClusterInfoList[] | select (.ClusterName == "MSKClusterStack") | {ClusterArn} | join (" ")' | tr -d \")

Change the .ClusterName value in the code if you used a different name for the MSK cluster CloudFormation stack. The cluster name is the same as the stack name.

  1. Set the environment variable BOOTSTRAP_BROKERS pointing to the bootstrap brokers:
    export BOOTSTRAP_BROKERS=$(aws kafka get-bootstrap-brokers --cluster-arn $MSK_CLUSTER_ARN | jq -r .BootstrapBrokerString)

  2. Verify the environment variables:
    echo $MSK_CLUSTER_ARN
    echo $BOOTSTRAP_BROKERS

  3. Create a Kafka topic called unicorn-ride-request-topic in your MSK cluster, which is used by the Kafka producer and consumer applications later:
    cd ~/kafka
    
    ./bin/kafka-topics.sh --bootstrap-server $BOOTSTRAP_BROKERS \
    --topic unicorn-ride-request-topic \
    --create --partitions 3 --replication-factor 2
    
    ./bin/kafka-topics.sh --bootstrap-server $BOOTSTRAP_BROKERS --list

The MSKClientStack stack copied the Kafka producer client JAR file called kafka-cross-account-gsr-producer.jar to the KafkaProducerInstance instance. It contains the Kafka producer client that sends messages to the Kafka topic unicorn-ride-request-topic on the MSK cluster and accesses the unicorn-ride-request-schema-avro Avro schema from the unicorn-ride-request-registry schema registry in Account B. The Kafka producer code, which we cover later in this post, is available on GitHub.

  1. Run the following commands and verify kafka-cross-account-gsr-producer.jar exists:
    cd ~
    ls -ls

  2. Run the following command to run the Kafka producer in the KafkaProducerInstance terminal:
    java -jar kafka-cross-account-gsr-producer.jar -bs $BOOTSTRAP_BROKERS \
    -rn <Account B IAM role arn that Kafka producer application needs to assume> \
    -topic unicorn-ride-request-topic \
    -reg us-east-1 \
    -nm 500 \
    -externalid <Account B IAM role external Id that you used while creating a CF stack in Account B>

The code has the following parameters:

  • -bs – $BOOTSTRAP_BROKERS (the MSK cluster bootstrap brokers)
  • -rn – The CrossAccountGlueSchemaRegistryRoleArn value from the SchemaRegistryStack stack outputs in Account B
  • -topic – The Kafka topic unicorn-ride-request-topic
  • -reg – us-east-1 (change it according to your Region; it’s used for the AWS STS endpoint and Schema Registry)
  • -nm – 500 (the number of messages the producer application sends to the Kafka topic)
  • -externalId – The same external ID (for example, demo10A) that you used while creating the CloudFormation stack in Account B

The following screenshot shows the Kafka producer logs showing Schema Version Id received..., which means it has retrieved the Avro schema unicorn-ride-request-schema-avro from Account B and messages were sent to the Kafka topic on the MSK cluster in Account A.

Kafka producer code

The complete Kafka producer implementation is available on GitHub. In this section, we break down the code.

  • getProducerConfig() initializes the producer properties, as shown in the following code:
    • VALUE_SERIALIZER_CLASS_CONFIG – The GlueSchemaRegistryKafkaSerializer.class.getName() AWS serializer implementation that serializes data records (the implementation is available on GitHub)
    • REGISTRY_NAME – The Schema Registry from Account B
    • SCHEMA_NAME – The schema name from Account B
    • AVRO_RECORD_TYPE – AvroRecordType.SPECIFIC_RECORD
private Properties getProducerConfig() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServers);
        props.put(ProducerConfig.ACKS_CONFIG, "-1");
        props.put(ProducerConfig.CLIENT_ID_CONFIG,"msk-cross-account-gsr-producer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, GlueSchemaRegistryKafkaSerializer.class.getName());
        props.put(AWSSchemaRegistryConstants.DATA_FORMAT, DataFormat.AVRO.name());
        props.put(AWSSchemaRegistryConstants.AWS_REGION,regionName);
        props.put(AWSSchemaRegistryConstants.REGISTRY_NAME, "unicorn-ride-request-registry");
        props.put(AWSSchemaRegistryConstants.SCHEMA_NAME, "unicorn-ride-request-schema-avro");
        props.put(AWSSchemaRegistryConstants.AVRO_RECORD_TYPE, AvroRecordType.SPECIFIC_RECORD.getName());
        return props;
}
  • startProducer() assumes the role in Account B to be able to connect with the Schema Registry in Account B and sends messages to the Kafka topic on the MSK cluster:
public void startProducer() {
        assumeGlueSchemaRegistryRole();
        KafkaProducer<String, UnicornRideRequest> producer = 
		new KafkaProducer<String,UnicornRideRequest>(getProducerConfig());
        int numberOfMessages = Integer.valueOf(str_numOfMessages);
        logger.info("Starting to send records...");
        for(int i = 0;i < numberOfMessages;i ++)
        {
            UnicornRideRequest rideRequest = getRecord(i);
            String key = "key-" + i;
            ProducerRecord<String, UnicornRideRequest> record = 
		new ProducerRecord<String, UnicornRideRequest>(topic, key, rideRequest);
            producer.send(record, new ProducerCallback());
        }
 }
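  • ProducerCallback, passed to producer.send() above, isn’t shown in this excerpt; the actual implementation is in the GitHub repository. A minimal sketch, assuming the Kafka client’s Callback and RecordMetadata classes are imported in the producer class, simply logs the delivery result:
private class ProducerCallback implements Callback {
        @Override
        public void onCompletion(RecordMetadata recordMetadata, Exception e) {
            if (e == null) {
                // The record was acknowledged by the brokers
                logger.info("Record sent to topic " + recordMetadata.topic()
                        + ", partition " + recordMetadata.partition()
                        + ", offset " + recordMetadata.offset());
            } else {
                logger.error("Error while producing record", e);
            }
        }
}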
  • assumeGlueSchemaRegistryRole() as shown in the following code uses AWS STS to assume the cross-account Schema Registry IAM role in Account B. (For more information, see Temporary security credentials in IAM.) The response from stsClient.assumeRole(roleRequest) contains the temporary credentials, which include accessKeyId, secretAccessKey, and a sessionToken. It then sets the temporary credentials in the system properties. The AWS SDK for Java uses these credentials while accessing the Schema Registry (through the Schema Registry serializer). For more information, see Using Credentials.
    public void assumeGlueSchemaRegistryRole() {
            try {
    	   Region region = Region.of(regionName);
                if(!Region.regions().contains(region))
                     throw new RuntimeException("Region : " + regionName + " is invalid.");
                StsClient stsClient = StsClient.builder().region(region).build();
                AssumeRoleRequest roleRequest = AssumeRoleRequest.builder()
                        .roleArn(this.assumeRoleARN)
                        .roleSessionName("kafka-producer-cross-account-glue-schemaregistry-demo")
    	           .externalId(this.externalId)	
                        .build();
                AssumeRoleResponse roleResponse = stsClient.assumeRole(roleRequest);
                Credentials myCreds = roleResponse.credentials();
                System.setProperty("aws.accessKeyId", myCreds.accessKeyId());
                System.setProperty("aws.secretAccessKey", myCreds.secretAccessKey());
                System.setProperty("aws.sessionToken", myCreds.sessionToken());
                stsClient.close();
            } catch (StsException e) {
                logger.error(e.getMessage());
                System.exit(1);
            }
        }

  • getRecord() uses the classes generated from the Avro schema (the unicorn ride request schema) to create a SpecificRecord. For this post, the unicorn ride request attribute values are hard-coded in this method. See the following code:
    public UnicornRideRequest getRecord(int requestId){
                /*
                 Initialise UnicornRideRequest object of
                 class that is generated from AVRO Schema
                 */
               UnicornRideRequest rideRequest = UnicornRideRequest.newBuilder()
                .setRequestId(requestId)
                .setPickupAddress("Melbourne, Victoria, Australia")
                .setDestinationAddress("Sydney, NSW, Aus")
                .setRideFare(1200.50F)
                .setRideDuration(120)
                .setPreferredUnicornColor(UnicornPreferredColor.WHITE)
                .setRecommendedUnicorn(RecommendedUnicorn.newBuilder()
                        .setUnicornId(requestId*2)
                        .setColor(unicorn_color.WHITE)
                        .setStarsRating(5).build())
                .setCustomer(Customer.newBuilder()
                        .setCustomerAccountNo(1001)
                        .setFirstName("Dummy")
                        .setLastName("User")
                        .setEmailAddresses(Arrays.asList("[email protected]"))
                        .setCustomerAddress("Flinders Street Station")
                        .setModeOfPayment(ModeOfPayment.CARD)
                        .setCustomerRating(5).build()).build();
                logger.info(rideRequest.toString());
                return rideRequest;
        }

Configure the Kafka consumer

The MSKClientStack stack created the KafkaConsumerInstance instance for the Kafka consumer application. You can view all the instances created by the stack on the Amazon EC2 console.

To configure the Kafka consumer accessing the Schema Registry in the central AWS account, complete the following steps:

  1. Open a new terminal in the Cloud9EC2Bastion AWS Cloud9 environment.
  2. SSH into the KafkaConsumerInstance EC2 instance and set the Region as per your requirement:
    ssh -i <keypair PEM file> ec2-user@<KafkaConsumerInstance Private IP address>
    aws configure set region <region>

  3. Set the environment variable MSK_CLUSTER_ARN pointing to the MSK cluster’s ARN:
    export MSK_CLUSTER_ARN=$(aws kafka list-clusters |  jq '.ClusterInfoList[] | select (.ClusterName == "MSKClusterStack") | {ClusterArn} | join (" ")' | tr -d \")

Change the .ClusterName value if you used a different name for the MSK cluster CloudFormation stack. The cluster name is the same as the stack name.

  1. Set the environment variable BOOTSTRAP_BROKERS pointing to the bootstrap brokers:
    export BOOTSTRAP_BROKERS=$(aws kafka get-bootstrap-brokers --cluster-arn $MSK_CLUSTER_ARN | jq -r .BootstrapBrokerString)

  2. Verify the environment variables:
    echo $MSK_CLUSTER_ARN
    echo $BOOTSTRAP_BROKERS

The MSKClientStack stack copied the Kafka consumer client JAR file called kafka-cross-account-gsr-consumer.jar to the KafkaConsumerInstance instance. It contains the Kafka consumer client that reads messages from the Kafka topic unicorn-ride-request-topic on the MSK cluster and accesses the unicorn-ride-request-schema-avro Avro schema from the unicorn-ride-request-registry registry in Account B. The Kafka consumer code, which we cover later in this post, is available on GitHub.

  1. Run the following commands and verify kafka-cross-account-gsr-consumer.jar exists:
    cd ~
    ls -ls

  2. Run the following command to run the Kafka consumer in the KafkaConsumerInstance terminal:
    java -jar kafka-cross-account-gsr-consumer.jar -bs $BOOTSTRAP_BROKERS \
    -rn <Account B IAM role arn that Kafka consumer application needs to assume> \
    -topic unicorn-ride-request-topic \
    -reg us-east-1 \
    -externalid <Account B IAM role external Id that you used while creating a CF stack in Account B>

The code has the following parameters:

  • -bs – $BOOTSTRAP_BROKERS (the MSK cluster bootstrap brokers)
  • -rn – The CrossAccountGlueSchemaRegistryRoleArn value from the SchemaRegistryStack stack outputs in Account B
  • -topic – The Kafka topic unicorn-ride-request-topic
  • -reg – us-east-1 (change it according to your Region; it’s used for the AWS STS endpoint and Schema Registry)
  • -externalId – The same external ID (for example, demo10A) that you used while creating the CloudFormation stack in Account B

The following screenshot shows the Kafka consumer logs successfully reading messages from the Kafka topic on the MSK cluster in Account A and accessing the Avro schema unicorn-ride-request-schema-avro from the unicorn-ride-request-registry schema registry in Account B.

If you see similar logs, it means that both the Kafka producer and consumer applications have been able to connect successfully to the centralized Schema Registry in Account B and can validate messages while sending to and consuming from the MSK cluster in Account A.

Kafka consumer code

The complete Kafka consumer implementation is available on GitHub. In this section, we break down the code.

  • getConsumerConfig() initializes consumer properties, as shown in the following code:
    • VALUE_DESERIALIZER_CLASS_CONFIG – The GlueSchemaRegistryKafkaDeserializer.class.getName() AWS deserializer implementation that deserializes the SpecificRecord as per the encoded schema ID from the Schema Registry (the implementation is available on GitHub).
    • AVRO_RECORD_TYPE – AvroRecordType.SPECIFIC_RECORD
private Properties getConsumerConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "unicorn.riderequest.consumer");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG,"earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, GlueSchemaRegistryKafkaDeserializer.class.getName());
        props.put(AWSSchemaRegistryConstants.AWS_REGION, regionName);
        props.put(AWSSchemaRegistryConstants.AVRO_RECORD_TYPE, AvroRecordType.SPECIFIC_RECORD.getName());
        return props;
}
  • startConsumer() assumes the role in Account B to be able to connect with the Schema Registry in Account B and reads messages from the Kafka topic on the MSK cluster:
public void startConsumer() {
  logger.info("starting consumer...");
  assumeGlueSchemaRegistryRole();
  KafkaConsumer<String, UnicornRideRequest> consumer = new KafkaConsumer<String, UnicornRideRequest>(getConsumerConfig());
  consumer.subscribe(Collections.singletonList(topic));
  int count = 0;
  while (true) {
            final ConsumerRecords<String, UnicornRideRequest> records = consumer.poll(Duration.ofMillis(1000));
            for (final ConsumerRecord<String, UnicornRideRequest> record : records) {
                final UnicornRideRequest rideRequest = record.value();
                logger.info(String.valueOf(rideRequest.getRequestId()));
                logger.info(rideRequest.toString());
            }
        }
}
  • assumeGlueSchemaRegistryRole(), as shown in the following code, uses AWS STS to assume the cross-account Schema Registry IAM role in Account B. The response from stsClient.assumeRole(roleRequest) contains the temporary credentials, which include accessKeyId, secretAccessKey, and a sessionToken. It then sets the temporary credentials in the system properties. The SDK for Java uses these credentials while accessing the Schema Registry (through the Schema Registry deserializer). For more information, see Using Credentials.
public void assumeGlueSchemaRegistryRole() {
        try {
	Region region = Region.of(regionName);
            if(!Region.regions().contains(region))
                 throw new RuntimeException("Region : " + regionName + " is invalid.");
            StsClient stsClient = StsClient.builder().region(region).build();
            AssumeRoleRequest roleRequest = AssumeRoleRequest.builder()
                    .roleArn(this.assumeRoleARN)
                    .roleSessionName("kafka-consumer-cross-account-glue-schemaregistry-demo")
                    .externalId(this.externalId)
                    .build();
            AssumeRoleResponse roleResponse = stsClient.assumeRole(roleRequest);
            Credentials myCreds = roleResponse.credentials();
            System.setProperty("aws.accessKeyId", myCreds.accessKeyId());
            System.setProperty("aws.secretAccessKey", myCreds.secretAccessKey());
            System.setProperty("aws.sessionToken", myCreds.sessionToken());
            stsClient.close();
        } catch (StsException e) {
            logger.error(e.getMessage());
            System.exit(1);
        }
    }

Compile and generate Avro schema classes

Like any other part of building and deploying your application, schema compilation and the process of generating Avro schema classes should be included in your CI/CD pipeline. There are multiple ways to generate Avro schema classes; we use avro-maven-plugin for this post. The CI/CD process can also use avro-tools to compile Avro schema to generate classes. The following code is an example of how you can use avro-tools:

java -jar /path/to/avro-tools-1.10.2.jar compile schema <schema file> <destination>

//compiling unicorn_ride_request.avsc
java -jar avro-tools-1.10.2.jar compile schema unicorn_ride_request.avsc .
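
The classes generated by either tool embed the schema they were compiled from, which makes a quick sanity check easy to add to the pipeline. The following standalone snippet is illustrative only (it isn’t part of the sample applications) and simply prints the schema carried by the generated class:

import org.apache.avro.Schema;
import demo.glue.schema.registry.avro.UnicornRideRequest;

public class SchemaCheck {
    public static void main(String[] args) {
        // Every Avro-generated SpecificRecord class exposes its compiled-in schema
        Schema schema = UnicornRideRequest.getClassSchema();
        System.out.println(schema.toString(true)); // pretty-printed JSON
    }
}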

Implementation overview

To recap, we start by defining and registering an Avro schema for the unicorn ride request message in the AWS Glue Schema Registry in Account B, the central data lake account. In Account A, we create an MSK cluster and Kafka producer and consumer EC2 instances, and deploy the respective application code (kafka-cross-account-gsr-producer.jar and kafka-cross-account-gsr-consumer.jar) to them using the CloudFormation stack.

When we run the producer application in Account A, the serializer (GlueSchemaRegistryKafkaSerializer) from the AWS Glue Schema Registry SerDe library provided as the configuration gets the unicorn ride request schema (UnicornRideRequest.avsc) from the central Schema Registry residing in Account B to serialize the unicorn ride request message. It uses the IAM role (temporary credentials) in Account B and Region, schema registry name (unicorn-ride-request-registry), and schema name (unicorn-ride-request-schema-avro) provided as the configuration to connect to the central Schema Registry. After the message is successfully serialized, the producer application sends it to the Kafka topic (unicorn-ride-request-topic) on the MSK cluster.

When we run the consumer application in Account A, the deserializer (GlueSchemaRegistryKafkaDeserializer) from the Schema Registry SerDe library provided as the configuration extracts the encoded schema ID from the message read from the Kafka topic (unicorn-ride-request-topic) and gets the schema for the same ID from the central Schema Registry in Account B. It then deserializes the message. It uses the IAM role (temporary credentials) in Account B and the Region provided as the configuration to connect to the central Schema Registry. The consumer application also configures Avro’s SPECIFIC_RECORD to inform the deserializer that the message is of a specific type (unicorn ride request). After the message is successfully deserialized, the consumer application processes it as per the requirements.

Clean up

The final step is to clean up. To avoid unnecessary charges, you should remove all the resources created by the CloudFormation stacks used for this post. The simplest way to do so is to delete the stacks. First delete the MSKClusterStack followed by MSKClientStack from Account A. Then delete the SchemaRegistryStack from Account B.

Conclusion

In this post, we demonstrated how to use AWS Glue Schema Registry with Amazon MSK and stream processing applications to validate messages using an Avro schema. We created a distributed architecture where the Schema Registry resides in a central AWS account (data lake account) and Kafka producer and consumer applications reside in a separate AWS account. We created an Avro schema in the schema registry in the central account to make it efficient for the application teams to maintain schemas in a single place. Because AWS Glue Schema Registry supports identity-based access policies, we used the cross-account IAM role to allow the Kafka producer and consumer applications running in a separate account to securely access the schema from the central account to validate messages. Because the Avro schema was agreed in advance, we used Avro SpecificRecord to ensure type safety at compile time and avoid runtime schema validation issues at the client side. The code used for this post is available on GitHub for reference.

To learn more about the services and resources in this solution, refer to AWS Glue Schema Registry, the Amazon MSK Developer Guide, the AWS Glue Schema Registry SerDe library, and IAM tutorial: Delegate access across AWS accounts using IAM roles.


About the Author

Vikas Bajaj is a Principal Solutions Architect at Amazon Web Services. Vikas works with digital native customers and advises them on technology architecture, modeling, and solution options to meet strategic business objectives. He makes sure designs and solutions are efficient, sustainable, and fit for purpose for current and future business needs. Apart from architecture and technology discussions, he enjoys watching and playing cricket.

How to automate AWS account creation with SSO user assignment

Post Syndicated from Rafael Koike original https://aws.amazon.com/blogs/security/how-to-automate-aws-account-creation-with-sso-user-assignment/

Background

AWS Control Tower offers a straightforward way to set up and govern an Amazon Web Services (AWS) multi-account environment, following prescriptive best practices. AWS Control Tower orchestrates the capabilities of several other AWS services, including AWS Organizations, AWS Service Catalog, and AWS Single Sign-On (AWS SSO), to build a landing zone very quickly. AWS SSO is a cloud-based service that simplifies how you manage SSO access to AWS accounts and business applications using Security Assertion Markup Language (SAML) 2.0. You can use AWS Control Tower to create and provision new AWS accounts and use AWS SSO to assign user access to those newly-created accounts.

Some customers need to provision tens, if not hundreds, of new AWS accounts at one time and assign access to many users. If you are using AWS Control Tower, doing this requires that you provision an AWS account in AWS Control Tower, and then assign the user access to the AWS account in AWS SSO before moving to the next AWS account. This process adds complexity and time for administrators who manage the AWS environment while delaying users’ access to their AWS accounts.

In this blog post, we’ll show you how to automate creating multiple AWS accounts in AWS Control Tower, and how to automate assigning user access to the AWS accounts in AWS SSO, with the ability to repeat the process easily for subsequent batches of accounts. This solution simplifies the provisioning and assignment processes, while enabling automation for your AWS environment, and allows your builders to start using and experimenting on AWS more quickly.

Services used

This solution uses the following AWS services:

  • AWS Control Tower
  • AWS Organizations
  • AWS Single Sign-On
  • AWS Service Catalog
  • AWS CloudFormation
  • AWS Lambda
  • Amazon S3
  • Amazon DynamoDB
  • Amazon CloudWatch Events
  • Amazon SNS

High level solution overview

Figure 1 shows the architecture and workflow of the batch AWS account creation and SSO assignment processes.

Figure 1: Batch AWS account creation and SSO assignment automation architecture and workflow

Before starting

This solution is configured to be deployed in the North Virginia Region (us-east-1). But you can change the CloudFormation template to run in any Region that supports all the services required in the solution.

AWS Control Tower Account Factory can take up to 25 minutes to create and provision a new account. During this time, you will be unable to use AWS Control Tower to perform actions such as creating an organizational unit (OU) or enabling a guardrail on an OU. As a best practice, run this solution during a period when you don’t anticipate using other AWS Control Tower features.

Collect needed information

Note: You must have already configured AWS Control Tower, AWS Organizations, and AWS SSO to use this solution.

Before deploying the solution, you need to first collect some information for AWS CloudFormation.

The required information you’ll need to gather in these steps is:

  • AWS SSO instance ARN
  • AWS SSO Identity Store ID
  • Admin email address
  • Amazon S3 bucket
  • AWS SSO user group ARN
  • AWS SSO permission set ARN

Prerequisite information: AWS SSO instance ARN

From the web console

You can find this information under Settings in the AWS SSO web console as shown in Figure 2.

Figure 2: AWS SSO instance ARN

From the CLI

You can also get this information by running the following CLI command using AWS Command Line Interface (AWS CLI):

aws sso-admin list-instances

The output is similar to the following:

{
    "Instances": [
        {
        "InstanceArn": "arn:aws:sso:::instance/ssoins-abc1234567",
        "IdentityStoreId": "d-123456abcd"
        }
    ]
}

Make a note of the InstanceArn value from the output; this is the AWS SSO instance ARN that you’ll provide when deploying the solution.

Prerequisite information: AWS SSO Identity Store ID

This is available from either the web console or the CLI.

From the web console

You can find this information in the same screen as the AWS SSO Instance ARN, as shown in Figure 3.

Figure 3: AWS SSO identity store ID

From the CLI

To find this from the AWS CLI command aws sso-admin list-instances, use the IdentityStoreId from the second key-value pair returned.

Prerequisite information: Admin email address

The admin email address that will receive a notification when a new AWS account is created.

Prerequisite information: S3 bucket

The name of the Amazon S3 bucket where the AWS account list CSV files will be uploaded to automate AWS account creation.

This globally unique bucket name will be used to create a new Amazon S3 Bucket, and the automation script will receive events from new objects uploaded to this bucket.

Prerequisite information: AWS SSO user group ARN

Go to AWS SSO > Groups and select the user group that you want to assign to the new AWS accounts with the chosen permission set. Copy the Group ID of the selected user group. This can be a local AWS SSO user group, or a user group synced from a third-party identity provider.

Note: For the AWS SSO user group, there is no AWS CLI equivalent; you need to use the AWS web console to collect this information.

Figure 4: AWS SSO user group ARN

Prerequisite information: AWS SSO permission set

The ARN of the AWS SSO permission set to be assigned to the user group.

From the web console

To view existing permission sets using the AWS SSO web console, go to AWS accounts > Permission sets. From there, you can see a list of permission sets and their respective ARNs.

Figure 5: AWS SSO permission sets list

You can also select the permission set name and from the detailed permission set window, copy the ARN of the chosen permission set. Alternatively, create your own unique permission set to be assigned to the intended user group.

Figure 6: AWS SSO permission set ARN

From the CLI

To get permission set information from the CLI, run the following AWS CLI command:

aws sso-admin list-permission-sets --instance-arn <SSO Instance ARN>

This command will return an output similar to this:

{
    "PermissionSets": [
    "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-1234567890abcdef",
    "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abcdef1234567890"
    ]
}

If you can’t determine the details for your permission set from the output of the CLI shown above, you can get the details of each permission set by running the following AWS CLI command:

aws sso-admin describe-permission-set --instance-arn <SSO Instance ARN> --permission-set-arn <PermissionSet ARN>

The output will be similar to this:

{
    "PermissionSet": {
    "Name": "AWSPowerUserAccess",
    "PermissionSetArn": "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890",
    "Description": "Provides full access to AWS services and resources, but does not allow management of Users and groups",
    "CreatedDate": "2020-08-28T11:20:34.242000-04:00",
    "SessionDuration": "PT1H"
    }
}

The output above lists the name and description of each permission set, which can help you identify which permission set ARN you will use.

Solution initiation

The solution steps are in two parts: the initiation, and the batch account creation and SSO assignment processes.

To initiate the solution

  1. Log in to the management account as the AWS Control Tower administrator, and deploy the provided AWS CloudFormation stack with the required parameters filled out.

    Note: To fill out the required parameters of the solution, refer to steps 1 to 6 of the To launch the AWS CloudFormation stack procedure below.

  2. When the stack is successfully deployed, it performs the following actions to set up the batch process. It creates:
    • The S3 bucket where you will upload the AWS account list CSV file.
    • A DynamoDB table. This table tracks the AWS account creation status.
    • A Lambda function, NewAccountHandler.
    • A Lambda function, CreateManagedAccount. This function is triggered by the entries in the Amazon DynamoDB table and initiates the batch account creation process.
    • An Amazon CloudWatch Events rule to detect the AWS Control Tower CreateManagedAccount lifecycle event.
    • Another Lambda function, CreateAccountAssignment. This function is triggered by AWS Control Tower lifecycle events via Amazon CloudWatch Events to assign the AWS SSO permission set to the specified user group and AWS account.

To create the AWS Account list CSV file

After you deploy the solution stack, you need to create a CSV file based on this sample.csv and upload it to the Amazon S3 bucket created in this solution. This CSV file will be used to automate the new account creation process.

CSV file format

The CSV file must follow the following format:

AccountName,SSOUserEmail,AccountEmail,SSOUserFirstName,SSOUserLastName,OrgUnit,Status,AccountId,ErrorMsg
Test-account-1,[email protected],[email protected],Fname-1,Lname-1,Test-OU-1,,,
Test-account-2,[email protected],[email protected],Fname-2,Lname-2,Test-OU-2,,,
Test-account-3,[email protected],[email protected],Fname-3,Lname-3,Test-OU-1,,,

The first line contains the column names, and each subsequent line describes a new AWS account that you want to create; the SSO user group specified in the stack parameters is automatically assigned to each new account with the chosen permission set.

CSV fields

AccountName: A string between 1 and 50 characters [a-zA-Z0-9_-]
SSOUserEmail: A string of more than seven characters; must be a valid email address for the primary AWS administrator of the new AWS account
AccountEmail: A string of more than seven characters; must be a valid email address not used by any other AWS account
SSOUserFirstName: A string with the first name of the primary AWS administrator of the new AWS account
SSOUserLastName: A string with the last name of the primary AWS administrator of the new AWS account
OrgUnit: A string; must be an existing AWS Organizations organizational unit
Status: A string, for future use
AccountId: A string, for future use
ErrorMsg: A string, for future use

Figure 7 shows the details that are included in our example for the two new AWS accounts that will be created.

Figure 7: Sample AWS account list CSV

  1. The NewAccountHandler function is triggered by an object upload to the Amazon S3 bucket, validates the input file entries, and uploads the validated entries to the Amazon DynamoDB table (an illustrative sketch of this step follows this list).
  2. The CreateManagedAccount function queries the DynamoDB table to get the details of the next account to be created. If there is another account to be created, the batch account creation process moves on to step 4; otherwise it completes.
  3. The CreateManagedAccount function launches the AWS Control Tower Account Factory product in AWS Service Catalog to create and provision a new account.
  4. After Account Factory has completed the account creation workflow, it generates the CreateManagedAccount lifecycle event, and the event log states whether the workflow SUCCEEDED or FAILED.
  5. The CloudWatch Events rule detects the CreateManagedAccount AWS Control Tower lifecycle event, triggers the CreateManagedAccount and CreateAccountAssignment functions, and sends an email notification to the administrator via Amazon SNS.
  6. The CreateManagedAccount function updates the Amazon DynamoDB table with the results of the AWS account creation workflow. If the account was successfully created, it updates the input file entry in the Amazon DynamoDB table with the account ID; otherwise, it updates the entry in the table with the appropriate failure or error reason.
  7. The CreateAccountAssignment function assigns the AWS SSO permission set with the appropriate IAM policies to the user group specified in the parameters when launching the AWS CloudFormation stack.
  8. When the Amazon DynamoDB table is updated, the Amazon DynamoDB stream triggers the CreateManagedAccount function for subsequent AWS accounts; when new AWS account list CSV files are uploaded, these steps are repeated.
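
To make step 1 concrete, the following is only an illustrative sketch of the kind of DynamoDB write the NewAccountHandler function performs after validating a CSV row. The class name, attribute set, and the NEW status value are assumptions for illustration, not the solution’s packaged Lambda code (which is deployed for you by the CloudFormation stack):

import java.util.HashMap;
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

public class NewAccountHandlerSketch {
    private static final DynamoDbClient DDB = DynamoDbClient.create();

    // Writes one validated CSV row into the account-tracking DynamoDB table
    public static void putAccountItem(String tableName, String[] row) {
        Map<String, AttributeValue> item = new HashMap<>();
        item.put("AccountName", AttributeValue.builder().s(row[0]).build());
        item.put("SSOUserEmail", AttributeValue.builder().s(row[1]).build());
        item.put("AccountEmail", AttributeValue.builder().s(row[2]).build());
        item.put("SSOUserFirstName", AttributeValue.builder().s(row[3]).build());
        item.put("SSOUserLastName", AttributeValue.builder().s(row[4]).build());
        item.put("OrgUnit", AttributeValue.builder().s(row[5]).build());
        item.put("Status", AttributeValue.builder().s("NEW").build());

        DDB.putItem(PutItemRequest.builder().tableName(tableName).item(item).build());
    }
}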

Upload the CSV file

Once the AWS account list CSV file has been created, upload it into the Amazon S3 bucket created by the stack.

Deploying the solution

To launch the AWS CloudFormation stack

Now that you have gathered all the required information, you can launch the AWS CloudFormation stack:

  1. Open the AWS CloudFormation launch wizard in the console.
  2. In the Create stack page, choose Next.

    Figure 8: Create stack in CloudFormation

  3. On the Specify stack details page, update the default parameters to use the information you captured in the prerequisites as shown in Figure 9, and choose Next.

    Figure 9: Input parameters into AWS CloudFormation

  4. On the Configure stack option page, choose Next.
  5. On the Review page, check the box “I acknowledge that AWS CloudFormation might create IAM resources.” and choose Create Stack.
  6. Once the AWS CloudFormation stack has completed, go to the Amazon S3 web console and select the Amazon S3 bucket that you defined in the AWS CloudFormation stack.
  7. Upload the AWS account list CSV file with the information to create new AWS accounts. See To create the AWS Account list CSV file above for details on creating the CSV file.

Workflow and solution details

When a new file is uploaded to the Amazon S3 bucket, the following actions occur:

  1. When you upload the AWS account list CSV file to the Amazon S3 bucket, the Amazon S3 service triggers an event for newly uploaded objects that invokes the Lambda function NewAccountHandler.
  2. This Lambda function executes the following steps:
    • Checks whether the Lambda function was invoked by an Amazon S3 event, or the CloudFormation CREATE event.
    • If the event is a new object uploaded from Amazon S3, read the object.
    • Validate the content of the CSV file for the required columns and values.
    • If the data has a valid format, insert a new item with the data into the Amazon DynamoDB table, as shown in Figure 10 below.

      Figure 10: DynamoDB table items with AWS accounts details

    • Amazon DynamoDB is configured to invoke the Lambda function CreateManagedAccount when items are inserted, updated, or deleted.
    • The Lambda function CreateManagedAccount checks the event type. When an item is updated in the table and the corresponding AWS account has not yet been created, the function invokes AWS Control Tower Account Factory through AWS Service Catalog to create a new AWS account with the details stored in the Amazon DynamoDB item.
    • AWS Control Tower Account Factory starts the AWS account creation process. When the account creation process completes, the status of Account Factory will show as Available in Provisioned products, as shown in Figure 11.

      Figure 11: AWS Service Catalog provisioned products for AWS account creation

    • Based on AWS Control Tower lifecycle events, the CreateAccountAssignment Lambda function is invoked when the CreateManagedAccount event is sent to CloudWatch Events. An Amazon SNS topic also sends an email notification to the administrator email address, as shown in Figure 12 below.

      Figure 12: AWS email notification when account creation completes

    • When invoked, the Lambda function CreateAccountAssignment assigns the AWS SSO user group to the new AWS account with the permission set defined in the AWS CloudFormation stack.

      Figure 13: New AWS account showing user groups with permission sets assigned

Figure 13 above shows the new AWS account with the user groups and the assigned permission sets. This completes the automation process. The AWS SSO users that are part of the user group will automatically be allowed to access the new AWS account with the defined permission set.
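
Under the hood, the assignment itself is a single AWS SSO Admin API call. The following sketch uses the AWS SDK for Java to show the shape of that call; it is illustrative only (the solution’s Lambda is deployed by the CloudFormation stack, and the class and variable names here are assumptions):

import software.amazon.awssdk.services.ssoadmin.SsoAdminClient;
import software.amazon.awssdk.services.ssoadmin.model.CreateAccountAssignmentRequest;
import software.amazon.awssdk.services.ssoadmin.model.PrincipalType;
import software.amazon.awssdk.services.ssoadmin.model.TargetType;

public class AccountAssignmentSketch {
    // Assigns an AWS SSO user group to a new account with the given permission set
    public static void assignGroupToAccount(String instanceArn, String permissionSetArn,
                                            String groupId, String accountId) {
        try (SsoAdminClient sso = SsoAdminClient.create()) {
            sso.createAccountAssignment(CreateAccountAssignmentRequest.builder()
                    .instanceArn(instanceArn)
                    .permissionSetArn(permissionSetArn)
                    .principalType(PrincipalType.GROUP)
                    .principalId(groupId)
                    .targetType(TargetType.AWS_ACCOUNT)
                    .targetId(accountId)
                    .build());
        }
    }
}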

Handling common sources of error

This solution connects multiple components to facilitate the new AWS account creation and AWS SSO permission set assignment. The correctness of the parameters in the AWS CloudFormation stack is important to make sure that when AWS Control Tower creates a new AWS account, it is accessible.

To verify that this solution works, make sure that the email address is a valid email address, you have access to that email, and it is not being used for any existing AWS account. After a new account is created, it is not possible to change its root account email address, so if you input an invalid or inaccessible email, you will need to create a new AWS account and remove the invalid account.

You can view common errors by going to AWS Service Catalog web console. Under Provisioned products, you can see all of your AWS Control Tower Account Factory-launched AWS accounts.

Figure 14: AWS Service Catalog provisioned product with error

Selecting Error under the Status column shows you the source of the error. Figure 15 below shows an example:

Figure 15: AWS account creation error explanation

Conclusion

In this post, we’ve shown you how to automate batch creation of AWS accounts in AWS Control Tower and batch assignment of user access to AWS accounts in AWS SSO. When the batch AWS accounts creation and AWS SSO user access assignment processes are complete, the administrator will be notified by emails from AWS SNS. We’ve also explained how to handle some common sources of errors and how to avoid them.

As you automate the batch AWS account creation and user access assignment, you can reduce the time you spend on the undifferentiated heavy lifting work, and onboard your users in your organization much more quickly, so they can start using and experimenting on AWS right away.

To learn more about the best practices of setting up an AWS multi-account environment, check out this documentation for more information.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rafael Koike

Rafael is a Principal Solutions Architect supporting Enterprise customers in the Southeast and is part of the Storage TFC. Rafael has a passion for building, and his expertise in security, storage, networking, and application development has been instrumental in helping customers move to the cloud securely and quickly. When he is not building, he likes to do CrossFit and target shooting.

Eugene Toh

Eugene Toh is a Solutions Architect supporting Enterprise customers in the Georgia and Alabama areas. He is passionate about helping customers transform their businesses and take them to the next level. His areas of expertise are cloud migrations and disaster recovery, and he enjoys giving public talks on the latest cloud technologies. Outside of work, he loves trying great food and traveling all over the world.

Backup Solutions for Dentist Offices

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/backup-solutions-for-dentist-offices/

On top of providing excellent care to patients, dental practices today are tasked with the care of ever more complex IT solutions. Complying with regulations like HIPAA, protecting patient health records, and managing stores of data from X-rays to insurance information are among the demands that dental practices have to meet.

Whether you outsource these tasks to a managed service provider (MSP) or you manage your data infrastructure in house with network attached storage (NAS) or other hardware, understanding backup best practices and the different options available to help you manage your practice’s data is important for your continued success.

Keeping your data safe and accessible doesn’t have to be complicated or expensive. In this post, learn more about records retention for dental offices and how you can implement some simple strategies to keep data safe and protected, including 3-2-1 backups, common NAS devices, and insight from an MSP that specializes in IT services specifically for dental practices.

How Long Should a Dental Office Keep Records?

When thinking about backup and data storage solutions for your dental practice, it helps to first have a good understanding of the records retention requirements for dental offices. The best way to understand how long a dental office should keep records is to check with your state board of dentistry. Regulations on records retention vary by state and by patient type.

Retaining records for at least five to seven years is good practice, but some states will require longer retention periods of up to 10 years. Specific types of patients, including minors, may have different retention periods.

Regardless of your state regulations, records must be kept for five years for patients who receive Medicare or Medicaid. If your state regulations are less than five years, plan to retain records longer for these patients.

Finally, it’s good practice to keep all records for patients with whom you’re involved in any kind of legal dispute until the dispute is settled.

What Is the HIPAA Regulation for Storage of Dental Records?

HIPAA does not govern how long medical or dental records must be retained, but it does govern how long HIPAA-related documentation must be retained. Any HIPAA-related documentation, including things like policies, procedures, authorization forms, etc., must be retained for six years according to guidance in HIPAA policy § 164.316(b)(2)(i) on time limits. Some states may have longer or shorter retention periods. If shorter, HIPAA supersedes state regulations.

How Long Does a Dental Office Need to Keep Insurance EOBs?

Explanations of benefits or EOBs are documents from insurance providers that explain the amounts insurance will pay for services. Retention periods for these documents vary by state as well, so check with your state dental board to see how long you should keep them. Additionally, insurance providers may stipulate how long records must be kept. As a general rule of thumb, the longer retention period supersedes others. The best advice—err on the side of caution and keep records for the longest retention period required by either state or federal law. Fortunately, cloud storage provides you with a simple, affordable way to ensure your retention periods meet or exceed requirements.

3-2-1 Backup Strategy

Understanding how long you need to keep records is the first step in structuring your dental practice’s backup plan. The second is understanding what a good backup strategy looks like. The 3-2-1 backup strategy is a tried and true method for protecting data. It means keeping at least three copies of your data on two different media (i.e. devices) with at least one off-site, generally in the cloud. For a dental practice, we can use a simple X-ray file as an example. That file should live on two different devices on-premises, let’s say a machine reserved for storing X-rays which backs up to a NAS device. That’s two copies. If you then back your NAS device up to cloud storage, that’s your third, off-site copy.

The Benefits of Backing Up Your Dental Practice

Why do you need that many copies, you might ask. There are some tried and true benefits that make a strong case for using a 3-2-1 strategy rather than hoping for the best with fewer copies of your data.

  1. Fast access to files. When you accidentally delete a file, you can restore it quickly from either your on-site or cloud backup. And if you need a file while you’re away from your desk, you can simply log in to your cloud backup and access it immediately.
  2. Quick recoveries from computer crashes. Keeping one copy on-site means you can quickly restore files if one of your machines crashes. You can start up another computer and get immediate access, or you can restore all of the files to a replacement computer.
  3. Reliable recoveries from damage and disaster. Floods, fires, and other disasters do happen. With a copy off-site, your data is one less thing you have to worry about in that unfortunate event. You can access your files remotely if needed and restore them completely when you are able.
  4. Safe recoveries from ransomware attacks. After hearing about so many major ransomware attacks in the news this past year, you might be surprised to know that most attacks are carried out on small to medium-sized businesses. Keeping an off-site copy in the cloud, especially if you take advantage of features like Object Lock, can better prepare you to recover from a ransomware attack.
  5. Compliance with regulatory requirements. As mentioned above, dental practices are subject to retention regulations. Using a cloud backup solution that offers AES encryption helps your practice achieve compliance.

Using NAS for Dental Practices

NAS is essentially a computer connected to a network that provides file-based data storage services to other devices on the network. The primary strength of NAS is how simple it is to set up and deploy.

NAS is frequently the next step up for a small business that is using external hard drives or direct attached storage, which can be especially vulnerable to drive failure. Moving up to NAS offers businesses like dental practices a number of benefits, including:

  • The ability to share files locally and remotely.
  • 24/7 file availability.
  • Data redundancy.
  • Integrations with cloud storage that provides a location for necessary automatic data backups.

If you’re interested in upgrading to NAS, check out our Complete NAS Guide for advice on provisioning the right NAS for your needs and getting the most out of it after you buy it.

➔ Download Our Complete NAS Guide

Hybrid Cloud Strategy for Dental Practices: NAS + Cloud Storage

Most NAS devices come with cloud storage integrations that enable businesses to adopt a hybrid cloud strategy for their data. A hybrid cloud strategy uses a private cloud and public cloud in combination. To expand on that a bit, a hybrid cloud refers to a cloud environment made up of a mixture of typically on-premises, private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them. In this case, your NAS device serves as the on-premises private cloud, as it’s dedicated to only you or your organization, and then you connect it to the public cloud.

Some cloud providers are already integrated with NAS systems. (Backblaze B2 Cloud Storage is integrated with NAS systems from Synology and QNAP, for example.) Check if your preferred NAS system is already integrated with a cloud storage provider to ensure setting up cloud backup, storage, and sync is as easy as possible.

Your NAS should come with a built-in backup manager, like Hyper Backup from Synology or Hybrid Backup Sync from QNAP. Once you download and install the appropriate backup manager app, you can configure it to send backups to your preferred cloud provider. You can also fine-tune the behavior of the backup jobs, including what gets backed up and how often.

Now, you can send backups to the cloud as a third, off-site backup and use your cloud instance to access files anywhere in the world with an internet connection.
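If you want a sense of what that off-site copy looks like under the hood, here is a rough, illustrative sketch of uploading a backup file with Object Lock protection (the ransomware safeguard mentioned earlier) over an S3-compatible API such as the one Backblaze B2 exposes. The endpoint URL, bucket name, credentials, and file paths are placeholders, and the sketch assumes the bucket was created with Object Lock enabled; it supplements, rather than replaces, the vendor backup apps described above.

    # Hypothetical example: upload one backup file with a 30-day immutability window,
    # so ransomware that reaches the bucket cannot delete or overwrite this copy.
    import datetime
    import os

    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.us-west-004.backblazeb2.com",  # placeholder endpoint
        aws_access_key_id=os.environ["B2_KEY_ID"],               # placeholder credentials
        aws_secret_access_key=os.environ["B2_APP_KEY"],
    )

    retain_until = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=30)

    with open("/volume1/backups/patient-db-2022-01-26.tar.gz", "rb") as backup_file:
        s3.put_object(
            Bucket="example-dental-backups",       # placeholder bucket with Object Lock enabled
            Key="2022-01-26/patient-db.tar.gz",
            Body=backup_file,
            ObjectLockMode="COMPLIANCE",
            ObjectLockRetainUntilDate=retain_until,
        )

A scheduled job like this could run from the NAS or any office machine alongside the backup manager's own jobs, giving you a copy that cannot be altered until the retention window expires.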

Using an MSP for Dental Practices

Many dental practices choose to outsource some or all IT services to an MSP. Making the decision of whether or not to hire an MSP will depend on your individual circumstances and comfort level. Either way, coming to the conversation with an understanding of your backup needs and the cloud backup landscape can help.

Nate Smith, Technical Project Manager at DTC, is responsible for backing up 6,000+ endpoints on 500+ servers at more than 450 dental and doctor’s offices in the mid-Atlantic region. He explained that, due to the sheer number of objects most dentists need to restore (e.g., hundreds of thousands of X-rays), the cost of certain cloud providers can be prohibitive. “If you need something and you need it fast, Amazon Glacier will hit you hard,” he said, referring to the service’s warming fees and retrieval costs.

When seeking out an MSP, make sure to ask about the cloud provider they’re using and how they charge for storage and data transfer. And if you’re not using an MSP, compare costs from different cloud providers to make sure you’re getting the most for your investment in backing up your data.

Cloud Storage and Your Dental Practice

Whether you’re managing your data infrastructure in house with NAS or other hardware, or you’re planning to outsource your IT needs to an MSP, cloud storage should be part of your backup strategy. To recap, having a third copy of your data off-site in the cloud gives you a number of benefits, including:

  • Fast access to your files.
  • Quick recoveries from computer crashes.
  • Reliable recoveries from natural disasters and theft.
  • Protection from ransomware.
  • Compliance with regulatory requirements.

Have questions about choosing a cloud storage provider to back up your dental practice? Let us know in the comments. Ready to get started? Click here to get your first 10GB free with Backblaze B2.

The post Backup Solutions for Dentist Offices appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to use tokenization to improve data security and reduce audit scope

Post Syndicated from Tim Winston original https://aws.amazon.com/blogs/security/how-to-use-tokenization-to-improve-data-security-and-reduce-audit-scope/

Tokenization of sensitive data elements is a hot topic, but you may not know what to tokenize, or even how to determine if tokenization is right for your organization’s business needs. Industries subject to financial, data security, regulatory, or privacy compliance standards are increasingly looking for tokenization solutions to minimize distribution of sensitive data, reduce risk of exposure, improve security posture, and alleviate compliance obligations. This post provides guidance to determine your requirements for tokenization, with an emphasis on the compliance lens given our experience as PCI Qualified Security Assessors (PCI QSA).

What is tokenization?

Tokenization is the process of replacing actual sensitive data elements with non-sensitive data elements that have no exploitable value for data security purposes. Security-sensitive applications use tokenization to replace sensitive data, such as personally identifiable information (PII) or protected health information (PHI), with tokens to reduce security risks.
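As a rough illustration of the concept only (not any particular product's implementation), the Python sketch below replaces a sensitive value with a random token and records the mapping in an in-memory "vault." A real solution would use a hardened, access-controlled token store rather than a dictionary.

    # Illustrative-only tokenization: tokens are random and carry no exploitable value.
    import secrets

    _vault = {}  # token -> original value; stands in for a hardened, access-controlled vault

    def tokenize(sensitive_value: str) -> str:
        """Replace a sensitive value with a random token that has no exploitable value."""
        token = "tok_" + secrets.token_urlsafe(16)
        _vault[token] = sensitive_value
        return token

    def detokenize(token: str) -> str:
        """Return the original value for a token (the optional, tightly controlled API)."""
        return _vault[token]

    card_token = tokenize("4111111111111111")
    print(card_token)              # safe to store and pass to downstream systems
    print(detokenize(card_token))  # only callers with vault access can do this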

De-tokenization returns the original data element for a provided token. Applications may require access to the original data, or an element of the original data, for decisions, analysis, or personalized messaging. To minimize the need to de-tokenize data and to reduce security exposure, tokens can retain attributes of the original data to enable processing and analysis using token values instead of the original data. Common characteristics tokens may retain from the original data are:

Format attributes

  • Length, for compatibility with the storage and reports of applications written for the original data.
  • Character set, for compatibility with the display and data validation of existing applications.
  • Preserved character positions, such as the first six and last four digits of a credit card PAN.

Analytics attributes

  • Mapping consistency, where the same data always results in the same token.
  • Sort order.

Retaining functional attributes in tokens must be implemented in ways that do not defeat the security of the tokenization process. Attribute-preservation functions can reduce the security of a specific tokenization implementation, so limiting the scope of, and access to, tokens helps offset the weaknesses that attribute retention introduces.
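To make the "preserved character positions" idea concrete, here is a hedged sketch that keeps the first six and last four digits of a card PAN and randomizes the middle. It illustrates format retention only; it is not a vetted format-preserving tokenization scheme, and the caveat above about attribute preservation weakening security applies directly to it.

    # Illustration of format retention: same length and character set as the original PAN,
    # first six and last four digits preserved, middle digits randomized.
    import secrets

    def format_retaining_token(pan: str) -> str:
        """Return a token shaped like the PAN, preserving the first 6 and last 4 digits."""
        digits = [c for c in pan if c.isdigit()]
        if len(digits) < 11:
            raise ValueError("PAN too short to preserve first 6 and last 4 digits")
        middle = "".join(str(secrets.randbelow(10)) for _ in range(len(digits) - 10))
        return "".join(digits[:6]) + middle + "".join(digits[-4:])

    print(format_retaining_token("4111111111111111"))  # e.g. 4111118305271111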

Why tokenize? Common use cases

I need to reduce my compliance scope

Tokens are generally not subject to compliance requirements if there is sufficient separation of the tokenization implementation and the applications using the tokens. Encrypting sensitive data, by contrast, may not reduce compliance obligations or scope: industry regulatory standards such as PCI DSS 3.2.1 still consider systems that store, process, or transmit encrypted cardholder data to be in scope for assessment, whereas tokenization may remove those systems from assessment scope. A common use case for PCI DSS compliance is replacing the PAN with tokens in data sent to a service provider, which keeps the service provider from being subject to PCI DSS.

I need to restrict sensitive data to only those with a “need-to-know”

Tokenization can be used to add a layer of explicit access controls to de-tokenization of individual data items, which can be used to implement and demonstrate least-privileged access to sensitive data. For instances where data may be co-mingled in a common repository such as a data lake, tokenization can help ensure that only those with the appropriate access can perform the de-tokenization process and reveal sensitive data.
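One way to picture that explicit access control layer, purely as a sketch with made-up roles, entitlements, and vault contents, is a de-tokenization function that verifies the caller's entitlement and logs every access before revealing anything:

    # Sketch of least-privileged de-tokenization: callers need an explicit "detokenize"
    # entitlement, and every reveal (or denial) is logged for audit purposes.
    import logging

    logging.basicConfig(level=logging.INFO)
    audit_log = logging.getLogger("detokenize-audit")

    _vault = {"tok_abc123": "4111111111111111"}         # placeholder token vault
    _entitlements = {"fraud-analyst": {"detokenize"}}   # placeholder role-to-permission grants

    def detokenize(token: str, caller_role: str) -> str:
        """Reveal the original value only to explicitly entitled callers, and audit every access."""
        if "detokenize" not in _entitlements.get(caller_role, set()):
            audit_log.warning("denied de-tokenization attempt by role %s", caller_role)
            raise PermissionError(f"role {caller_role} is not entitled to de-tokenize data")
        audit_log.info("de-tokenization by role %s for token %s", caller_role, token)
        return _vault[token]

    print(detokenize("tok_abc123", "fraud-analyst"))   # allowed and audited
    # detokenize("tok_abc123", "marketing")            # would raise PermissionError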

I need to avoid sharing sensitive data with my service providers

Replacing sensitive data with tokens before providing it to service providers who have no ability to de-tokenize the data can eliminate the risk of having sensitive data within the service providers' control, and avoid having compliance requirements apply to their environments. This is common in the payments industry, where a tokenization service provider tokenizes the cardholder data for its merchant customers and returns a token they can use to complete card purchase transactions.

I need to simplify data lake security and compliance

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale, to be used later for analysis that has not yet been determined. Having multiple sources, with data stored in a mix of structured and unstructured formats, makes it harder to demonstrate data protection controls for regulatory compliance. Ideally, sensitive data should not be ingested at all; however, that is not always feasible. Where ingestion of such data is necessary, tokenization at each data source can keep compliance-subject data out of the data lake and help avoid compliance implications. Using tokens that retain data attributes, such as data-to-token consistency (idempotence), can support many of the analytical capabilities that make it useful to store data in the data lake.
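To illustrate the data-to-token consistency property, here is a hedged sketch using a keyed HMAC so the same input always yields the same token, which lets joins and distinct counts in the data lake operate on tokens instead of raw values. Treat it as a conceptual example: the key shown is a placeholder, and a production scheme and its key management need careful design.

    # Sketch of deterministic ("idempotent") tokenization: the same input always maps
    # to the same token, so grouping and joining on tokens still works.
    import hashlib
    import hmac

    TOKENIZATION_KEY = b"replace-with-a-properly-managed-secret"  # placeholder key

    def deterministic_token(value: str) -> str:
        """Map the same input to the same token so analytics never need the raw value."""
        digest = hmac.new(TOKENIZATION_KEY, value.encode(), hashlib.sha256)
        return "tok_" + digest.hexdigest()[:24]

    # The same identifier tokenizes identically at every ingestion point.
    print(deterministic_token("123-45-6789"))
    print(deterministic_token("123-45-6789"))  # identical token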

I want to allow sensitive data to be used for other purposes, such as analytics

Your organization may want to perform analytics on the sensitive data for other business purposes, such as marketing metrics and reporting. By tokenizing the data, you can minimize the locations where sensitive data is allowed, and provide tokens to users and applications that need to conduct data analysis. This allows numerous applications and processes to access the token data while maintaining the security of the original sensitive data.

I want to use tokenization for threat mitigation

Using tokenization can help you mitigate threats identified in your workload threat model, depending on where and how tokenization is implemented. At the point where the sensitive data is tokenized, the sensitive data element is replaced with a non-sensitive equivalent throughout the data lifecycle, and across the data flow. Some important questions to ask are:

  • What are the in-scope compliance, regulatory, privacy, or security requirements for the data that will be tokenized?
  • When does the sensitive data need to be tokenized in order to meet security and scope reduction objectives?
  • What attack vector is being addressed for the sensitive data by tokenizing it?
  • Where is the tokenized data being hosted? Is it in a trusted environment or an untrusted environment?

For additional information on threat modeling, see the AWS security blog post How to approach threat modeling.

Tokenization or encryption consideration

Tokens can provide the ability to retain processing value of the data while still managing the data exposure risk and compliance scope. Encryption is the foundational mechanism for providing data confidentiality.

Encryption rarely results in ciphertext with a format similar to the original data, which may prevent data analysis or require consuming applications to adapt.
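As a quick, hedged illustration of that format mismatch (using the third-party cryptography package, not a recommendation of any particular scheme), encrypting a 16-digit PAN produces ciphertext that no longer fits a 16-character numeric column, whereas a format-retaining token would:

    # Illustration: symmetric encryption changes both the length and character set
    # of the original value, so existing schemas and validations may no longer fit.
    from cryptography.fernet import Fernet  # third-party package: pip install cryptography

    key = Fernet.generate_key()   # throwaway key, for illustration only
    cipher = Fernet(key)

    pan = b"4111111111111111"
    ciphertext = cipher.encrypt(pan)

    print(len(pan), pan.decode())             # 16 numeric characters
    print(len(ciphertext), ciphertext[:24])   # roughly 120 bytes of URL-safe base64, not digits

Consuming applications would have to adapt to the longer, non-numeric ciphertext, which is the adaptation cost the paragraph above refers to.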

Your decision to use tokenization instead of encryption should be based on the following:

  • Reduction of compliance scope: As discussed above, by properly using tokenization to obfuscate sensitive data, you may be able to reduce the scope of certain framework assessments such as PCI DSS 3.2.1.
  • Format attributes: Used for compatibility with existing software and processes.
  • Analytics attributes: Used to support planned data analysis and reporting.
  • Elimination of encryption key management: A tokenization solution has one essential API (create token) and one optional API (retrieve value from token), so managing access controls can be simpler than managing some general-purpose cryptographic key-use policies. In addition, the compromise of an encryption key compromises all data encrypted by that key, both past and future, while the compromise of the token database compromises only existing tokens.

Where encryption may make more sense

Although scope reduction, data analytics, threat mitigation, and data masking for the protection of sensitive data make very powerful arguments for tokenization, we acknowledge there may be instances where encryption is the more appropriate solution. Ask yourself these questions to gain better clarity on which solution is right for your company’s use case.

  • Scalability: If you require a solution that scales to large data volumes, and you can leverage encryption solutions with minimal key management overhead, such as AWS Key Management Service (AWS KMS), then encryption may be right for you.
  • Data format: If you need to secure unstructured data, then encryption may be the better option, given its flexibility across layers and formats.
  • Data sharing with third parties: If you need to share sensitive data with a third party in its original format and value, then encryption may be the appropriate solution, since it avoids opening external access to your token vault for de-tokenization.

What type of tokenization solution is right for your business?

When trying to decide which tokenization solution to use, your organization should first define your business requirements and use cases.

  1. What are your own specific use cases for tokenized data, and what is your business goal? Identifying which use cases apply to your business and what the end state should be is important when determining the correct solution for your needs.
  2. What type of data does your organization want to tokenize? Understanding what data elements you want to tokenize, and what that tokenized data will be used for may impact your decision about which type of solution to use.
  3. Do the tokens need to be deterministic, the same data always producing the same token? Knowing how the data will be ingested or used by other applications and processes may rule out certain tokenization solutions.
  4. Will tokens be used internally only, or will the tokens be shared across other business units and applications? Identifying a need for shared tokens may increase the risk of token exposure and, therefore, impact your decisions about which tokenization solution to use.
  5. How long does a token need to be valid? You will need to identify a solution that can meet your use cases, internal security policies, and regulatory framework requirements.

Choosing between self-managed tokenization or tokenization as a service

Do you want to manage the tokenization within your organization, or use Tokenization as a Service (TaaS) offered by a third-party service provider? Some advantages of managing the tokenization solution with your own employees and resources are the ability to direct and prioritize the work needed to implement and maintain the solution, to customize the solution to the application's exact needs, and to build subject matter expertise that removes a dependency on a third party. The primary advantages of a TaaS solution are that it is already complete and that the security of both the tokenization and the access controls is well tested. Additionally, TaaS inherently demonstrates separation of duties, because privileged access to the tokenization environment is owned by the tokenization provider.

Choosing a reversible tokenization solution

Do you have a business need to retrieve the original data from the token value? Reversible tokens can be valuable to avoid sharing sensitive data with internal or third-party service providers in payments and other financial services. Because the service providers are passed only tokens, they can avoid accepting additional security risk and compliance scope. If your company implements or allows de-tokenization, you will need to be able to demonstrate strict controls on the management and use of de-tokenization privilege. Eliminating the implementation of de-tokenization is the clearest way to demonstrate that downstream applications cannot have sensitive data. Given the security and compliance risks of converting tokenized data back into its original data format, this process should be highly monitored, and you should have appropriate alerting in place to detect each time this activity is performed.

Operational considerations when deciding on a tokenization solution

While operational considerations are outside the scope of this post, they are important factors for choosing a solution. Throughput, latency, deployment architecture, resiliency, batch capability, and multi-regional support can impact the tokenization solution of choice. Integration mechanisms with identity and access control and logging architectures, for example, are important for compliance controls and evidence creation.

No matter which deployment model you choose, the tokenization solution needs to meet security standards, similar to encryption standards, and must prevent determining what the original data is from the token values.

Conclusion

Using tokenization solutions to replace sensitive data offers many security and compliance benefits. These benefits include lowered security risk and smaller audit scope, resulting in lower compliance costs and a reduction in regulatory data handling requirements.

Your company may want to use sensitive data in new and innovative ways, such as developing personalized offerings that use predictive analysis and consumer usage trends and patterns, fraud monitoring and minimizing financial risk based on suspicious activity analysis, or developing business intelligence to improve strategic planning and business performance. If you implement a tokenization solution, your organization can alleviate some of the regulatory burden of protecting sensitive data while implementing solutions that use obfuscated data for analytics.

On the other hand, tokenization may also add complexity to your systems and applications, as well as adding additional costs to maintain those systems and applications. If you use a third-party tokenization solution, there is a possibility of being locked into that service provider due to the specific token schema they may use, and switching between providers may be costly. It can also be challenging to integrate tokenization into all applications that use the subject data.

In this post, we have described some considerations to help you determine whether tokenization is right for you, what to consider when deciding which type of tokenization solution to use, and the benefits, disadvantages, and trade-offs of tokenization compared with encryption. When choosing a tokenization solution, it's important to identify and understand all of your organizational requirements. This post is intended to generate the questions your organization should answer to make the right decisions concerning tokenization.

You have many options available to tokenize your AWS workloads. After your organization has determined the type of tokenization solution to implement based on your own business requirements, explore the tokenization solution options available in AWS Marketplace. You can also build your own solution using AWS guides and blog posts. For further reading, see this blog post: Building a serverless tokenization solution to mask sensitive data.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security Assurance Services forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Tim Winston

Tim is a Senior Assurance Consultant with AWS Security Assurance Services. He leverages more than 20 years’ experience as a security consultant and assessor to provide AWS customers with guidance on payment security and compliance. He is a co-author of the “Payment Card Industry Data Security Standard (PCI DSS) 3.2.1 on AWS”.

Author

Kristine Harper

Kristine is a Senior Assurance Consultant and PCI DSS Qualified Security Assessor (QSA) with AWS Security Assurance Services. Her professional background includes security and compliance consulting with large fintech enterprises and government entities. In her free time, Kristine enjoys traveling, outdoor activities, spending time with family, and spoiling her pets.

Author

Michael Guzman

Michael is an Assurance Consultant with AWS Security Assurance Services. Michael is a PCI QSA and HITRUST CCSFP, and holds several AWS certifications. His background is in financial services IT operations and administration, with over 20 years of experience in that industry. In his spare time, Michael enjoys spending time with his family, improving his golf skills, and perfecting his tri-tip recipe.
