Goodbye, Pearl

Post Syndicated from Eevee original

Pearl laying on carpet, bathed in a sunbeam that highlights her peach fuzz

A Chronicling of the Lyfe and Times of one Miss Pearl Twig Woods, who has Passed at a Young Age from Troubles of the Heart. She is survived by Anise, her Arch Nemesis; Cheeseball, her Adoptive Ruffian; and Napoleon, her Star-Crossed Suitor for Whom she Longed from Afar.

Pearl is… difficult to describe. She had such a strong, vibrant personality.

She was lovely, that’s for certain. She loved everyone she met. And while various people — friends, vets, etc. — have met our cats and always liked them, I don’t believe anyone has met Pearl and not adored her. Anise will check out your stuff and perhaps jump on you; Cheeseball will do antics for you and rub on your leg; but Pearl would accept you into her life and be very directly, personally affectionate with you specifically. She made you feel special.

At the same time, she was very fussy, very particular, and had a very strong sense of… her place in the world, I suppose. If she liked something, she would be having it. If she didn’t like something, she would make that exceptionally clear. She was never mean, but she would be very vocal about her boundaries.

It wasn’t uncommon to wake up to Pearl repeatedly headbutting me right in the face, pressing her head up under my chin, or giving me a nuzzle with the entire length of her body, purring all the while. If she was happy to see you, she made an entire production out of it. It wasn’t just us; guests who slept on the couch also got the Pearl wake-up call.

It was also not uncommon to wake up because Pearl had decided that she needed my pillow, and somehow this very small cat took up the entire thing. I couldn’t move her; trying to displace her from a comfortable spot would generally earn you a sad, offended meow, after which you felt guilty for even having entertained the notion in the first place.

One of her particular quirks was to often “bury” her food when she was done with it, or at least paw fruitlessly at nearby carpet. On its own, this is endearing but not unusual — burying leftovers is a common cat instinct, even if we’ve not seen it in our other cats. What made it a uniquely Pearl trait was that she would also perform this ritual if offered something she didn’t want at all. I laughed every time; it was such an audacious way to indicate utter disinterest. Take it away, please. Put it in a hole, if you would.

She got, more or less, everything she wanted. If she claimed a spot, everything about her expression and body language indicated it was clearly hers, even if that spot was your body. (Naturally, if you moved too much or even sneezed suddenly, she would tell you off for that too.) If she wanted to ride on your shoulders, that’s what would be happening. If it was time to feed her and she was too comfortable in her cat tree, well, we’d just have to hold the food up for her. She had a way of looking very pleased with herself that was impossible to argue with.

I first met Pearl in 2014, shortly after we moved to Las Vegas. She was tiny, even for a kitten, and apparently the runt of her litter. I don’t remember what specifically compelled Ash to adopt another cat, except that they love cats, but what a selection.

I cannot stress enough how small she was. You know those solid wood desks that have a column of drawers built into them on one side? You know how they often have a little decorative shape carved out at the bottom with molded edges? Pearl could crawl into that space. I couldn’t believe it the first time I saw it; the gap is so short that I’d never even thought to categorize it as a space, let alone one a cat might enter, but she slunk into it like it was nothing. I was so worried we’d have to move the desk somehow to get her out, but she usually turned around and came right out again. I still remember the very last time she did it — I could tell she was having to shimmy a bit to fit in there, and she must have noticed too, because I never saw her even try it again.

The other cats had somewhat mixed feelings. Napoleon didn’t like her at all and hissed directly in her face, but… after that, I don’t remember any bad reactions from him at all, so I guess he warmed up quick. Anise did not seem to understand what a kitten was, tried to play with her, and then acted very confused when that didn’t seem to work. And Twigs…

Oh, Twigs. Twigs was jealous. He had always been Ash’s cat, he had made himself Ash’s cat, and he very quickly inferred that Pearl was a threat to his position. Another cat! In Ash’s lap! Unthinkable!

On one particular night Ash had barred Twigs from the bedroom to sleep with just Pearl, but came downstairs to visit the kitchen. Twigs ran up to them, looked them dead in the eye, and let out a huge sad wail to convey his feelings about the depths of this betrayal.

They let him into the bedroom after that, but he opted to sit across the room and stare daggers at Pearl, moving a little closer every half hour until he was on the far corner of the bed. Just staring.

Ash eventually had to bribe him by putting some cottage cheese on Pearl’s head, after which he decided Pearl was okay. Also he found out that he could fit himself in Ash’s lap alongside Pearl, so that probably helped.

Oh, and she loved to be cozy. She loved to be cozy. Sphynxes are naturally drawn to warmth, of course, but Pearl elevated it to an artform. If I’m propped up in bed, Anise might stand next to me to look at the covers expectantly, or he might just lay down nearby. Pearl would stand right on top of me and pull at the covers with impressive force until I lifted them for her, let her lay on my chest, and tucked her in.

We’d often find Pearl very awkwardly tucked under the edge of a blanket somewhere, having attempted to insert herself beneath it with mixed success. We described this as Pearl doing it all by herself, and complimented her on how talented she was, and then fixed the blanket for her.

We have heater vents in the floor, and one of her favorite pastimes was to sit on one of those, often covering the entire thing, and be gently toasted from below. Sometimes Anise would see what a great idea that was and try to share it and they would end up squabbling.

If there was a sunbeam to be found, Pearl would find it. Much like with vents, she didn’t like to share sunbeams, even if they were half the width of the room. She found it first, you see.

Other places she discovered that were lovely and toasty included: in front of the fridge where the warm air vented out from the bottom; straddling the PS4 so the fan blew onto her tummy; next to or underneath my laptop; on top of my computer case which has a fan vent on top; in front of the heat dish we got while our furnace wasn’t working; and in a laundry hamper full of freshly-dried laundry.

She liked to go outside, too, during the summer. All of our cats are indoor-only, but once in a while we’ll take the more well-behaved ones (not Cheeseball) into the backyard to wander around on the porch and sniff things and enjoy the sun and look at a bird.

And I have never known a cat to be quite so comfortable. Perhaps Anise, on occasion, but he doesn’t have the raw talent Pearl was born with.

You could tell she was settling in if she tucked her paws in against her chest, something she always did quite deliberately and distinctly. But that was only the first stage of comfort. If you were lucky, she would stretch out one arm really far, perhaps to place her paw on you. As she dozed off she might lay flat on her side with her limbs outstretched, which meant we always had to check blankets carefully for a flattened Pearl before sitting down. And if you were really lucky, you might witness Pearl in a chaos configuration, upside-down with her paws wherever.

But even just sitting up with her eyes closed, she looked so content. Looking at photos makes me want to take a nap with her.

Sadly, Pearl had some health troubles from the start. She had a kink right at the base of her tail from the day we got her, suggesting it had been injured while at the breeder and not healed right, so she was never able to raise her tail all the way. She also came home with some sort of intestinal parasite that gave her a lot of… um, gastric distress, and while we were able to clear that up quickly, it seemed to recur soon afterwards.

We took her to the vet again, suspecting more parasites, but multiple tests turned up nothing. We tried a number of things — different food, sensitive-stomach food, wet food, more water, different treats — but could not seem to figure it out, and so Pearl just had stomachaches on and off for a while. Sometimes she would sit by the litterbox and grumble, and all I could do was try to reassure her.

It wasn’t until a few years later that Ash’s then-husband, with no explanation whatsoever, spontaneously decided to just feed her some plain chicken mixed with pumpkin purée. Just like that, she was fine. I felt like kind of an idiot for not trying that earlier, but after giving her veterinary sensitive-stomach food and seeing no change, I thought we’d ruled out food sensitivity.

We swiftly outlined a general idea of what Pearl could or could not tolerate. Chicken, pork, pumpkin: OK. Beef or any kind of organs: she immediately threw up. Fish: no good. And yet manufactured food containing only very simple things still gave her stomachaches, so our best guess was that she also couldn’t tolerate fucking xanthan gum or something, which is in pretty much all pet food, including the sensitive stuff.

Regardless, we had a diet she could stomach, so for the rest of her life we made her a custom diet of ground chicken, ground pork belly, pumpkin, and some nutrient powder that didn’t bother her (which took several attempts to find). That meant no more free-feeding the other cats, so we got a big dog cage to keep the kibble in, and we’d let the other cats in there while Pearl was eating her special princess food. Thus began a multi-year saga during which, every four hours, like clockwork, Anise would start bothering me to feed him.

Please do not tell me what I could have done to dissuade Anise or space out the schedule. I guarantee, he is vastly more dastardly and annoying than you are giving him credit for. The cats run this household, and I have long since made peace with that.

The closest to any real insight we got about Pearl was that perhaps her kitten parasites had left her with IBS — a very vague diagnosis of exclusion, and the best anyone could come up with. But Pearl was happy, so that was good enough. We eventually found new treats she could stomach, too.

Pearl had relatively intense relationships with the other cats, much like she did with people.

She adored Napoleon, our furred and largest cat, for some reason. She often trotted up to him, very eager to sniff him; or when he trotted towards the kibble cage in recent years, she would run alongside him, staring sideways at him. I don’t really understand what her feelings were, and Napoleon didn’t really return them, but he at least tolerated them. Curiously, I can’t remember many attempts on Pearl’s part to snuggle up to Napoleon; she mostly snuggled with the other sphynxes.

She and Twigs (her uncle, incidentally) spent a ton of time together, and Anise was often in the mix as well. They’d often end up in a pile under or within a blanket, or all wedged into the same cat bed, or piled on a chair that had a towel on it. Sometimes she’d grumble at Anise for being too much in her personal space, but somehow Twigs’s presence seemed to defuse everything. I can’t remember her ever grumbling at Twigs, in fact.

Cheeseball is the only cat we have who’s younger than Pearl. When he was a kitten, she kind of doted on him like a mom, frequently trying to groom his head. She kept doing this into his adolescence, even as he was swiftly growing bigger than her, which was endearing and also very funny.

We moved in 2018, and spent the summer with a former acquaintance’s parents, as they had a finished and furnished basement that was practically an apartment all on its own. Unfortunately, they had four cats of their own, for a total of nine crammed into a relatively small space. (One of the parents couldn’t be around cat hair in the medium term, due to reasons.)

One of the cats, Seamus, was a maine coon, and by all accounts kind of an asshole. He made a habit out of chasing Napoleon around, which Napoleon did not like at all, and which would result in Pearl chasing him to defend Napoleon, and then Anise chasing after Pearl because everyone is running around and he doesn’t quite understand why but he doesn’t want to be left out. We kept the cats separated as best we could, but we didn’t have much space to work with, and we were already trying to sequester Cheeseball, who we’d just adopted as a kitten. Everything was just kind of a mess.

Anyway this kinda stressed everyone out.

I bring it up because of one particular event. The only segregated parts of the basement were the bathroom and a somewhat awkwardly-shaped bedroom. The bedroom was exclusively for our cats. I don’t remember exactly what led up to this, but at some point Seamus made a beeline for the bedroom while Pearl was just inside the open door. I’m guessing Napoleon was in there too.

Pearl was absolutely not having this. She stood her ground and hissed hard enough to stop this absolutely massive cat in his tracks. She was so mad that she peed on the floor (which was, thankfully, vinyl). We got there to intervene about half a second later, but wow! She drew a line in the sand and under no circumstances was this bully going to cross it. We have always looked back fondly upon this “rage piss” incident.

I think Pearl was left a little rattled, though. Even at the time, she growled at the other maine coon there, who was an absolute sweetheart and rarely did more than sit nicely and ask to be pet. Once we were out of there, she seemed a little distrusting of Anise, often growling at him or biting his haunch merely for sitting nearby (which would entice a bewildered Anise into smacking her, justifying her reaction). I wish we hadn’t stayed there.

Cheeseball was also growing up and wanted to play with Pearl, because playing is how he engages with pretty much everything; alas, he was a bit too rowdy for Pearl. Twigs, infinitely patient, was there to absorb a lot of this.

But then Twigs died, and the cats’ relationships seemed to deteriorate. Cheeseball liked Pearl, but he always wanted to fight with her, which she didn’t like. Anise liked Pearl, but she seemed to resent him a lot of the time, and there was no Twigs to separate them. Pearl liked Napoleon, but Napoleon liked to be by himself.

It was okay, but tense.

Maybe I’m overstating this. Going back through photos of Pearl, I’ve found plenty from the post-Twigs era where she’s still hanging out with Anise peacefully. A number of their conflicts even started because she would approach Anise to sit by him, then growl at him. No wonder he was confused. Sometimes she would groom him and start growling, while licking his ear. Hello? What are you doing?? What do you want from him here.

Still, that must mean she still liked him. She just had some complicated feelings. It always made me a little sad when they couldn’t get along, though. I’d gotten Anise in the first place in part to give Twigs a friend, and Pearl and Twigs had always gotten along well, and now… well.

Having said all this about how great and lovely Pearl was, her presumptuousness also made her a huge pest in some very specific ways. For example, once we’d settled into the food routine that saved her from constant stomachaches, one of her favorite things to do was to go over to the kibble cage and try to find kibble that had escaped from it. If she could get away with it, she would stick her paw between the bars and pull kibble (or the entire bowl) out to eat.

It was slightly annoying, and also very funny. We called this pulling a heist. And then she’d have awful gas some hours later.

I also very distinctly remember getting takeout one time, which happened to include a breaded and fried slab of fish. I had the little takeout container on the table in front of me, and I think I was fiddling with the wrapper on their plastic fork or something, when Pearl came along, sniffed it… and then bit the fish and pulled the whole filet out of the container. Right in front of me! Points for boldness, I guess. She wasn’t quite so audacious any other time, but she must’ve really liked the smell of that fish.

And while she was generally pretty picky about what she would consider a toy, she did, on occasion, like to bite the arms of my glasses. Once I was lying next to her and petting her while she purred, and she stuck a paw in between my glasses and my face, pulled them off, and tried to bite them — purring all the while.

My favorite Pearl trick was what we dubbed “mouse alert”. If Pearl was looking for someone — often anyone at all, but sometimes a particular person who was absent or in a room with a closed door — she would find one of her toy mice and carry it around doing a very loud, muffled meow. If she saw you she would then drop the mouse and trot over, making happy high-pitched meows instead.

Sometimes she’d start out with regular meows, which we could hear from the other side of the house, but then they’d abruptly turn deeper and longer, and we knew she’d picked a mouse up. It was so charming and so funny. Every so often we’d find a pile of mice outside a door and we knew that Pearl had been trying to open it. She later expanded her roster to include Big Mouse — a plush almost half her size who became her favorite — and a plush of a single HIV virus that she must’ve stolen from my desk.

She didn’t play with the mice, either. I have video of her playing with a mouse when she was fairly young, but it’s not one of the mice we have now. She seemed to regard them as precious, her comforting belongings that she could almost always lure us out of hiding with. “Come look at my mouse!” Sometimes she’d carry them around quietly, just to have one or two nearby in a comfortable spot.

I tried for her whole life to get a recording of this, which proved nearly impossible, because she’d stop if she knew anyone was nearby! I got a clear recording only once, a week before she died; I was in our dark bedroom, filming into Ash’s office, and I don’t think she realized I was there. There’s a link at the bottom.

Her other favorite possession was string. Pearl loved to play string. She would ask for it by name. No, really. If she wanted to play string, she would find (or bring) a string and sit on it hoping someone noticed, and if that didn’t work, I’m pretty sure she had a specific meow for asking you to please follow her to string and then play with it.

Playing string with her was a slightly frustrating affair, but perhaps I just didn’t understand the rules. They seemed to be: I should wiggle the string; then Pearl grabs the string; then Pearl keeps the string. That doesn’t end the game, though. I should keep trying, in vain, to get the string back, while Pearl simply keeps winning.

A great thing to do was dangle it above her, at which point she’d stand up to try to get it and chomp at it, audibly. I loved her little chomp sound. I can’t even do it myself; I feel like I’d hurt my teeth.

After she was through adolescence, string was the only thing she really wanted to play with. She might’ve chased a laser pointer a couple of times, but string was the one thing she would ask for. Occasionally I’d try to play with Anise with a string, but Pearl had a fucking sixth sense for when string was happening, and she would appear from nowhere and go absolutely nuts over it while Anise sat back and watched.

In March 2021, I took Pearl to an ER vet over very rapid breathing. They told me she’d had fluid in her lungs and diagnosed her with congestive heart failure. That’s when your heart can’t pump hard enough; part of Pearl’s heart wall had thinned and weakened, and one chamber was enlarged. She had to be hospitalized overnight. I drove home thinking I’d never see her again.

They couldn’t identify a cause. She was given a prognosis of “not fantastic” and prescribed a growing mountain of medication, which Ash dutifully gave to her every twelve hours for months on end, even when Pearl refused it. Sometimes Pearl had to be bribed with treats in order to eat at all, though I later traced that to a batch of food with insufficient pumpkin for her liking.

We had to keep her stress level low, which meant keeping her completely separated from the other cats (or at the very least Cheeseball) as much as possible. That meant Ash vanished into a closed room for most of every day to work while keeping an eye on Pearl — who was, after all, Ash’s cat. That also left me with three other cats constantly vying for my attention.

For several months we often couldn’t even sleep in the same room — Pearl and Anise couldn’t be left together, and Anise makes a racket all night if he’s shut out. Early on, our roommate would often take Pearl overnight (despite being allergic to cats), but as time went on, Ash felt a stronger impulse to be around her as much as possible. Eventually we found we could have both Anise and Pearl overnight as long as we put a sweater on Anise and had sufficient extra blankets on the bed, but honestly it felt like a constant logistical nightmare.

Even with all this, we still had several more ER visits, several more hospitalizations.

Still, Pearl seemed to be doing okay. She was happy, she engaged with us, she purred, she snuggled, she nuzzled, she played. She was fine, and stable, until she wasn’t.

It was January 11, and it was the first ER visit for rapid breathing in a while. We handed her over, they hospitalized her, and we left, assuming we’d pick her up in the morning and she’d be fine, as had always happened.

We weren’t home for long before they called us. Pearl wasn’t recovering this time, and wouldn’t make it through the night.

We raced back. We saw Pearl, struggling to breathe, even on oxygen. We pet her and told her it would be okay. She cried out for help. Ash held her.

And then we let her go.

I love and miss so many little things. She had such beautiful eyes, like Twigs did, though she squinted a lot so it always felt like a special treat when I could see them clearly. Her whole face scrunched when she meowed. She had a marble pattern, so I guess she would’ve been a calico. I didn’t even notice it when we first got her, and then one day it jumped right out at me and I felt briefly like our kitten had been replaced with a different one. She had a funny little clump of four hairs that stuck out from her hip. She had marbling on her pawpads, too.

I love her wide vocabulary of very cute little meows, in contrast with Twigs’s more raucous ones. She reserved them for special occasions, opting to chirr most of the time.

I love how, when she was surprised by something, she would simply jump straight up in the air an inch, then come down. No other movement. It was like she was tweened. I never tried to spook her on purpose to see this, but she was a little prone to being spooked.

I love how, when she’d knead at a soft blanket, she did just a few quick little motions and then she was done. It was so dainty. I always called it kitty paws, to distinguish from cat paws.

I love how she’d do a straight upwards stretch that somehow made her ears flick inside out briefly.

I love the very deliberate way she tucked her paws, and how she would gently hold onto someone’s shoulders while getting a taxi ride. Everything she did came across as so purposeful.

I love how Ash had found that rubbing their face on Pearl’s side as a kitten would get her to purr, and that kept working for her whole life, and it’s basically what she ended up doing to people in return.

I love how she had a funny obsession with water. I can’t really explain it, and I don’t know what she found so interesting. If I took a swig from my water bottle with Pearl nearby, she would climb on whatever was necessary to sniff at the nozzle. If I opened a soda with Pearl nearby, she’d stick her nose right in the opening, then recoil when the bubbles fizzed her. She didn’t enjoy baths or anything, she just liked… water. From afar. Like with Napoleon, perhaps.

I love how she nuzzled so hard that she hit maximum nuzzle, and so she would also sort of gently swipe the air with her paw as well, for extra nuzzling power.

I love her funny “bug off” sweater, illustrated with a ladybug, which seemed to capture her personality well: don’t be rude to me, but expressed in a very cute manner.

I love how she adopted the sort of extended windowsill in our bathroom as her own, and would lay there on sunny days and roll around on a towel.

I love that she was pampered right to the end. Over the course of recent weeks, Ash would keep giving me updates on Pearl’s development of a new routine, where she would sit in a Treat Spot she had designated, possibly meow once or twice, and wait very nicely. And then Ash would eventually capitulate, helpless before the polite ministrations of this very tiny cat, and give her a treat. It seemed that the number of treats Pearl was managing to get per day was gradually increasing, and so I asked every time: why not simply not give her a treat? But I knew the answer.

If you cried, there were decent odds that Pearl would come and comfort you, come chirp at you and nuzzle until you felt better.

When we first moved here, Ash’s ex-husband had driven the truck containing all our stuff, and he slept here one night before leaving for good. The day after he’d left, we heard Pearl doing mouse alert in the room he’d slept in, and I just broke down sobbing at the kitchen table, thinking about how Pearl liked him despite everything and was just trying to find him, and we had no way to tell her he wasn’t coming back or explain any of it to her. To her, one of her favorite people had just disappeared, and that was so sad.

But Pearl heard me, came over, jumped on the kitchen table, and purred and headbutted me like crazy. The idea that I was sad for her and she still wanted to comfort me made me cry harder.

She would also headbutt and nuzzle Ash specifically on the mouth when they sang, or do the same to me if I whistled competently. I suppose she liked music, but only from us.

Most of all, I love… how much she doted on Ash. She slept alongside them (me only a few times), she followed them around, she waited outside doors for them. They were her favorite person. I feel so bad for them, to have lost both Twigs and Pearl back to back.

It’s been… two weeks now. Just over, because it took me another day to finish this post.

I don’t know if it’s fully clicked yet. I didn’t see Pearl much during the day, since she’d be tucked away in Ash’s office slash our bedroom. I saw her mostly at night and first thing in the morning. So while I’m out here, at my desk, it’s like nothing has changed. It only sinks in when I go upstairs and see the door left open, see a bed with no Pearl tucked in it somewhere.

It’s kind of dumbfounding just how much of this house and our lives had warped around Pearl, around this one tiny cat who loved everyone. So many things have disappeared or seem superfluous now. I was already free-feeding the other cats again since Pearl wasn’t allowed to roam the house unsupervised, but now we don’t need the kibble cage at all. Half our doors had been kept closed to make a few different places for Pearl to stay, but now none of that is necessary. Litterboxes had ended up scattered throughout the house so Pearl would always have access to one; now they’re back to being in a few central locations.

Ash doesn’t have to wake up at a specific time every day to give Pearl medicine. Pearl won’t wake us up to feed her. We don’t have to make her food, ever again.

And there are so many things that were only for Pearl. This wasn’t the case for anyone else. Styx only had communal cat sweaters; his favorite toy was loose change on my desk. Twigs, too, only had sweaters that Anise and Pearl inherited; his one dedicated toy was a single very tiny mouse he sometimes played with.

But Pearl? Half the sweaters we have only fit Pearl. Her mice were very much hers. Even her string was very much hers. We have a mortar and pestle that were specifically for grinding up her medication, oral syringes only Pearl used. She had possessions of her very own, things she’s left behind.

We knew this was coming, of course. Without the intervention of modern medicine, she would have died last March, and the outlook for heart failure in a cat isn’t great. I’ve already grieved for her several times over the past year. I didn’t see her much during the summer, but I’d been trying to spend more deliberate time with her in recent months, and I’m glad I did. I regret nothing. I earned her purrs, I played string with her exactly the right amount, I woke up to her stealing my pillow. I got the full Pearl experience.

And so did she. Ash took her outside extra over the summer, let her see a bit of the outside world (even if it was only our yard). We let her roam the house when we could, banishing Cheeseball to a room by himself if necessary, though she usually ended up sitting on a vent or my lap (or trying to heist some kibble). She got lots of treats, lots of love, lots of blankets, and even a vent all to herself. What more could she ask for?

She was living on borrowed time, but we borrowed every second we could. I don’t know what else we could’ve done. And we were there for her right up until the end. We didn’t have that opportunity with Twigs; he died in the back room, surrounded by strangers.

In the end, her heart was literally too big.

This sucks.

Pearl deserved better. She was dealt a bad hand from the beginning, but she was still friendly and kind, and then this happened. She was so young, too — her eighth birthday would’ve been next month. She, like Twigs, should’ve had twice as long.

Things won’t be difficult for her any more, I guess. I don’t know how much that comforts me.

Everything else moves on. Pearl continued until the night of January 11, 2022, but can go no further. We’re forced to leave her there, retaining only memories, while time carries us gently forward, ever further away.

So here is my landmark, my stake in the ground. Pearl was here. May this mark out the shape of who she was and leave that impression upon the world for much longer.

The finality of death resolves so many questions. I often wished I could improve Pearl’s tense relationships with Anise and Cheeseball, but now there’s no problem to solve. The interactions they had are all the interactions they will ever have. The tension is gone, now. The worries about how long Pearl’s heart would last are gone too.

The cat dynamic has shifted, again. Cheeseball and Napoleon have been much more affectionate towards Ash, and Napoleon has suddenly become a lap cat. I suppose the rest of the cats missed Ash while they were siloed away with Pearl for so long. Maybe they’re grieving? Cats are so open with their emotions, but sometimes they’re still inscrutable.

Pearl’s urn is on the dresser in our bedroom, right next to Twigs. Hers is bigger than his, somehow. But that’s Pearl for you — she always knew how to take up space.

No, this is too dire an ending. Pearl was dealt a bad hand, but she always tried to be nice despite that. She got to see a lot of places and make a lot of friends, both people and cats and even one dog. Even when she had complex and skeptical feelings about Anise, she kept trying to be friends with him. She faltered at times, but she always did her best to uphold her principles of loveliness, strong boundaries, and please give me a treat.

That’s a lot for a tiny cat. I admire her for it, and I will not forget it.

Pearl sitting contentedly next to Ash at their desk

Thank you for reading about Pearl. I hope you’ll remember her too. We loved her very much, and she put a lot of love back into the world. If you would like to experience more Pearl, here are some videos of her. I have some more to sift through, so this list may grow in the coming days.

And here are some games she has starred in. Or, rather, her fursona Purrl has starred in them.

With or Without Ninova, the BSP Is Doomed to Melt Away

Post Syndicated from Venelina Popova original

Last Saturday’s session of the 50th Congress of the BSP was hardly held in a jubilee spirit: the decor was sham, and the speeches in the huge hall of the NDK were pompous, clichéd and meaningless. The attempts to present the party’s failure at the last elections as a success were humiliatingly stupid. A delegate from Haskovo called the BSP an experienced grandmaster who had successfully finished a game of chess through its participation in power. Alexander Simov, one of Ninova’s so-called praetorians, blurted out that the BSP had not been ideologically hollowed out, but continued to be a vivid, left-wing, memorable party. Which part of that is true?

Ninova’s only opponent from within the party, Krum Zarkov, described what happened as a farce. Before the forum, the president of the Party of European Socialists, Sergei Stanishev, called the results of the last elections a “catastrophe” and declared that if Korneliya Ninova’s resignation were not reconfirmed by the delegates, it would look grotesque in the eyes of the voters and would be taken as a disregard for their vote. MEP Elena Yoncheva, one of Ninova’s fiercest critics, said on BNR’s “Nedelya 150”: “The Congress was shameful and staged; it threw the BSP into the category of the marginal parties. The Congress had no connection with reality. At the Congress we received no answer to the question of why we lost the trust of 800,000 people.”

In the end, Korneliya Ninova kept her power in the party, and with it her position as deputy prime minister. Under a different leader, the socialists could have withdrawn their confidence in her and put forward a new face for deputy prime minister; motives can always be found. Now Ninova can calmly continue to surround herself with yes-men in the ministry as well, and even hand out posts like indulgences to repentant party enemies of hers. She can continue to punish the defiant, even expel them from the party, as she did with Kiril Dobrev, with the Plovdiv oligarch Georgi Gergov and with other comrades of hers.

Analysts, Korneliya Ninova’s ideological opponents and her entourage in the BSP will continue (partly by inertia) to comment for some time on the event and its possible consequences for the century-old party. Worn-out politicians like Mihail Mikov, who was remembered for nothing during his brief stay at the top of the BSP, have started talking about splitting it and about the need to create a new, authentic left-wing party, which Bulgaria supposedly needs.

But this is not the first “time of parting” among the Bulgarian socialists.

Eight years ago, emblematic figures on the left, such as former president Georgi Parvanov, Rumen Petkov, Evgeniy Zhekov and others, first created an ideological platform within the BSP, which, after their expulsion from the party, they turned into a new political project under the name “Alternative for Bulgarian Revival” (ABV). The new party was proclaimed a fighter against backstage dealing in politics, against lobbying, and against the subversion of legitimate democratic mechanisms in political and governing decisions and in appointments. In reality, ABV appeared as a result of the conflict between Sergei Stanishev and Georgi Parvanov, when the former was prime minister of the triple-coalition cabinet and the president was its architect.

In 2014, ABV even took part in governing the country, after guaranteeing the parliamentary majority during Boyko Borissov’s second coalition cabinet. Ivaylo Kalfin of the party then became the fourth deputy prime minister in the government of GERB and the Reformist Bloc. In the spring of 2016, Kalfin resigned, and ABV announced that it was withdrawing its support for the parliamentary majority and the cabinet. This “quite coincidentally” coincided with the election of Korneliya Ninova as BSP leader on May 8 of the same year, when she won the leadership battle against Mihail Mikov by six votes. And she had already signaled that she was inclined to extend a hand to the ABV splitters, if the stray sheep returned to the flock.

There is no point in guessing whether there will now be a new splitting of the party ship, one that this time could sink it for good. If it happens, we will all inevitably find out. Although a poll among the socialists would probably show that they regard their party as immortal…

The more important question dominating this plot is

whether the leader bears all the blame and responsibility for the BSP’s dramatic rout at the last parliamentary elections in November 2021.

To answer this question in any meaningful way, we need to look at the graph of trust in the left during the years of the Transition and trace its peaks and troughs. And then draw conclusions.


The first major collapse of trust in the BSP came after Andrey Lukanov’s second cabinet. It governed for only a few months, effectively drove the state into bankruptcy and stopped servicing its foreign debt. After the so-called Lukanov winter, the 1991 elections were won by the SDS, and the socialist party lost more than a million voters. But power in practice remained in the hands of the former communists, and soon enough State Security, acting through the DPS, brought down Filip Dimitrov’s government.

During the rule of the “Berov” cabinet, formed with the DPS’s mandate, the foundations of the parallel state and of organized crime were laid; the strong-arm groups and the financial pyramids with which the robbery of Bulgarian citizens began were created. Subsequently the BSP, led by Zhan Videnov, won the 1994 parliamentary elections and received close to 2.26 million votes. But his government lasted half a term and fell. The Council of Ministers of that time turned out to contain a record number of State Security agents: in 2001 the Dossier Commission reported that data on collaboration with State Security and the Intelligence Directorate of the General Staff had been found for 22 of its members, with incontrovertible evidence for 11 of them. The prime minister himself, codenamed Dunav, had kept a safe house in Plovdiv.

After the so-called Videnov winter, at the 1997 elections, with Georgi Parvanov now leading the BSP, the party’s electoral result was dramatic.

The votes for it were now nearly two and a half times fewer than at the previous ballot.

Frightened by the successful term of the ODS and Ivan Kostov’s government, certain circles organized the return of Simeon Saxe-Coburg-Gotha to Bulgaria. With the national movement created in his name (NDSV), he won the 2001 elections, became the world’s first monarch to serve as prime minister of a parliamentary republic, and formed the most opaque coalition together with the DPS and “Novoto Vreme”. Saxe-Coburg-Gotha also halted the opening of the State Security dossiers, and the former agents enjoyed particular esteem during his term. In 2001 the BSP suffered a serious loss: for the first time in the years of the Transition, the party fell to a result of just over 783,000 votes. Sergei Stanishev was by then its leader, and unexpectedly that same year Georgi Parvanov won the presidential election.

In 2005 the BSP won the trust of more than a million voters for the last time, becoming the country’s leading political force with close to 1.13 million votes. After the rule of the triple coalition and Sergei Stanishev’s cabinet, the BSP managed to raise its election results twice more: at the early elections of 2013 (after the resignation of the first “Borissov” cabinet) and in 2017, when the left, now led by Korneliya Ninova, won 80 seats in parliament. But it never again managed to clear the bar of one million votes.

Over the past three decades the left has crashed at the polls several times. In 2014, when the protesters against the #WHO model brought down the BSP–DPS government of prime minister Plamen Oresharski, the socialists fell to a result of about half a million votes. And across the three parliamentary elections of 2021, the socialist party’s results steadily collapsed, reaching 267,817 votes in the November ballot.

Ninova undeniably bears some responsibility for the left’s rout, but it would be superficial to explain it solely by her leadership of the BSP.

Korneliya Ninova may well be in love with power and fully capable of trampling the people who try to wrest it from her hands, as her opponents describe her. People from her circle look like apparatchiks from the socialist era, and it was strange to see compromised former ministers, such as Emilia Maslarova and Rumen Gechev, at the coalition negotiations for a cabinet in the capacity of experts. In the electoral lists, in their ordering, and in the squabbles over who should head them, it was evident that Ninova does not act on principle but gives precedence to the claqueurs and to those who flatter her. The BSP’s program “Vision for Bulgaria”, in turn, proved to be a dull, idea-free document on governing the country, devoid of any vision. And so on…

But isn’t the BSP supposed not to be a pyramidal, leader-centered party? Why did its leadership sleep through Ninova’s mistakes as leader? Out of conformism, laziness, indifference? Let those who know the party from the inside, its intrigues, its battles for supremacy and its lobbies, tell us.

For outside observers, however, it is clear that the BSP will continue its slide, because:

The BSP did not renounce its communist past and inheritance, but embraced the entire history of the totalitarian BKP as its own. As a sign of this, Korneliya Ninova and her comrades continue to bow ritually at Todor Zhivkov’s grave in Pravets.

The BSP did not condemn the crimes of the former regime. It did not apologize for the victims of the People’s Court, for the abuses and murders in the camps, for the destruction of our nation’s elite after the coup of September 9, 1944. It did not sincerely ask the Bulgarian Muslims for forgiveness for the so-called Revival Process. (Sergei Stanishev’s apology and Mestan’s kiss on Orlov Most in 2014 did not send clear messages to the minority.)

To this day, the BSP has not severed its ties with the former secret services of the totalitarian state.

The BSP continues to call the Soviet occupation of Bulgaria “liberation” and remains Russia’s “fifth column” to this day.

But most important of all, the BSP is neither a left-wing nor a social party.

The proof of this lies not only in the millionaires and oligarchs within it, but also in its policies in the tax, pension and social spheres. And each of its governments has either crashed (Lukanov, Videnov) or governed opaquely and in heavy dependence on the deep state (Stanishev, Oresharski).

The party will continue to lose electoral support because it has nothing to offer its voters. And because, as we have seen, it does not allow a change of generations in its leadership. The BSP, like the other parties created from the matrix of the Transition, will either disappear or live on as marginal organizations without influence in politics. And that is inevitable.

Cover photo: Korneliya Ninova delivers her political report to the BSP Congress on January 22, 2022. Still from a BSTV video clip


Deploy and Manage Gitlab Runners on Amazon EC2

Post Syndicated from Sylvia Qi original

Gitlab CI is a tool utilized by many enterprises to automate their continuous integration, continuous delivery and deployment (CI/CD) process. A Gitlab CI/CD pipeline consists of two major components: a .gitlab-ci.yml file describing a pipeline’s jobs, and a Gitlab Runner, an application that executes the pipeline jobs.
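For readers new to Gitlab CI, a minimal .gitlab-ci.yml might look like the following sketch. The stage names, job names, and tag are illustrative, not taken from the post’s demo repository:

```yaml
# Minimal illustrative pipeline definition (.gitlab-ci.yml)
stages:
  - build
  - test

build-job:
  stage: build
  tags:
    - docker          # routes the job to runners registered with this tag
  script:
    - echo "Compiling the application..."

test-job:
  stage: test
  tags:
    - docker
  script:
    - echo "Running the test suite..."
```

Jobs carrying a tag are picked up only by runners registered with a matching tag, which is how the registration step later in this post binds projects to the EC2-hosted runners.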

Setting up the Gitlab Runner is a time-consuming process. It involves provisioning the necessary infrastructure, installing the necessary software to run pipeline workloads, and configuring the runner. For enterprises running hundreds of pipelines across multiple environments, it is essential to automate the Gitlab Runner deployment process so that runners can be deployed quickly in a repeatable, consistent manner.

This post will guide you through utilizing Infrastructure-as-Code (IaC) to automate Gitlab Runner deployment and administrative tasks on Amazon EC2. With IaC, you can quickly and consistently deploy the entire Gitlab Runner architecture by running a script, track and manage changes efficiently, and enforce guardrails and best practices via code. The solution presented here also offers autoscaling, so that you save costs by terminating resources when not in use. You will learn:

  • How to deploy Gitlab Runner quickly and consistently across multiple AWS accounts.
  • How to enforce guardrails and best practices on the Gitlab Runner through IaC.
  • How to autoscale Gitlab Runner based on workloads to ensure best performance and save costs.

This post is written from a DevOps engineer’s perspective, and assumes that the engineer is familiar with the practices and tools of IaC and CI/CD.

Overview of the solution

The following diagram displays the solution architecture. We use AWS CloudFormation to describe the infrastructure that is hosting the Gitlab Runner. The main steps are as follows:

  1. The user runs a deploy script in order to deploy the CloudFormation template. The template is parameterized, and the parameters are defined in a properties file. The properties file specifies the infrastructure configuration, as well as the environment in which to deploy the template.
  2. The deploy script calls CloudFormation CreateStack API to create a Gitlab Runner stack in the specified environment.
  3. During stack creation, an EC2 autoscaling group is created with the desired number of EC2 instances. Each instance is launched via a launch template, which is created with values from the properties file. An IAM role is created and attached to the EC2 instance. The role contains permissions required for the Gitlab Runner to execute pipeline jobs. A lifecycle hook is attached to the autoscaling group on instance termination events. This ensures graceful instance termination.
  4. During instance launch, CloudFormation uses a cfn-init helper script to install and configure the Gitlab Runner:
    1. cfn-init installs the Gitlab Runner software on the EC2 instance.
    2. cfn-init configures the Gitlab Runner as a docker executor using a pre-defined docker image in the Gitlab Container Registry. The docker executor implementation lets the Gitlab Runner run each build in a separate and isolated container. The docker image contains the software required to run the pipeline workloads, thereby eliminating the need to install these packages during each build.
    3. cfn-init registers the Gitlab Runner to Gitlab projects specified in the properties file, so that these projects can utilize the Gitlab Runner to run pipelines.
  5. The user may repeat the same steps to deploy Gitlab Runner into another environment.

Architecture diagram previously explained in post.


This walkthrough will demonstrate how to deploy the Gitlab Runner, and how easy it is to conduct Gitlab Runner administrative tasks via this architecture. We will walk through the following tasks:

  • Build a docker executor image for the Gitlab Runner.
  • Deploy the Gitlab Runner stack.
  • Update the Gitlab Runner.
  • Terminate the Gitlab Runner.
  • Add/Remove Gitlab projects from the Gitlab Runner.
  • Autoscale the Gitlab Runner based on workloads.

The code in this post is available at


Prerequisites

For this walkthrough, you need the following:

  • A Gitlab account (all tiers, including Gitlab Free self-managed, Gitlab Free SaaS, and higher tiers). This demo uses the Free tier.
  • A Gitlab Container Registry.
  • Git client to clone the source code provided.
  • An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
  • The latest version of the AWS CLI. For more information, see Installing, updating, and uninstalling the AWS CLI.
  • Docker is installed and running on the localhost/laptop.
  • Nodejs and npm installed on the localhost/laptop.
  • A VPC with two private subnets that is connected to the internet via a NAT gateway, allowing outbound traffic.
  • The following IAM service-linked role created in the AWS account: AWSServiceRoleForAutoScaling
  • An Amazon S3 bucket for storing Lambda deployment packages.
  • Familiarity with Git, Gitlab CI/CD, Docker, EC2, CloudFormation and Amazon CloudWatch.

Build a docker executor image for the Gitlab Runner

The Gitlab Runner in this solution is implemented as a docker executor. The docker executor connects to Docker Engine and runs each build in a separate and isolated container via a predefined docker image. The first step in deploying the Gitlab Runner is building a docker executor image. We provided a simple Dockerfile in order to build this image. You may customize the Dockerfile to install your own requirements.
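The sample Dockerfile in the demo repository may differ, but as a sketch, an executor image that pre-installs common build tooling could look like this (the base image and package list are assumptions):

```dockerfile
# Illustrative docker executor image; customize to your pipeline workloads.
FROM amazonlinux:2

# Pre-install build tooling so that each pipeline job does not have to.
RUN yum -y install git tar gzip unzip python3 && \
    pip3 install awscli && \
    yum clean all
```

Baking tools into the image is what lets the post eliminate package installation during each build.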

To build a docker image using the sample Dockerfile:

  1. Create a directory where we will store our demo code. From your terminal run:
mkdir demo-repos && cd demo-repos
  2. Clone the source code repository found in the following location:
git clone
  3. Create a new project on your Gitlab server. Name the project any name you like.
  4. Clone your newly created repo to your laptop. Ignore the warning about cloning an empty repository.
git clone <your-repo-url>
  5. Copy the demo repo files into your newly created repo on your laptop, and push it to your Gitlab repository. You may customize the Dockerfile before pushing it to Gitlab.
cp -r amazon-ec2-gitlab-runner/* <your-repo-dir>
cd <your-repo-dir>
git add .
git commit -m "Initial commit"
git push
  6. On the Gitlab console, go to your repository’s Packages & Registries -> Container Registry. Follow the instructions provided on the Container Registry page in order to build and push a docker image to your repository’s container registry.

Deploy the Gitlab Runner stack

Once the docker executor image has been pushed to the Gitlab Container Registry, we can deploy the Gitlab Runner. The Gitlab Runner infrastructure is described in the CloudFormation template gitlab-runner.yaml, and its configuration is stored in a properties file. A launch template is created with the values in the properties file and is then used to launch instances. This architecture lets you deploy the Gitlab Runner to as many environments as you like by utilizing the configurations provided in the appropriate properties files.

During the provisioning process, a cfn-init helper script runs a series of commands to install and configure the Gitlab Runner.

              command: sudo yum -y install docker
              command: sudo service docker start
              command: sudo wget -O /usr/bin/gitlab-runner
              command: sudo chmod a+x /usr/bin/gitlab-runner
              command: sudo useradd --comment 'GitLab Runner' --create-home gitlab-runner --shell /bin/bash
              command: sudo gitlab-runner install --user=gitlab-runner --working-directory=/home/gitlab-runner
              command: !Sub 'aws configure set default.region ${AWS::Region}'
              command: !Sub 
                - |
                  for GitlabGroupToken in `aws ssm get-parameters --names /${AWS::StackName}/ci-tokens --query 'Parameters[0].Value' | sed -e "s/\"//g" | sed "s/,/ /g"`;do
                      sudo gitlab-runner register \
                      --non-interactive \
                      --url "${GitlabServerURL}" \
                      --registration-token $GitlabGroupToken \
                      --executor "docker" \
                      --docker-image "${DockerImagePath}" \
                      --description "Gitlab Runner with Docker Executor" \
                      --locked="${isLOCKED}" --access-level "${ACCESS}" \
                      --docker-volumes "/var/run/docker.sock:/var/run/docker.sock" \
                      --tag-list "${RunnerEnvironment}-${RunnerVersion}-docker"
                - isLOCKED: !FindInMap [GitlabRunnerRegisterOptionsMap, !Ref RunnerEnvironment, isLOCKED]
                  ACCESS: !FindInMap [GitlabRunnerRegisterOptionsMap, !Ref RunnerEnvironment, ACCESS]                              
              command: sudo gitlab-runner start

The helper script ensures that the Gitlab Runner setup is consistent and repeatable for each deployment. If a configuration change is required, users simply update the configuration steps and redeploy the stack. Furthermore, all changes are tracked in Git, which allows for versioning of the Gitlab Runner.

To deploy the Gitlab Runner stack:

  1. Obtain the runner registration tokens of the Gitlab projects that you want registered to the Gitlab Runner. Obtain a token by navigating to the project’s Settings > CI/CD and expanding the Runners section.
  2. Update the properties file parameters according to your own environment. Refer to the gitlab-runner.yaml file for a description of these parameters. Rename the file if you like. You may also create additional properties files for deploying into other environments.
  3. Run the deploy script to deploy the runner:
cd <your-repo-dir>
./ <properties-file> <region> <aws-profile> <stack-name> 

<properties-file> is the name of the properties file.

<region> is the region where you want to deploy the stack.

<aws-profile> is the name of the CLI profile you set up in the prerequisites section.

<stack-name> is the name you chose for the CloudFormation stack.

For example:

./ us-east-1 dev amazon-ec2-gitlab-runner-demo

After the stack is deployed successfully, you will see the Gitlab Runner autoscaling group created in the EC2 console:


Under your Gitlab project’s Settings > CI/CD > Runners > Available specific runners, you will see the fully configured Gitlab Runner. The green circle indicates that the Gitlab Runner is ready for use.


Updating the Gitlab Runner

There are times when you would want to update the Gitlab Runner. For example, updating the instance VolumeSize in order to resolve a disk space issue, or updating the AMI ID when a new AMI becomes available.

Utilizing the properties file and launch template makes it easy to update the Gitlab Runner. Simply update the Gitlab Runner configuration parameters in the properties file. Then, run the deploy script to update the Gitlab Runner stack. To ensure that the changes take effect immediately (e.g., existing instances are replaced by new instances with the new configuration), we utilize an AutoScalingRollingUpdate update policy to automatically update the instances in the autoscaling group.

    UpdatePolicy:
      AutoScalingRollingUpdate:
        MinInstancesInService: !Ref MinInstancesInService
        MaxBatchSize: !Ref MaxBatchSize
        PauseTime: "PT5M"
        WaitOnResourceSignals: true
        SuspendProcesses:
          - HealthCheck
          - ReplaceUnhealthy
          - AZRebalance
          - AlarmNotification
          - ScheduledActions

The policy tells CloudFormation that, when changes are detected in the launch template, it should update the instances in batches of MaxBatchSize, while keeping the number of instances specified in MinInstancesInService in service during the update.

Below is an example of updating the Gitlab Runner instance type.

To update the instance type of the runner instance:

  1. Update the “InstanceType” parameter in the properties file.


  2. Run the script to update the CloudFormation stack:
cd <your-repo-dir>
./ <properties-file> <region> <aws-profile> <stack-name> 

In the CloudFormation console, you will see that the launch template is updated first, then a rolling update is initiated. The instance type update requires a replacement of the original instance, so a temporary instance was launched and put in service. Then, the temporary instance was terminated when the new instance was launched successfully.


After the update is complete, you will see that on the Gitlab project’s console, the old Gitlab Runner, ez_5x8Rv, is replaced by the new Gitlab Runner, N1_UQ7yc.


Terminate the Gitlab Runner

There are times when an autoscaling group instance must be terminated. For example, during an autoscaling scale-in event, or when the instance is being replaced by a new instance during a stack update, as seen previously. When terminating an instance, you must ensure that the Gitlab Runner finishes executing any running jobs before the instance is terminated, otherwise your environment could be left in an inconsistent state. Also, we want to ensure that the terminated Gitlab Runner is removed from the Gitlab project. We utilize an autoscaling lifecycle hook to achieve these goals.

The lifecycle hook works like this: A CloudWatch event rule actively listens for the EC2 Instance-terminate events. When one is detected, the event rule triggers a Lambda function. The Lambda function calls SSM Run Command to run a series of commands on the EC2 instances, via a SSM Document. The commands include stopping the Gitlab Runner gracefully when all running jobs are finished, de-registering the runner from Gitlab projects, and signaling the autoscaling group to terminate the instance.

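The drain logic that the SSM Run Command document performs could be sketched as follows. This is an illustration only, not the post’s actual document; the document structure is real SSM schema, but the hook name and group name are placeholders:

```yaml
# Illustrative SSM Command document for draining a runner instance.
schemaVersion: "2.2"
description: Gracefully stop and unregister the Gitlab Runner, then let the ASG proceed
mainSteps:
  - action: aws:runShellScript
    name: drainGitlabRunner
    inputs:
      runCommand:
        # stop waits for in-flight jobs to finish before the process exits
        - sudo gitlab-runner stop
        # remove this instance's runner registrations from the Gitlab server
        - sudo gitlab-runner unregister --all-runners
        # signal the autoscaling group that cleanup is done
        - INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
        - aws autoscaling complete-lifecycle-action --lifecycle-hook-name my-termination-hook --auto-scaling-group-name my-runner-asg --lifecycle-action-result CONTINUE --instance-id "$INSTANCE_ID"
```

Completing the lifecycle action with CONTINUE is what releases the instance from the Terminating:Wait state so the autoscaling group can finish terminating it.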

There are also times when you want to terminate an instance manually. For example, when an instance is suspected to not be functioning properly. To terminate an instance from the Gitlab Runner autoscaling group, use the following command:

aws autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id="${InstanceId}" \
    --no-should-decrement-desired-capacity \
    --region="${region}"

The above command terminates the instance. The lifecycle hook ensures that the cleanup steps are conducted properly, and the autoscaling group launches another new instance to replace the old one.

Note that if you terminate the instance by using the “ec2 terminate-instances” command, then the autoscaling lifecycle hook actions will not be triggered.

Add/Remove Gitlab projects from the Gitlab Runner

As new projects are added to your enterprise, you may want to register them to the Gitlab Runner, so that those projects can utilize it to run pipelines. On the other hand, you would want to remove the Gitlab Runner from a project if that project no longer needs it, or if it no longer qualifies to utilize the Gitlab Runner. For example, if a project is no longer allowed to deploy to an environment configured by the Gitlab Runner. Our architecture offers a simple way to add and remove projects from the Gitlab Runner.

To add new projects to the Gitlab Runner:

  1. Update the RunnerRegistrationTokens parameter in the properties file.
  2. Update the Gitlab Runner stack. This updates the SSM parameter which stores the tokens.
cd <your-repo-dir>
./ <properties-file> <region> <aws-profile> <stack-name> 
  3. Relaunch the instances in the Gitlab Runner autoscaling group. The new instances will use the new RunnerRegistrationTokens value. Run the following command to relaunch the instances:
./ <runner-autoscaling-group-name> <region> <optional-aws-profile>

To remove projects from the Gitlab Runner, follow the steps described above, with just one difference. Instead of adding new tokens to the RunnerRegistrationTokens parameter, remove the token(s) of the project that you want to dissociate from the runner.

Autoscale the runner based on custom performance metrics

Each Gitlab Runner can be configured to handle a fixed number of concurrent jobs. Once this capacity is reached for every runner, any new jobs will be in a Queued/Waiting status until the current jobs complete, which would be a poor experience for our team. Setting the number of concurrent jobs too high on our runners would also result in a poor experience, because all jobs leverage the same CPU, memory, and storage in order to conduct the builds.
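For context, the fixed per-runner job cap mentioned above comes from the global concurrent setting in the runner’s config.toml (the value shown is illustrative):

```toml
# /etc/gitlab-runner/config.toml (excerpt)
concurrent = 4   # maximum number of jobs this runner process executes at once
```

Tuning this value trades per-instance utilization against the CPU, memory, and storage contention described above.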

In this solution, we utilize a scheduled Lambda function that runs every minute in order to inspect the number of jobs running on every runner, leveraging the Prometheus Metrics endpoint that the runners expose. If we approach the concurrent build limit of the group, then we increase the Autoscaling Group size so that it can take on more work. As the number of concurrent jobs decreases, then the scheduled Lambda function will scale the Autoscaling Group back in an effort to minimize cost. The Scaling-Up operation will ignore the Autoscaling Group’s cooldown period, which will help ensure that our team is not waiting on a new instance, whereas the Scale-Down operation will obey the group’s cooldown period.
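The scaling decision can be sketched as a small pure function. The function name and the simple clamping rule are assumptions, not the post’s actual Lambda code, which also honors the scale-in cooldown described above:

```javascript
// Compute the desired autoscaling group capacity from the total number of
// active jobs reported by the runners' Prometheus metrics endpoints.
function desiredCapacity(activeJobs, jobsPerRunner, minSize, maxSize) {
  // Number of runners needed to host the current jobs, clamped to the
  // group's configured bounds.
  const needed = Math.ceil(activeJobs / jobsPerRunner);
  return Math.min(maxSize, Math.max(minSize, needed));
}

// Example: 9 active jobs with 4 concurrent jobs per runner -> 3 instances.
console.log(desiredCapacity(9, 4, 1, 10)); // prints 3
```

A scheduled Lambda would call this with the current job count each minute and set the Autoscaling Group’s desired capacity to the result.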

Here is the logical sequence diagram for the work:

Sequence diagram

For operational monitoring, the Lambda function also publishes custom CloudWatch Metrics for the count of active jobs, along with the target and actual capacities of the Autoscaling group. We can utilize this information to validate that the system is working properly and determine if we need to modify any of our autoscaling parameters.


Congratulations! You have completed the walkthrough. Take some time to review the resources you have deployed, and practice the various runner administrative tasks that we have covered in this post.


Troubleshooting

Problem: I deployed the CloudFormation template, but no runner is listed in my repository.

Possible Cause: Errors have been encountered during cfn-init, causing runner registration to fail. Connect to your runner EC2 instance, and check /var/log/cfn-*.log files.

Cleaning up

To avoid incurring future charges, delete every resource provisioned in this demo by deleting the CloudFormation stack created in the “Deploy the Gitlab Runner stack” section.


Conclusion

This article demonstrated how to utilize IaC to efficiently conduct various administrative tasks associated with a Gitlab Runner. We deployed Gitlab Runner consistently and quickly across multiple accounts. We utilized IaC to enforce guardrails and best practices, such as tracking Gitlab Runner configuration changes, terminating the Gitlab Runner gracefully, and autoscaling the Gitlab Runner to ensure best performance and minimum cost. We walked through the deploying, updating, autoscaling, and terminating of the Gitlab Runner. We also saw how easy it was to clean up the entire Gitlab Runner architecture by simply deleting a CloudFormation stack.

About the authors

Sylvia Qi

Sylvia is a Senior DevOps Architect focusing on architecting and automating DevOps processes, helping customers through their DevOps transformation journey. In her spare time, she enjoys biking, swimming, yoga, and photography.

Sebastian Carreras

Sebastian is a Senior Cloud Application Architect with AWS Professional Services. He leverages his breadth of experience to deliver bespoke solutions that satisfy the visions of his customers. In his free time, he really enjoys doing laundry. Really.

Minimizing Dependencies in a Disaster Recovery Plan

Post Syndicated from Randy DeFauw original

The Availability and Beyond whitepaper discusses the concept of static stability for improving resilience. What does static stability mean with regard to a multi-Region disaster recovery (DR) plan? What if the very tools that we rely on for failover are themselves impacted by a DR event?

In this post, you’ll learn how to reduce dependencies in your DR plan and manually control failover even if critical AWS services are disrupted. As a bonus, you’ll see how to use service control policies (SCPs) to help simulate a Regional outage, so that you can test failover scenarios more realistically.

Failover plan dependencies and considerations

Let’s dig into the DR scenario in more detail. Using Amazon Route 53 for Regional failover routing is a common pattern for DR events. In the simplest case, we’ve deployed an application in a primary Region and a backup Region. We have a Route 53 DNS record set with records for both Regions, and all traffic goes to the primary Region. In an event that triggers our DR plan, we manually or automatically switch the DNS records to direct all traffic to the backup Region.

Relying on an automated health check to control Regional failover can be tricky. A health check might not be perfectly reliable if a Region is experiencing some type of degradation. Often, we prefer to initiate our DR plan manually, which then kicks off our failover automation.

What are the dependencies that we’ve baked into this failover plan? First, Route 53, our DNS service, has to be available. It must continue to serve DNS queries, and we have to be able to change DNS records manually. Second, if we do not have a full set of resources already deployed in the backup Region, we must be able to deploy resources into it.

Both dependencies might violate static stability, because our DR plan relies on resources that could themselves be affected by the outage we’re experiencing. Ideally, we don’t want to depend on other services being available in order to fail over and continue serving our own traffic. How do we reduce these additional dependencies?

Static stability

Let’s look at our first dependency, on Route 53, in terms of control planes and data planes. Briefly, a control plane is used to configure resources, while the data plane delivers services (see Understanding Availability Needs for a more complete definition).

The Route 53 data plane, which responds to DNS queries, is highly resilient across Regions. We can safely rely on it during the failure of any single Region. But let’s assume that for some reason we are not able to call on the Route 53 control plane.

Amazon Route 53 Application Recovery Controller (Route 53 ARC) was built to handle this scenario. It provisions a Route 53 health check that we can manually control with a Route 53 ARC routing control; changing the routing control state is a data plane operation. The Route 53 ARC data plane is highly resilient, using a cluster of five Regional endpoints. You can change the routing control state as long as three of the five Regional endpoints are available.
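Concretely, a manual failover then becomes a single data plane call against one of the cluster's Regional endpoints. The following AWS CLI sketch illustrates the idea; the routing control ARN and cluster endpoint URL are placeholders that you would take from your own Route 53 ARC cluster:

```shell
# Flip a Route 53 ARC routing control via the data plane. The routing
# control ARN and cluster endpoint URL below are placeholders; list your
# cluster's five Regional endpoints from its description and try them in
# turn until one responds.
aws route53-recovery-cluster update-routing-control-state \
    --routing-control-arn "arn:aws:route53-recovery-control::111122223333:controlpanel/EXAMPLE/routingcontrol/EXAMPLE" \
    --routing-control-state On \
    --region us-west-2 \
    --endpoint-url "https://host-EXAMPLE.us-west-2.example.com/v1"
```

Because this targets the data plane endpoints directly, it works even when the Route 53 control plane is unavailable.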

Figure 1. Simple Regional failover scenario using Route 53 Application Recovery Controller


The second dependency, being able to deploy resources into the second Region, is not a concern if we run a fully scaled-out set of resources. We must make sure that our deployment mechanism doesn’t rely only on the primary Region. Most AWS services have Regional control planes, so this isn’t an issue.

The AWS Identity and Access Management (IAM) data plane is highly available in each Region, so you can authorize the creation of new resources as long as you’ve already defined the roles. Note: If you use federated authentication through an identity provider, you should test that the IdP does not itself have a dependency on another Region.

Testing your disaster recovery plan

Once we’ve identified our dependencies, we need to decide how to simulate a disaster scenario. Two mechanisms you can use for this are network access control lists (NACLs) and SCPs. NACLs let us restrict network traffic to our service endpoints, while SCPs define the maximum permissions for the target accounts, which lets us simulate a Route 53 or IAM control plane outage by denying access to those services.
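For example, an SCP along these lines would deny the Route 53 and IAM control plane APIs for the accounts it is attached to. This is a sketch only; scope the actions and the target OU to your own test setup before using anything like it (denying all `route53:*` and `iam:*` actions affects only the control plane APIs, not DNS resolution or credential verification, which are data plane functions):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SimulateControlPlaneOutage",
      "Effect": "Deny",
      "Action": [
        "route53:*",
        "iam:*"
      ],
      "Resource": "*"
    }
  ]
}
```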

For the end-to-end DR simulation, we’ve published an AWS samples repository on GitHub that you can deploy to evaluate Route 53 ARC capabilities when both the Route 53 and IAM control planes are inaccessible.

By deploying test applications across us-east-1 and us-west-1 AWS Regions, we can simulate a real-world scenario that determines the business continuity impact, failover timing, and procedures required for successful failover with unavailable control planes.

Figure 2. Simulating Regional failover using service control policies


Before you conduct the test outlined in our scenario, we strongly recommend that you create a dedicated AWS testing environment with an AWS Organizations setup. Make sure that you don’t attach SCPs to your organization’s root but instead create a dedicated organization unit (OU). You can use this pattern to test SCPs and ensure that you don’t inadvertently lock out users from key services.

Chaos engineering

Chaos engineering is the discipline of experimenting on a system to build confidence in its capability to withstand turbulent production conditions. Chaos engineering and its principles are important tools when you plan for disaster recovery. Even a simple distributed system may be too complex to operate reliably. It can be hard or impossible to plan for every failure scenario in non-trivial distributed systems, because of the number of failure permutations. Chaos experiments test these unknowns by injecting failures (for example, shutting down EC2 instances) or transient anomalies (for example, unusually high network latency).

In the context of multi-Region DR, these techniques can help challenge assumptions and expose vulnerabilities. For example, what happens if a health check passes but the system itself is unhealthy, or vice versa? What will you do if your entire monitoring system is offline in your primary Region, or too slow to be useful? Are there control plane operations that you rely on that themselves depend on a single AWS Region’s health, such as Amazon Route 53? How does your workload respond when 25% of network packets are lost? Does your application set reasonable timeouts or does it hang indefinitely when it experiences large network latencies?

Questions like these can feel overwhelming, so start with a few, then test and iterate. You might learn that your system can run acceptably in a degraded mode. Alternatively, you might find out that you need to be able to failover quickly. Regardless of the results, the exercise of performing chaos experiments and challenging assumptions is critical when developing a robust multi-Region DR plan.


Conclusion

In this post, you learned about reducing dependencies in your DR plan. We showed how you can use Amazon Route 53 Application Recovery Controller to reduce the dependency on the Route 53 control plane, and how to simulate a Regional failure using SCPs. As you evaluate your own DR plan, be sure to take advantage of chaos engineering practices. Formulate questions and test your static stability assumptions. And of course, you can incorporate these questions into a custom lens when you run a Well-Architected review using the AWS Well-Architected Tool.

[$] Supporting PGP keys and signatures in the kernel

Post Syndicated from jake original

A few weeks back, we looked at a proposal
to add an integrity-management feature to Fedora. One of the selling
points was that the integrity checking could be done using the PGP
signatures that are already embedded into the RPM package files that Fedora
uses. But the kernel needs to be able to verify PGP signatures in order
for the Fedora feature to work. That addition to the kernel has been proposed, but
some in the kernel-development community seem less than completely
enthusiastic about bringing PGP support into the kernel itself.

A new Polkit vulnerability

Post Syndicated from corbet original

Qualys has announced
the disclosure of a local-root vulnerability in Polkit. They are calling
it “PwnKit” and have even provided a proof-of-concept video.

Successful exploitation of this vulnerability allows any
unprivileged user to gain root privileges on the vulnerable
host. Qualys security researchers have been able to independently
verify the vulnerability, develop an exploit, and obtain full root
privileges on default installations of Ubuntu, Debian, Fedora, and
CentOS. Other Linux distributions are likely vulnerable and
probably exploitable. This vulnerability has been hiding in plain
sight for 12+ years and affects all versions of pkexec since its
first version in May 2009.

Updates from distributors are already rolling out.

New – Replication for Amazon Elastic File System (EFS)

Post Syndicated from Jeff Barr original

Amazon Elastic File System (Amazon EFS) allows EC2 instances, AWS Lambda functions, and containers to share access to a fully-managed file system. First announced in 2015 and generally available in 2016, Amazon EFS delivers low-latency performance for a wide variety of workloads and can scale to thousands of concurrent clients or connections. Since the 2016 launch we have continued to listen and to innovate, and have added many new features and capabilities in response to your feedback. These include on-premises access via Direct Connect (2016), encryption of data at rest (2017), provisioned throughput and encryption of data in transit (2018), an infrequent access storage class (2019), IAM authorization & access points (2020), lower-cost one zone storage classes (2021), and more.

Introducing Replication
Today I am happy to announce that you can now use replication to automatically maintain copies of your EFS file systems for business continuity or to help you to meet compliance requirements as part of your disaster recovery strategy. You can set this up in minutes for new or existing EFS file systems, with replication either within a single AWS region or between two AWS regions in the same AWS partition.

Once configured, replication begins immediately. All replication traffic stays on the AWS global backbone, and most changes are replicated within a minute, with an overall Recovery Point Objective (RPO) of 15 minutes for most file systems. Replication does not consume any burst credits and it does not count against the provisioned throughput of the file system.

Configuring Replication
To configure replication, I open the Amazon EFS Console, view the file system that I want to replicate, and select the Replication tab:

I click Create replication, choose the desired destination region, and select the desired storage (Regional or One Zone). I can use the default KMS key for encryption or I can choose another one. I review my settings and click Create replication to proceed:

Replication begins right away and I can see the new, read-only file system immediately:

A new CloudWatch metric, TimeSinceLastSync, is published when the initial replication is complete, and periodically after that:
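The same metric can also be pulled from the command line. The following is a sketch; the file system ID is a placeholder, and it assumes the metric is published under the AWS/EFS namespace with a FileSystemId dimension:

```shell
# Retrieve recent TimeSinceLastSync datapoints for a destination file
# system (fs-0123456789abcdef0 is a placeholder).
aws cloudwatch get-metric-statistics \
    --namespace AWS/EFS \
    --metric-name TimeSinceLastSync \
    --dimensions Name=FileSystemId,Value=fs-0123456789abcdef0 \
    --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 300 \
    --statistics Maximum
```

Alarming on this metric is a natural way to detect replication falling behind your RPO target.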

The replica is created in the selected region. I create any necessary mount targets and mount the replica on an EC2 instance:

EFS tracks modifications to the blocks (currently 4 MB) that are used to store files and metadata, and replicates the changes at a rate of up to 300 MB per second. Because replication is block-based, it is not crash-consistent; if you need crash-consistency you may want to take a look at AWS Backup.

After I have set up replication, I can change the lifecycle management, intelligent tiering, throughput mode, and automatic backup setting for the destination file system. The performance mode is chosen when the file system is created, and cannot be changed.

Initiating a Fail-Over
If I need to fail over to the replica, I simply delete the replication. I can do this from either side (source or destination), by clicking Delete and confirming my intent:

I enter delete, and click Delete replication to proceed:

The former read-only replica is now a writable file system that I can use as part of my recovery process. To fail back, I create a replica in the original location, wait for replication to finish, and then delete the replication.

I can also use the command line and the EFS APIs to manage replication. For example:

create-replication-configuration / CreateReplicationConfiguration – Establish replication for an existing file system.

describe-replication-configurations / DescribeReplicationConfigurations – See the replication configuration for a source or destination file system, or for all replication configurations in an AWS account. The data returned for a destination file system also includes LastReplicatedTimestamp, the time of the last successful sync.

delete-replication-configuration / DeleteReplicationConfiguration – End replication for a file system.
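Putting these together, a minimal replication lifecycle from the CLI might look like the following sketch (the file system ID and destination Region are placeholders):

```shell
# Establish replication of an existing file system to another Region.
aws efs create-replication-configuration \
    --source-file-system-id fs-0123456789abcdef0 \
    --destinations Region=us-west-2

# Check replication status; the output for the destination includes
# LastReplicatedTimestamp once the initial sync has completed.
aws efs describe-replication-configurations \
    --file-system-id fs-0123456789abcdef0

# Fail over by ending replication, which makes the replica writable.
aws efs delete-replication-configuration \
    --source-file-system-id fs-0123456789abcdef0
```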

Available Now
This new feature is available now and you can start using it today in the AWS US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), South America (São Paulo), and GovCloud Regions.

You pay the usual storage fees for the original and replica file systems and any applicable cross-region or intra-region data transfer charges.


Validate streaming data over Amazon MSK using schemas in cross-account AWS Glue Schema Registry

Post Syndicated from Vikas Bajaj original

Today’s businesses face an unprecedented growth in the volume of data. A growing portion of the data is generated in real time by IoT devices, websites, business applications, and various other sources. Businesses need to process and analyze this data as soon as it arrives to make business decisions in real time. Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables building and running stream processing applications that use Apache Kafka to collect and process data in real time.

Stream processing applications using Apache Kafka don’t communicate with each other directly; they communicate via sending and receiving messages over Kafka topics. For stream processing applications to communicate efficiently and confidently, a message payload structure must be defined in terms of attributes and data types. This structure describes the schema applications use when sending and receiving messages. However, with a large number of producer and consumer applications, even a small change in schema (removing a field, adding a new field, or change in data type) may cause issues for downstream applications that are difficult to debug and fix.

Traditionally, teams have relied on change management processes (such as approvals and maintenance windows) or other informal mechanisms (documentation, emails, collaboration tools, and so on) to inform one another of data schema changes. However, these mechanisms don’t scale and are prone to mistakes. The AWS Glue Schema Registry allows you to centrally publish, discover, control, validate, and evolve schemas for stream processing applications. With the AWS Glue Schema Registry, you can manage and enforce schemas on data streaming applications using Apache Kafka, Amazon MSK, Amazon Kinesis Data Streams, Amazon Kinesis Data Analytics for Apache Flink, and AWS Lambda.

This post demonstrates how Apache Kafka stream processing applications validate messages using an Apache Avro schema stored in the AWS Glue Schema registry residing in a central AWS account. We use the AWS Glue Schema Registry SerDe library and Avro SpecificRecord to validate messages in stream processing applications while sending and receiving messages from a Kafka topic on an Amazon MSK cluster. Although we use an Avro schema for this post, the same approach and concept applies to JSON schemas as well.

Use case

Let’s assume a fictitious rideshare company that offers unicorn rides. To draw actionable insights, they need to process a stream of unicorn ride request messages. They expect rides to be very popular and want to make sure their solution can scale. They’re also building a central data lake where all their streaming and operation data is stored for analysis. They’re customer obsessed, so they expect to add new fun features to future rides, like choosing the hair color of your unicorn, and will need to reflect these attributes in the ride request messages. To avoid issues in downstream applications due to future schema changes, they need a mechanism to validate messages with a schema hosted in a central schema registry. Having schemas in a central schema registry makes it easier for the application teams to publish, validate, evolve, and maintain schemas in a single place.

Solution overview

The company uses Amazon MSK to capture and distribute the unicorn ride request messages at scale. They define an Avro schema for unicorn ride requests because it provides rich data structures, supports direct mapping to JSON, and offers a compact, fast, binary data format. Because the schema was agreed in advance, they decided to use Avro SpecificRecord. SpecificRecord is an interface from the Avro library that allows the use of an Avro record as a POJO. This is done by generating a Java class (or classes) from the schema using avro-maven-plugin. They use AWS Identity and Access Management (IAM) cross-account roles to allow producer and consumer applications from the other AWS account to safely and securely access schemas in the central Schema Registry account.
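As an aside, the class generation step with avro-maven-plugin is typically wired into the Maven build like the following pom.xml fragment. This is a sketch with an illustrative version and source/output paths, not the project's actual build file:

```xml
<!-- Sketch: generate Java classes from .avsc files at build time.
     Version and directories are illustrative placeholders. -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.10.2</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```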

The AWS Glue Schema Registry is in Account B, whereas the MSK cluster and Kafka producer and consumer applications are in Account A. We use the following two IAM roles to enable cross-account access to the AWS Glue Schema Registry. Apache Kafka clients in Account A assume a role in Account B using an identity-based policy because the AWS Glue Schema Registry doesn’t support resource-based policies.

  • Account A IAM role – Allows producer and consumer applications to assume an IAM role in Account B.
  • Account B IAM role – Trusts all IAM principals from Account A and allows them to perform read actions on the AWS Glue Schema Registry in Account B. In a real use case scenario, IAM principals that can assume cross-account roles should be scoped more specifically.
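To illustrate, the trust policy on the Account B role might look like the following sketch, where 111111111111 stands in for the Account A ID and demo10A is the example external ID used in the walkthrough:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "demo10A" }
      }
    }
  ]
}
```

As the note above says, trusting the account root admits all IAM principals in Account A; in production you would name specific role or user ARNs in the Principal element instead.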

The following architecture diagram illustrates the solution:

The solution works as follows:

  1. A Kafka producer running in Account A assumes the cross-account Schema Registry IAM role in Account B by calling the AWS Security Token Service (AWS STS) assumeRole API.
  2. The Kafka producer retrieves the unicorn ride request Avro schema version ID from the AWS Glue Schema Registry for the schema that’s embedded in the unicorn ride request POJO. Fetching the schema version ID is internally managed by the AWS Glue Schema Registry SerDe’s serializer. The serializer has to be configured as part of the Kafka producer configuration.
  3. If the schema exists in the AWS Glue Schema Registry, the serializer decorates the data record with the schema version ID and then serializes it before delivering it to the Kafka topic on the MSK cluster.
  4. The Kafka consumer running in Account A assumes the cross-account Schema Registry IAM role in Account B by calling the AWS STS assumeRole API.
  5. The Kafka consumer starts polling the Kafka topic on the MSK cluster for data records.
  6. The Kafka consumer retrieves the unicorn ride request Avro schema from the AWS Glue Schema Registry, matching the schema version ID that’s encoded in the unicorn ride request data record. Fetching the schema is internally managed by the AWS Glue Schema Registry SerDe’s deserializer. The deserializer has to be configured as part of the Kafka consumer configuration. If the schema exists in the AWS Glue Schema Registry, the deserializer deserializes the data record into the unicorn ride request POJO for the consumer to process it.

The AWS Glue Schema Registry SerDe library also supports optional compression configuration to save on data transfers. For more information about the Schema Registry, see How the Schema Registry works.

Unicorn ride request Avro schema

The following schema (UnicornRideRequest.avsc) defines a record representing a unicorn ride request, which contains ride request attributes along with the customer attributes and system-recommended unicorn attributes:

    {
      "type": "record",
      "name": "UnicornRideRequest",
      "namespace": "demo.glue.schema.registry.avro",
      "fields": [
        {"name": "request_id", "type": "int", "doc": "customer request id"},
        {"name": "pickup_address", "type": "string", "doc": "customer pickup address"},
        {"name": "destination_address", "type": "string", "doc": "customer destination address"},
        {"name": "ride_fare", "type": "float", "doc": "ride fare amount (USD)"},
        {"name": "ride_duration", "type": "int", "doc": "ride duration in minutes"},
        {"name": "preferred_unicorn_color", "type": {"type": "enum", "name": "UnicornPreferredColor", "symbols": ["WHITE", "BLACK", "RED", "BLUE", "GREY"]}, "default": "WHITE"},
        {
          "name": "recommended_unicorn",
          "type": {
            "type": "record",
            "name": "RecommendedUnicorn",
            "fields": [
              {"name": "unicorn_id", "type": "int", "doc": "recommended unicorn id"},
              {"name": "color", "type": {"type": "enum", "name": "unicorn_color", "symbols": ["WHITE", "RED", "BLUE"]}},
              {"name": "stars_rating", "type": ["null", "int"], "default": null, "doc": "unicorn star ratings based on customers feedback"}
            ]
          }
        },
        {
          "name": "customer",
          "type": {
            "type": "record",
            "name": "Customer",
            "fields": [
              {"name": "customer_account_no", "type": "int", "doc": "customer account number"},
              {"name": "first_name", "type": "string"},
              {"name": "middle_name", "type": ["null", "string"], "default": null},
              {"name": "last_name", "type": "string"},
              {"name": "email_addresses", "type": ["null", {"type": "array", "items": "string"}]},
              {"name": "customer_address", "type": "string", "doc": "customer address"},
              {"name": "mode_of_payment", "type": {"type": "enum", "name": "ModeOfPayment", "symbols": ["CARD", "CASH"]}, "default": "CARD"},
              {"name": "customer_rating", "type": ["null", "int"], "default": null}
            ]
          }
        }
      ]
    }

To use this solution, you must have two AWS accounts:

  • Account A – For the MSK cluster, Kafka producer and consumer Amazon Elastic Compute Cloud (Amazon EC2) instances, and AWS Cloud9 environment
  • Account B – For the Schema Registry and schema

For this solution, we use Region us-east-1, but you can change this as per your requirements.

Next, we create the resources in each account using AWS CloudFormation templates.

Create resources in Account B

We create the following resources in Account B:

  • A schema registry
  • An Avro schema
  • An IAM role with the AWSGlueSchemaRegistryReadonlyAccess managed policy and an instance profile, which allows all Account A IAM principals to assume it
  • The UnicornRideRequest.avsc Avro schema shown earlier, which is used as a schema definition in the CloudFormation template

Make sure you have the appropriate permissions to create these resources.

  1. Log in to Account B.
  2. Launch the following CloudFormation stack.
  3. For Stack name, enter SchemaRegistryStack.
  4. For Schema Registry name, enter unicorn-ride-request-registry.
  5. For Avro Schema name, enter unicorn-ride-request-schema-avro.
  6. For the Kafka client’s AWS account ID, enter your Account A ID.
  7. For ExternalId, enter a unique random ID (for example, demo10A), which should be provided by the Kafka clients in Account A while assuming the IAM role in this account.

For more information about cross-account security, see The confused deputy problem.

  1. When the stack is complete, on the Outputs tab of the stack, copy the value for CrossAccountGlueSchemaRegistryRoleArn.

The Kafka producer and consumer applications created in Account A assume this role to access the Schema Registry and schema in Account B.

  1. To verify the resources were created, on the AWS Glue console, choose Schema registries in the navigation bar, and locate unicorn-ride-request-registry.
  2. Choose the registry unicorn-ride-request-registry and verify that it contains unicorn-ride-request-schema-avro in the Schemas section.
  3. Choose the schema to see its content.

The IAM role created by the SchemaRegistryStack stack allows all Account A IAM principals to assume it and perform read actions on the AWS Glue Schema Registry. Let’s look at the trust relationships of the IAM role.

  1. On the SchemaRegistryStack stack Outputs tab, copy the value for CrossAccountGlueSchemaRegistryRoleName.
  2. On the IAM console, search for this role.
  3. Choose Trust relationships and look at its trusted entities to confirm that Account A is listed.
  4. In the Conditions section, confirm that sts:ExternalId has the same unique random ID provided during stack creation.

Create resources in Account A

We create the following resources in Account A:

  • A VPC
  • EC2 instances for the Kafka producer and consumer
  • An AWS Cloud9 environment
  • An MSK cluster

As a prerequisite, create an EC2 keypair and download it on your machine to be able to SSH into EC2 instances. Also create an MSK cluster configuration with default values. You need to have permissions to create the CloudFormation stack, EC2 instances, AWS Cloud9 environment, MSK cluster, MSK cluster configuration, and IAM role.

  1. Log in to Account A.
  2. Launch the following CloudFormation stack to launch the VPC, EC2 instances, and AWS Cloud9 environment.
  3. For Stack name, enter MSKClientStack.
  4. Provide the VPC and subnet CIDR ranges.
  5. For EC2 Keypair, choose an existing EC2 keypair.
  6. For the latest EC2 AMI ID, select the default option.
  7. For the cross-account IAM role ARN, use the value for CrossAccountGlueSchemaRegistryRoleArn (available on the Outputs tab of SchemaRegistryStack).
  8. Wait for the stack to create successfully.
  9. Launch the following CloudFormation stack to create the MSK cluster.
  10. For Stack name, enter MSKClusterStack.
  11. Use Amazon MSK version 2.7.1.
  12. For the MSK cluster configuration ARN, enter the ARN of the MSK cluster configuration that you created as part of the prerequisites.
  13. For the MSK cluster configuration revision number, enter 1 or change it according to your version.
  14. For the client CloudFormation stack name, enter MSKClientStack (the stack name that you created prior to this stack).

Configure the Kafka producer

To configure the Kafka producer accessing the Schema Registry in the central AWS account, complete the following steps:

  1. Log in to Account A.
  2. On the AWS Cloud9 console, choose the Cloud9EC2Bastion environment created by the MSKClientStack stack.
  3. On the File menu, choose Upload Local Files.
  4. Upload the EC2 keypair file that you used earlier while creating the stack.
  5. Open a new terminal and change the EC2 keypair permissions:
    chmod 0400 <keypair PEM file>

  6. SSH into the KafkaProducerInstance EC2 instance and set the Region as per your requirement:
    ssh -i <keypair PEM file> ec2-user@<KafkaProducerInstance Private IP address>
    aws configure set region <region>

  7. Set the environment variable MSK_CLUSTER_ARN pointing to the MSK cluster’s ARN:
    export MSK_CLUSTER_ARN=$(aws kafka list-clusters | jq -r '.ClusterInfoList[] | select(.ClusterName == "MSKClusterStack") | .ClusterArn')

Change the .ClusterName value in the code if you used a different name for the MSK cluster CloudFormation stack. The cluster name is the same as the stack name.

  1. Set the environment variable BOOTSTRAP_BROKERS pointing to the bootstrap brokers:
    export BOOTSTRAP_BROKERS=$(aws kafka get-bootstrap-brokers --cluster-arn $MSK_CLUSTER_ARN | jq -r .BootstrapBrokerString)

  2. Verify the environment variables:
    echo $MSK_CLUSTER_ARN
    echo $BOOTSTRAP_BROKERS

  3. Create a Kafka topic called unicorn-ride-request-topic in your MSK cluster, which is used by the Kafka producer and consumer applications later:
    cd ~/kafka
    ./bin/kafka-topics.sh --bootstrap-server $BOOTSTRAP_BROKERS \
    --topic unicorn-ride-request-topic \
    --create --partitions 3 --replication-factor 2
    ./bin/kafka-topics.sh --bootstrap-server $BOOTSTRAP_BROKERS --list

The MSKClientStack stack copied the Kafka producer client JAR file called kafka-cross-account-gsr-producer.jar to the KafkaProducerInstance instance. It contains the Kafka producer client that sends messages to the Kafka topic unicorn-ride-request-topic on the MSK cluster and accesses the unicorn-ride-request-schema-avro Avro schema from the unicorn-ride-request-registry schema registry in Account B. The Kafka producer code, which we cover later in this post, is available on GitHub.

  1. Run the following commands and verify kafka-cross-account-gsr-producer.jar exists:
    cd ~
    ls -ls

  2. Run the following command to run the Kafka producer in the KafkaProducerInstance terminal:
    java -jar kafka-cross-account-gsr-producer.jar -bs $BOOTSTRAP_BROKERS \
    -rn <Account B IAM role arn that Kafka producer application needs to assume> \
    -topic unicorn-ride-request-topic \
    -reg us-east-1 \
    -nm 500 \
    -externalid <Account B IAM role external Id that you used while creating a CF stack in Account B>

The code has the following parameters:

  • -bs – $BOOTSTRAP_BROKERS (the MSK cluster bootstrap brokers)
  • -rn – The CrossAccountGlueSchemaRegistryRoleArn value from the SchemaRegistryStack stack outputs in Account B
  • -topic – The Kafka topic unicorn-ride-request-topic
  • -reg – us-east-1 (change it according to your Region; it’s used for the AWS STS endpoint and Schema Registry)
  • -nm – 500 (the number of messages the producer application sends to the Kafka topic)
  • -externalId – The same external ID (for example, demo10A) that you used while creating the CloudFormation stack in Account B

The following screenshot shows the Kafka producer logs showing Schema Version Id received..., which means it has retrieved the Avro schema unicorn-ride-request-schema-avro from Account B and messages were sent to the Kafka topic on the MSK cluster in Account A.

Kafka producer code

The complete Kafka producer implementation is available on GitHub. In this section, we break down the code.

  • getProducerConfig() initializes the producer properties, as shown in the following code:
    • VALUE_SERIALIZER_CLASS_CONFIG – The GlueSchemaRegistryKafkaSerializer.class.getName() AWS serializer implementation that serializes data records (the implementation is available on GitHub)
    • REGISTRY_NAME – The Schema Registry from Account B
    • SCHEMA_NAME – The schema name from Account B
private Properties getProducerConfig() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServers);
        props.put(ProducerConfig.ACKS_CONFIG, "-1");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, GlueSchemaRegistryKafkaSerializer.class.getName());
        props.put(AWSSchemaRegistryConstants.REGISTRY_NAME, "unicorn-ride-request-registry");
        props.put(AWSSchemaRegistryConstants.SCHEMA_NAME, "unicorn-ride-request-schema-avro");
        props.put(AWSSchemaRegistryConstants.AVRO_RECORD_TYPE, AvroRecordType.SPECIFIC_RECORD.getName());
        return props;
    }
  • startProducer() assumes the role in Account B to be able to connect with the Schema Registry in Account B and sends messages to the Kafka topic on the MSK cluster:
public void startProducer() {
        KafkaProducer<String, UnicornRideRequest> producer =
                new KafkaProducer<String, UnicornRideRequest>(getProducerConfig());
        int numberOfMessages = Integer.valueOf(str_numOfMessages);
        logger.info("Starting to send records...");
        for (int i = 0; i < numberOfMessages; i++) {
            UnicornRideRequest rideRequest = getRecord(i);
            String key = "key-" + i;
            ProducerRecord<String, UnicornRideRequest> record =
                    new ProducerRecord<String, UnicornRideRequest>(topic, key, rideRequest);
            producer.send(record, new ProducerCallback());
        }
        producer.flush();
        producer.close();
}
  • assumeGlueSchemaRegistryRole() as shown in the following code uses AWS STS to assume the cross-account Schema Registry IAM role in Account B. (For more information, see Temporary security credentials in IAM.) The response from stsClient.assumeRole(roleRequest) contains the temporary credentials, which include accessKeyId, secretAccessKey, and a sessionToken. It then sets the temporary credentials in the system properties. The AWS SDK for Java uses these credentials while accessing the Schema Registry (through the Schema Registry serializer). For more information, see Using Credentials.
    public void assumeGlueSchemaRegistryRole() {
        try {
            Region region = Region.of(regionName);
            if (!Region.regions().contains(region))
                throw new RuntimeException("Region : " + regionName + " is invalid.");
            StsClient stsClient = StsClient.builder().region(region).build();
            // The role ARN and external ID come from the -rn and -externalid CLI parameters
            AssumeRoleRequest roleRequest = AssumeRoleRequest.builder()
                    .roleArn(this.assumeRoleARN)
                    .roleSessionName("kafka-producer-cross-account-glue-schema-registry")
                    .externalId(this.externalId)
                    .build();
            AssumeRoleResponse roleResponse = stsClient.assumeRole(roleRequest);
            Credentials myCreds = roleResponse.credentials();
            System.setProperty("aws.accessKeyId", myCreds.accessKeyId());
            System.setProperty("aws.secretAccessKey", myCreds.secretAccessKey());
            System.setProperty("aws.sessionToken", myCreds.sessionToken());
            stsClient.close();
        } catch (StsException e) {
            logger.error(e.getMessage());
            System.exit(1);
        }
    }
  • createUnicornRideRequest() uses the classes generated from the Avro schema (the unicorn ride request schema) to create a SpecificRecord. For this post, the unicorn ride request attribute values are hard-coded in this method. See the following code:
    public UnicornRideRequest getRecord(int requestId) {
            /*
             Initialise the UnicornRideRequest object of the
             class that is generated from the Avro schema
             */
            UnicornRideRequest rideRequest = UnicornRideRequest.newBuilder()
                    .setRequestId(requestId)
                    .setPickupAddress("Melbourne, Victoria, Australia")
                    .setDestinationAddress("Sydney, NSW, Aus")
                    .setEmailAddresses(Arrays.asList("[email protected]"))
                    .setCustomerAddress("Flinders Street Station")
                    .build();
            return rideRequest;
    }

Configure the Kafka consumer

The MSKClientStack stack created the KafkaConsumerInstance instance for the Kafka consumer application. You can view all the instances created by the stack on the Amazon EC2 console.

To configure the Kafka consumer accessing the Schema Registry in the central AWS account, complete the following steps:

  1. Open a new terminal in the Cloud9EC2Bastion AWS Cloud9 environment.
  2. SSH into the KafkaConsumerInstance EC2 instance and set the Region as per your requirement:
    ssh -i <keypair PEM file> [email protected]<KafkaConsumerInstance Private IP address>
    aws configure set region <region>

  3. Set the environment variable MSK_CLUSTER_ARN pointing to the MSK cluster’s ARN:
    export MSK_CLUSTER_ARN=$(aws kafka list-clusters |  jq '.ClusterInfoList[] | select (.ClusterName == "MSKClusterStack") | {ClusterArn} | join (" ")' | tr -d \")

Change the .ClusterName value if you used a different name for the MSK cluster CloudFormation stack. The cluster name is the same as the stack name.
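If jq isn't available, the same selection logic can be expressed in a few lines of Python. This is an illustrative sketch (the sample JSON and ARN values are made up) of how the `aws kafka list-clusters` output is filtered down to one ClusterArn:

```python
import json

def cluster_arn(list_clusters_json: str, stack_name: str) -> str:
    """Pick the ClusterArn whose ClusterName matches the CloudFormation stack name."""
    info = json.loads(list_clusters_json)
    for cluster in info.get("ClusterInfoList", []):
        if cluster.get("ClusterName") == stack_name:
            return cluster["ClusterArn"]
    raise KeyError("no cluster named " + stack_name)

# Shape of the list-clusters output (values are illustrative only)
sample = json.dumps({"ClusterInfoList": [
    {"ClusterName": "MSKClusterStack",
     "ClusterArn": "arn:aws:kafka:us-east-1:111122223333:cluster/MSKClusterStack/abc"}
]})
print(cluster_arn(sample, "MSKClusterStack"))
```

You would pipe the real CLI output into this in place of the sample string.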

  4. Set the environment variable BOOTSTRAP_BROKERS pointing to the bootstrap brokers:
    export BOOTSTRAP_BROKERS=$(aws kafka get-bootstrap-brokers --cluster-arn $MSK_CLUSTER_ARN | jq -r .BootstrapBrokerString)

  5. Verify the environment variables:

The MSKClientStack stack copied the Kafka consumer client JAR file called kafka-cross-account-gsr-consumer.jar to the KafkaConsumerInstance instance. It contains the Kafka consumer client that reads messages from the Kafka topic unicorn-ride-request-topic on the MSK cluster and accesses the unicorn-ride-request-schema-avro Avro schema from the unicorn-ride-request-registry registry in Account B. The Kafka consumer code, which we cover later in this post, is available on GitHub.

  6. Run the following commands and verify kafka-cross-account-gsr-consumer.jar exists:
    cd ~
    ls -ls

  7. Run the following command to run the Kafka consumer in the KafkaConsumerInstance terminal:
    java -jar kafka-cross-account-gsr-consumer.jar -bs $BOOTSTRAP_BROKERS \
    -rn <Account B IAM role arn that Kafka consumer application needs to assume> \
    -topic unicorn-ride-request-topic \
    -reg us-east-1 \
    -externalid <Account B IAM role external Id that you used while creating a CF stack in Account B>

The code has the following parameters:

  • -bs – $BOOTSTRAP_BROKERS (the MSK cluster bootstrap brokers)
  • -rn – The CrossAccountGlueSchemaRegistryRoleArn value from the SchemaRegistryStack stack outputs in Account B
  • -topic – The Kafka topic unicorn-ride-request-topic
  • -reg – us-east-1 (change it according to your Region; it's used for the AWS STS endpoint and Schema Registry)
  • -externalId – The same external ID (for example, demo10A) that you used while creating the CloudFormation stack in Account B

The following screenshot shows the Kafka consumer logs successfully reading messages from the Kafka topic on the MSK cluster in Account A and accessing the Avro schema unicorn-ride-request-schema-avro from the unicorn-ride-request-registry schema registry in Account B.

If you see similar logs, it means both the Kafka producer and consumer applications have connected successfully with the centralized Schema Registry in Account B and are able to validate messages while sending to and consuming from the MSK cluster in Account A.

Kafka consumer code

The complete Kafka consumer implementation is available on GitHub. In this section, we break down the code.

  • getConsumerConfig() initializes consumer properties, as shown in the following code:
    • VALUE_DESERIALIZER_CLASS_CONFIG – The GlueSchemaRegistryKafkaDeserializer.class.getName() AWS deserializer implementation that deserializes the SpecificRecord as per the encoded schema ID from the Schema Registry (the implementation is available on GitHub).
private Properties getConsumerConfig() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, this.bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "unicorn.riderequest.consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, GlueSchemaRegistryKafkaDeserializer.class.getName());
        props.put(AWSSchemaRegistryConstants.AWS_REGION, regionName);
        props.put(AWSSchemaRegistryConstants.AVRO_RECORD_TYPE, AvroRecordType.SPECIFIC_RECORD.getName());
        return props;
}
  • startConsumer() assumes the role in Account B to be able to connect with the Schema Registry in Account B and reads messages from the Kafka topic on the MSK cluster:
public void startConsumer() {
  logger.info("starting consumer...");
  KafkaConsumer<String, UnicornRideRequest> consumer =
          new KafkaConsumer<String, UnicornRideRequest>(getConsumerConfig());
  consumer.subscribe(Collections.singletonList(topic));
  int count = 0;
  while (true) {
            final ConsumerRecords<String, UnicornRideRequest> records = consumer.poll(Duration.ofMillis(1000));
            for (final ConsumerRecord<String, UnicornRideRequest> record : records) {
                final UnicornRideRequest rideRequest = record.value();
                logger.info("Consumer record: " + rideRequest);
                count++;
            }
  }
}
  • assumeGlueSchemaRegistryRole() as shown in the following code uses AWS STS to assume the cross-account Schema Registry IAM role in Account B. The response from stsClient.assumeRole(roleRequest) contains the temporary credentials, which include accessKeyId, secretAccessKey, and a sessionToken. It then sets the temporary credentials in the system properties. The SDK for Java uses these credentials while accessing the Schema Registry (through the Schema Registry deserializer). For more information, see Using Credentials.
public void assumeGlueSchemaRegistryRole() {
        try {
            Region region = Region.of(regionName);
            if (!Region.regions().contains(region))
                throw new RuntimeException("Region : " + regionName + " is invalid.");
            StsClient stsClient = StsClient.builder().region(region).build();
            // The role ARN and external ID come from the -rn and -externalid CLI parameters
            AssumeRoleRequest roleRequest = AssumeRoleRequest.builder()
                    .roleArn(this.assumeRoleARN)
                    .roleSessionName("kafka-consumer-cross-account-glue-schema-registry")
                    .externalId(this.externalId)
                    .build();
            AssumeRoleResponse roleResponse = stsClient.assumeRole(roleRequest);
            Credentials myCreds = roleResponse.credentials();
            System.setProperty("aws.accessKeyId", myCreds.accessKeyId());
            System.setProperty("aws.secretAccessKey", myCreds.secretAccessKey());
            System.setProperty("aws.sessionToken", myCreds.sessionToken());
            stsClient.close();
        } catch (StsException e) {
            logger.error(e.getMessage());
            System.exit(1);
        }
}

Compile and generate Avro schema classes

Like any other part of building and deploying your application, schema compilation and the process of generating Avro schema classes should be included in your CI/CD pipeline. There are multiple ways to generate Avro schema classes; we use avro-maven-plugin for this post. The CI/CD process can also use avro-tools to compile Avro schema to generate classes. The following code is an example of how you can use avro-tools:

java -jar /path/to/avro-tools-1.10.2.jar compile schema <schema file> <destination>

//compiling unicorn_ride_request.avsc
java -jar avro-tools-1.10.2.jar compile schema unicorn_ride_request.avsc .
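For the avro-maven-plugin route mentioned above, a typical pom.xml fragment looks like the following. This is an illustrative configuration (the plugin version and directory paths are assumptions for this sketch) that binds class generation to the generate-sources phase:

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.10.2</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/resources/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, `mvn generate-sources` (or any later phase) regenerates the schema classes, so your CI/CD pipeline picks up schema changes automatically.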

Implementation overview

To recap, we start by defining and registering an Avro schema for the unicorn ride request message in the AWS Glue Schema Registry in Account B, the central data lake account. In Account A, we create an MSK cluster and Kafka producer and consumer EC2 instances, and deploy the respective application code (kafka-cross-account-gsr-consumer.jar and kafka-cross-account-gsr-producer.jar) to them using the CloudFormation stack.

When we run the producer application in Account A, the serializer (GlueSchemaRegistryKafkaSerializer) from the AWS Glue Schema Registry SerDe library provided as the configuration gets the unicorn ride request schema (UnicornRideRequest.avsc) from the central Schema Registry residing in Account B to serialize the unicorn ride request message. It uses the IAM role (temporary credentials) in Account B and Region, schema registry name (unicorn-ride-request-registry), and schema name (unicorn-ride-request-schema-avro) provided as the configuration to connect to the central Schema Registry. After the message is successfully serialized, the producer application sends it to the Kafka topic (unicorn-ride-request-topic) on the MSK cluster.

When we run the consumer application in Account A, the deserializer (GlueSchemaRegistryKafkaDeserializer) from the Schema Registry SerDe library provided as the configuration extracts the encoded schema ID from the message read from the Kafka topic (unicorn-ride-request-topic) and gets the schema for the same ID from the central Schema Registry in Account B. It then deserializes the message. It uses the IAM role (temporary credentials) in Account B and the Region provided as the configuration to connect to the central Schema Registry. The consumer application also configures Avro’s SPECIFIC_RECORD to inform the deserializer that the message is of a specific type (unicorn ride request). After the message is successfully deserialized, the consumer application processes it as per the requirements.
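The "extracts the encoded schema ID" step above can be pictured with a small sketch. Assuming the Glue Schema Registry wire format of a header version byte, a compression byte, and then a 16-byte schema version UUID ahead of the Avro payload (verify against the SerDe library source before relying on this), reading the ID looks roughly like:

```python
import uuid

HEADER_VERSION_BYTE = 3  # assumed header marker from the Glue Schema Registry SerDe format
COMPRESSION_NONE = 0

def extract_schema_version_id(message: bytes) -> uuid.UUID:
    """Pull the 16-byte schema version UUID the serializer prepended to the payload."""
    if message[0] != HEADER_VERSION_BYTE:
        raise ValueError("not a Glue Schema Registry encoded message")
    # bytes 2..17 hold the schema version UUID; the Avro payload follows
    return uuid.UUID(bytes=message[2:18])

# Build a fake encoded message: header + compression + UUID + payload
version_id = uuid.uuid4()
encoded = bytes([HEADER_VERSION_BYTE, COMPRESSION_NONE]) + version_id.bytes + b"avro-payload"
assert extract_schema_version_id(encoded) == version_id
```

The deserializer uses this ID to fetch the matching schema from the central registry in Account B before decoding the Avro payload.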

Clean up

The final step is to clean up. To avoid unnecessary charges, you should remove all the resources created by the CloudFormation stacks used for this post. The simplest way to do so is to delete the stacks. First delete the MSKClusterStack followed by MSKClientStack from Account A. Then delete the SchemaRegistryStack from Account B.


In this post, we demonstrated how to use AWS Glue Schema Registry with Amazon MSK and stream processing applications to validate messages using an Avro schema. We created a distributed architecture where the Schema Registry resides in a central AWS account (data lake account) and Kafka producer and consumer applications reside in a separate AWS account. We created an Avro schema in the schema registry in the central account to make it efficient for the application teams to maintain schemas in a single place. Because AWS Glue Schema Registry supports identity-based access policies, we used the cross-account IAM role to allow the Kafka producer and consumer applications running in a separate account to securely access the schema from the central account to validate messages. Because the Avro schema was agreed upon in advance, we used Avro SpecificRecord to ensure type safety at compile time and avoid runtime schema validation issues on the client side. The code used for this post is available on GitHub for reference.

To learn more about the services and resources in this solution, refer to AWS Glue Schema Registry, the Amazon MSK Developer Guide, the AWS Glue Schema Registry SerDe library, and IAM tutorial: Delegate access across AWS accounts using IAM roles.

About the Author

Vikas Bajaj is a Principal Solutions Architect at Amazon Web Services. Vikas works with digital native customers and advises them on technology architecture and modeling, and on options and solutions to meet strategic business objectives. He makes sure designs and solutions are efficient, sustainable, and fit for purpose for current and future business needs. Apart from architecture and technology discussions, he enjoys watching and playing cricket.

How to automate AWS account creation with SSO user assignment

Post Syndicated from Rafael Koike original


AWS Control Tower offers a straightforward way to set up and govern an Amazon Web Services (AWS) multi-account environment, following prescriptive best practices. AWS Control Tower orchestrates the capabilities of several other AWS services, including AWS Organizations, AWS Service Catalog, and AWS Single Sign-On (AWS SSO), to build a landing zone very quickly. AWS SSO is a cloud-based service that simplifies how you manage SSO access to AWS accounts and business applications using Security Assertion Markup Language (SAML) 2.0. You can use AWS Control Tower to create and provision new AWS accounts and use AWS SSO to assign user access to those newly-created accounts.

Some customers need to provision tens, if not hundreds, of new AWS accounts at one time and assign access to many users. If you are using AWS Control Tower, doing this requires that you provision an AWS account in AWS Control Tower, and then assign the user access to the AWS account in AWS SSO before moving to the next AWS account. This process adds complexity and time for administrators who manage the AWS environment while delaying users’ access to their AWS accounts.

In this blog post, we’ll show you how to automate creating multiple AWS accounts in AWS Control Tower, and how to automate assigning user access to the AWS accounts in AWS SSO, with the ability to repeat the process easily for subsequent batches of accounts. This solution simplifies the provisioning and assignment processes, while enabling automation for your AWS environment, and allows your builders to start using and experimenting on AWS more quickly.

Services used

This solution uses the following AWS services:

High level solution overview

Figure 1 shows the architecture and workflow of the batch AWS account creation and SSO assignment processes.

Figure 1: Batch AWS account creation and SSO assignment automation architecture and workflow

Figure 1: Batch AWS account creation and SSO assignment automation architecture and workflow

Before starting

This solution is configured to be deployed in the North Virginia Region (us-east-1). But you can change the CloudFormation template to run in any Region that supports all the services required in the solution.

AWS Control Tower Account Factory can take up to 25 minutes to create and provision a new account. During this time, you will be unable to use AWS Control Tower to perform actions such as creating an organizational unit (OU) or enabling a guardrail on an OU. We recommend running this solution during a period when you do not anticipate using AWS Control Tower's features.

Collect needed information

Note: You must have already configured AWS Control Tower, AWS Organizations, and AWS SSO to use this solution.

Before deploying the solution, you need to first collect some information for AWS CloudFormation.

The required information you’ll need to gather in these steps is:

  • AWS SSO instance ARN
  • AWS SSO Identity Store ID
  • Admin email address
  • Amazon S3 bucket
  • AWS SSO user group ARN

Prerequisite information: AWS SSO instance ARN

From the web console

You can find this information under Settings in the AWS SSO web console as shown in Figure 2.

Figure 2: AWS SSO instance ARN

Figure 2: AWS SSO instance ARN

From the CLI

You can also get this information by running the following CLI command using AWS Command Line Interface (AWS CLI):

aws sso-admin list-instances

The output is similar to the following:

{
    "Instances": [
        {
            "InstanceArn": "arn:aws:sso:::instance/ssoins-abc1234567",
            "IdentityStoreId": "d-123456abcd"
        }
    ]
}
Make a note of the InstanceArn value from the output; this is the AWS SSO instance ARN you will use later.

Prerequisite information: AWS SSO Identity Store ID

This is available from either the web console or the CLI.

From the web console

You can find this information in the same screen as the AWS SSO Instance ARN, as shown in Figure 3.

Figure 3: AWS SSO identity store ID

Figure 3: AWS SSO identity store ID

From the CLI

To find this from the AWS CLI command aws sso-admin list-instances, use the IdentityStoreId from the second key-value pair returned.

Prerequisite information: Admin email address

This email address receives notifications when a new AWS account is created.

Prerequisite information: S3 bucket

The name of the Amazon S3 bucket where the AWS account list CSV files will be uploaded to automate AWS account creation.

This globally unique bucket name will be used to create a new Amazon S3 Bucket, and the automation script will receive events from new objects uploaded to this bucket.

Prerequisite information: AWS SSO user group ARN

Go to AWS SSO > Groups and select the user group whose permission set you would like to assign to the new AWS account. Copy the Group ID from the selected user group. This can be a local AWS SSO user group, or a third-party identity provider-synced user group.

Note: For the AWS SSO user group, there is no AWS CLI equivalent; you need to use the AWS web console to collect this information.

Figure 4: AWS SSO user group ARN

Figure 4: AWS SSO user group ARN

Prerequisite information: AWS SSO permission set

The ARN of the AWS SSO permission set to be assigned to the user group.

From the web console

To view existing permission sets using the AWS SSO web console, go to AWS accounts > Permission sets. From there, you can see a list of permission sets and their respective ARNs.

Figure 5: AWS SSO permission sets list

Figure 5: AWS SSO permission sets list

You can also select the permission set name and from the detailed permission set window, copy the ARN of the chosen permission set. Alternatively, create your own unique permission set to be assigned to the intended user group.

Figure 6: AWS SSO permission set ARN

Figure 6: AWS SSO permission set ARN

From the CLI

To get permission set information from the CLI, run the following AWS CLI command:

aws sso-admin list-permission-sets --instance-arn <SSO Instance ARN>

This command will return an output similar to this:

{
    "PermissionSets": [
        "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890"
    ]
}

If you can’t determine the details for your permission set from the output of the CLI shown above, you can get the details of each permission set by running the following AWS CLI command:

aws sso-admin describe-permission-set --instance-arn <SSO Instance ARN> --permission-set-arn <PermissionSet ARN>

The output will be similar to this:

{
    "PermissionSet": {
        "Name": "AWSPowerUserAccess",
        "PermissionSetArn": "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890",
        "Description": "Provides full access to AWS services and resources, but does not allow management of Users and groups",
        "CreatedDate": "2020-08-28T11:20:34.242000-04:00",
        "SessionDuration": "PT1H"
    }
}
The output above lists the name and description of each permission set, which can help you identify which permission set ARN you will use.
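If you have several permission sets, picking the right ARN by name can be scripted. A minimal Python sketch (the sample data mirrors the describe-permission-set output above; names are illustrative) over a list of describe outputs:

```python
import json

def arn_for_permission_set(describe_outputs, name):
    """Given describe-permission-set JSON outputs, return the ARN whose Name matches."""
    for output in describe_outputs:
        ps = json.loads(output)["PermissionSet"]
        if ps["Name"] == name:
            return ps["PermissionSetArn"]
    return None

outputs = [json.dumps({"PermissionSet": {
    "Name": "AWSPowerUserAccess",
    "PermissionSetArn": "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890"
}})]
print(arn_for_permission_set(outputs, "AWSPowerUserAccess"))
```

In practice you would run describe-permission-set once per ARN from list-permission-sets and feed the results in.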

Solution initiation

The solution steps are in two parts: the initiation, and the batch account creation and SSO assignment processes.

To initiate the solution

  1. Log in to the management account as the AWS Control Tower administrator, and deploy the provided AWS CloudFormation stack with the required parameters filled out.

    Note: To fill out the required parameters of the solution, refer to steps 1 to 6 of the To launch the AWS CloudFormation stack procedure below.

  2. When the stack is successfully deployed, it performs the following actions to set up the batch process. It creates:
    • The S3 bucket where you will upload the AWS account list CSV file.
    • A DynamoDB table. This table tracks the AWS account creation status.
    • A Lambda function, NewAccountHandler. This function is triggered by object uploads to the S3 bucket, validates the input file entries, and uploads them to the DynamoDB table.
    • A Lambda function, CreateManagedAccount. This function is triggered by the entries in the Amazon DynamoDB table and initiates the batch account creation process.
    • An Amazon CloudWatch Events rule to detect the AWS Control Tower CreateManagedAccount lifecycle event.
    • Another Lambda function, CreateAccountAssignment. This function is triggered by AWS Control Tower lifecycle events via Amazon CloudWatch Events to assign the AWS SSO permission set to the specified user group and AWS account.

To create the AWS Account list CSV file

After you deploy the solution stack, you need to create a CSV file based on this sample.csv and upload it to the Amazon S3 bucket created in this solution. This CSV file will be used to automate the new account creation process.

CSV file format

The CSV file must follow the following format:

Test-account-1,[email protected],[email protected],Fname-1,Lname-1,Test-OU-1,,,
Test-account-2,[email protected],[email protected],Fname-2,Lname-2,Test-OU-2,,,
Test-account-3,[email protected],[email protected],Fname-3,Lname-3,Test-OU-1,,,

Where the first line is the column names, and each subsequent line contains the new AWS accounts that you want to create and automatically assign that SSO user group to the permission set.

CSV fields

AccountName: String between 1 and 50 characters [a-zA-Z0-9_-]
SSOUserEmail: String with more than seven characters; must be a valid email address for the primary AWS administrator of the new AWS account
AccountEmail: String with more than seven characters; must be a valid email address not used by any other AWS account
SSOUserFirstName: String with the first name of the primary AWS Administrator of the new AWS account
SSOUserLastName: String with the last name of the primary AWS Administrator of the new AWS account
OrgUnit: String and must be an existing AWS Organizations OrgUnit
Status: String, for future use
AccountId: String, for future use
ErrorMsg: String, for future use
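The field rules above can be checked before uploading the CSV, which avoids a failed Account Factory run later. The following Python sketch implements those rules as stated (the email regex is a loose shape check, and the sample addresses are illustrative, not from the post):

```python
import csv
import io
import re

ACCOUNT_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,50}$")
# Loose email shape check; real validation is stricter
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

FIELDS = ["AccountName", "SSOUserEmail", "AccountEmail", "SSOUserFirstName",
          "SSOUserLastName", "OrgUnit", "Status", "AccountId", "ErrorMsg"]

def validate_row(row: dict) -> list:
    """Return a list of problems for one CSV row, per the field rules above."""
    problems = []
    if not ACCOUNT_NAME_RE.match(row["AccountName"]):
        problems.append("AccountName must be 1-50 chars of [a-zA-Z0-9_-]")
    for field in ("SSOUserEmail", "AccountEmail"):
        if len(row[field]) <= 7 or not EMAIL_RE.match(row[field]):
            problems.append(field + " must be a valid email longer than 7 chars")
    if not row["OrgUnit"]:
        problems.append("OrgUnit must name an existing OU")
    return problems

sample = "Test-account-1,admin1@example.com,account1@example.com,Fname-1,Lname-1,Test-OU-1,,,\n"
for row in csv.DictReader(io.StringIO(sample), fieldnames=FIELDS):
    print(validate_row(row))  # [] when the row is valid
```

Rows that return a non-empty problem list can be rejected before they ever reach the S3 bucket.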

Figure 7 shows the details included in our example for the new AWS accounts that will be created.

Figure 7: Sample AWS account list CSV

Figure 7: Sample AWS account list CSV

  1. The NewAccountHandler function is triggered from an object upload into the Amazon S3 bucket, validates the input file entries, and uploads the validated input file entries to the Amazon DynamoDB table.
  2. The CreateManagedAccount function queries the DynamoDB table to get the details of the next account to be created. If there is another account to be created, the batch account creation process moves on to Step 3; otherwise, it completes.
  3. The CreateManagedAccount function launches the AWS Control Tower Account Factory product in AWS Service Catalog to create and provision a new account.
  4. After Account Factory has completed the account creation workflow, it generates the CreateManagedAccount lifecycle event, and the event log states if the workflow SUCCEEDED or FAILED.
  5. The CloudWatch Events rule detects the CreateManagedAccount AWS Control Tower lifecycle event, triggers the CreateManagedAccount and CreateAccountAssignment functions, and sends an email notification to the administrator via Amazon SNS.
  6. The CreateManagedAccount function updates the Amazon DynamoDB table with the results of the AWS account creation workflow. If the account was successfully created, it updates the input file entry in the Amazon DynamoDB table with the account ID; otherwise, it updates the entry in the table with the appropriate failure or error reason.
  7. The CreateAccountAssignment function assigns the AWS SSO Permission Set with the appropriate AWS IAM policies to the User Group specified in the Parameters when launching the AWS CloudFormation stack.
  8. When the Amazon DynamoDB table is updated, the DynamoDB stream triggers the CreateManagedAccount function for subsequent AWS accounts. When a new AWS account list CSV file is uploaded, steps 1-8 are repeated.

Upload the CSV file

Once the AWS account list CSV file has been created, upload it into the Amazon S3 bucket created by the stack.

Deploying the solution

To launch the AWS CloudFormation stack

Now that all the requirements and the specifications to run the solution are ready, you can launch the AWS CloudFormation stack:

  1. Open the AWS CloudFormation launch wizard in the console.
  2. In the Create stack page, choose Next.

    Figure 8: Create stack in CloudFormation

    Figure 8: Create stack in CloudFormation

  3. On the Specify stack details page, update the default parameters to use the information you captured in the prerequisites as shown in Figure 9, and choose Next.

    Figure 9: Input parameters into AWS CloudFormation

    Figure 9: Input parameters into AWS CloudFormation

  4. On the Configure stack option page, choose Next.
  5. On the Review page, check the box “I acknowledge that AWS CloudFormation might create IAM resources.” and choose Create Stack.
  6. Once the AWS CloudFormation stack has completed, go to the Amazon S3 web console and select the Amazon S3 bucket that you defined in the AWS CloudFormation stack.
  7. Upload the AWS account list CSV file with the information to create new AWS accounts. See To create the AWS Account list CSV file above for details on creating the CSV file.

Workflow and solution details

When a new file is uploaded to the Amazon S3 bucket, the following actions occur:

  1. When you upload the AWS account list CSV file to the Amazon S3 bucket, the Amazon S3 service triggers an event for newly uploaded objects that invokes the Lambda function NewAccountHandler.
  2. This Lambda function performs the following steps:
    • Checks whether it was invoked by an Amazon S3 event or by the CloudFormation CREATE event.
    • If the event is a new object uploaded to Amazon S3, reads the object.
    • Validates the content of the CSV file for the required columns and values.
    • If the data has a valid format, inserts a new item with the data into the Amazon DynamoDB table, as shown in Figure 10 below.

      Figure 10: DynamoDB table items with AWS accounts details

      Figure 10: DynamoDB table items with AWS accounts details

    • Amazon DynamoDB is configured to initiate the Lambda function CreateManagedAccount whenever items are inserted, updated, or deleted.
    • The Lambda function CreateManagedAccount checks the event type. When an item is updated in the table, the function checks the item, and if the AWS account has not yet been created, it invokes the AWS Control Tower Account Factory from AWS Service Catalog to create a new AWS account with the details stored in the Amazon DynamoDB item.
    • AWS Control Tower Account Factory starts the AWS account creation process. When the account creation process completes, the status of Account Factory will show as Available in Provisioned products, as shown in Figure 11.

      Figure 11: AWS Service Catalog provisioned products for AWS account creation

      Figure 11: AWS Service Catalog provisioned products for AWS account creation

    • Based on the Control Tower lifecycle events, the CreateAccountAssignment Lambda function will be invoked when the CreateManagedAccount event is sent to CloudWatch Events. An AWS SNS topic is also triggered to send an email notification to the administrator email address as shown in Figure 12 below.

      Figure 12: AWS email notification when account creation completes

      Figure 12: AWS email notification when account creation completes

    • When invoked, the Lambda function CreateAccountAssignment assigns the AWS SSO user group to the new AWS account with the permission set defined in the AWS CloudFormation stack.

      Figure 13: New AWS account showing user groups with permission sets assigned

      Figure 13: New AWS account showing user groups with permission sets assigned

Figure 13 above shows the new AWS account with the user groups and the assigned permission sets. This completes the automation process. The AWS SSO users that are part of the user group will automatically be allowed to access the new AWS account with the defined permission set.
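Under the hood, the assignment step amounts to one sso-admin create_account_assignment call per account. The sketch below shows the shape of that request as the CreateAccountAssignment function would build it; the function name and all parameter values here are hypothetical placeholders, not taken from the solution's source:

```python
def build_assignment_request(instance_arn, account_id, permission_set_arn, group_id):
    """Shape of the sso-admin create_account_assignment call the Lambda would make."""
    return {
        "InstanceArn": instance_arn,
        "TargetId": account_id,
        "TargetType": "AWS_ACCOUNT",
        "PermissionSetArn": permission_set_arn,
        "PrincipalType": "GROUP",
        "PrincipalId": group_id,
    }

# In the Lambda, this dict would be passed along as:
#   boto3.client("sso-admin").create_account_assignment(**request)
request = build_assignment_request(
    "arn:aws:sso:::instance/ssoins-abc1234567",   # SSO instance ARN parameter
    "111122223333",                                # newly created account ID
    "arn:aws:sso:::permissionSet/ssoins-abc1234567/ps-abc123def4567890",
    "a1b2c3d4-example-group-id",                   # SSO user group ID parameter
)
print(request["TargetType"])  # AWS_ACCOUNT
```

The InstanceArn, PermissionSetArn, and group ID map directly to the CloudFormation parameters you collected in the prerequisites.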

Handling common sources of error

This solution connects multiple components to facilitate the new AWS account creation and AWS SSO permission set assignment. The correctness of the parameters in the AWS CloudFormation stack is important to make sure that when AWS Control Tower creates a new AWS account, it is accessible.

To verify that this solution works, make sure that the email address is a valid email address, you have access to that email, and it is not being used for any existing AWS account. After a new account is created, it is not possible to change its root account email address, so if you input an invalid or inaccessible email, you will need to create a new AWS account and remove the invalid account.

You can view common errors by going to AWS Service Catalog web console. Under Provisioned products, you can see all of your AWS Control Tower Account Factory-launched AWS accounts.

Figure 14: AWS Service Catalog provisioned product with error

Figure 14: AWS Service Catalog provisioned product with error

Selecting Error under the Status column shows you the source of the error. Figure 15 below shows an example:

Figure 15: AWS account creation error explanation



In this post, we’ve shown you how to automate batch creation of AWS accounts in AWS Control Tower and batch assignment of user access to AWS accounts in AWS SSO. When the batch account creation and AWS SSO access assignment processes are complete, the administrator is notified by email from Amazon SNS. We’ve also explained how to handle some common sources of error and how to avoid them.

By automating batch AWS account creation and user access assignment, you can reduce the time you spend on undifferentiated heavy lifting and onboard users in your organization much more quickly, so they can start using and experimenting on AWS right away.

To learn more about best practices for setting up an AWS multi-account environment, check out this documentation.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rafael Koike

Rafael is a Principal Solutions Architect supporting Enterprise customers in the Southeast and is part of the Storage TFC. Rafael has a passion for building, and his expertise in security, storage, networking, and application development has been instrumental in helping customers move to the cloud securely and quickly. When he is not building, he likes to do CrossFit and target shooting.

Eugene Toh

Eugene Toh is a Solutions Architect supporting Enterprise customers in the Georgia and Alabama areas. He is passionate about helping customers transform their businesses and take them to the next level. His areas of expertise are cloud migrations and disaster recovery, and he enjoys giving public talks on the latest cloud technologies. Outside of work, he loves trying great food and traveling all over the world.

Backup Solutions for Dentist Offices

Post Syndicated from Molly Clancy original

On top of providing excellent care to patients, dental practices today are tasked with the care of ever more complex IT solutions. Complying with regulations like HIPAA, protecting patient health records, and managing stores of data from X-rays to insurance information are among the demands that dental practices have to meet.

Whether you outsource these tasks to a managed service provider (MSP) or you manage your data infrastructure in house with network attached storage (NAS) or other hardware, understanding backup best practices and the different options available to help you manage your practice’s data is important for your continued success.

Keeping your data safe and accessible doesn’t have to be complicated or expensive. In this post, learn more about records retention for dental offices and some simple strategies to keep data safe and protected, including 3-2-1 backups, common NAS devices, and insight from an MSP that specializes in IT services specifically for dental practices.

How Long Should a Dental Office Keep Records?

When thinking about backup and data storage solutions for your dental practice, it helps to first have a good understanding of the records retention requirements for dental offices. The best way to understand how long a dental office should keep records is to check with your state board of dentistry. Regulations on records retention vary by state and by patient type.

Retaining records for at least five to seven years is good practice, but some states will require longer retention periods of up to 10 years. Specific types of patients, including minors, may have different retention periods.

Regardless of your state regulations, records must be kept for five years for patients who receive Medicare or Medicaid. If your state regulations are less than five years, plan to retain records longer for these patients.

Finally, it’s good practice to keep all records for patients with whom you’re involved in any kind of legal dispute until the dispute is settled.

What Is the HIPAA Regulation for Storage of Dental Records?

HIPAA does not govern how long medical or dental records must be retained, but it does govern how long HIPAA-related documentation must be retained. Any HIPAA-related documentation, including things like policies, procedures, authorization forms, etc., must be retained for six years according to guidance in HIPAA policy § 164.316(b)(2)(i) on time limits. Some states may have longer or shorter retention periods. If shorter, HIPAA supersedes state regulations.

How Long Does a Dental Office Need to Keep Insurance EOBs?

Explanations of benefits or EOBs are documents from insurance providers that explain the amounts insurance will pay for services. Retention periods for these documents vary by state as well, so check with your state dental board to see how long you should keep them. Additionally, insurance providers may stipulate how long records must be kept. As a general rule of thumb, the longer retention period supersedes others. The best advice—err on the side of caution and keep records for the longest retention period required by either state or federal law. Fortunately, cloud storage provides you with a simple, affordable way to ensure your retention periods meet or exceed requirements.

3-2-1 Backup Strategy

Understanding how long you need to keep records is the first step in structuring your dental practice’s backup plan. The second is understanding what a good backup strategy looks like. The 3-2-1 backup strategy is a tried and true method for protecting data. It means keeping at least three copies of your data on two different media (i.e. devices) with at least one off-site, generally in the cloud. For a dental practice, we can use a simple X-ray file as an example. That file should live on two different devices on-premises, let’s say a machine reserved for storing X-rays which backs up to a NAS device. That’s two copies. If you then back your NAS device up to cloud storage, that’s your third, off-site copy.
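For the technically inclined, the 3-2-1 rule is easy to express as a quick sanity check. The sketch below is purely illustrative (the `Copy` type and the sample plan are invented for this example, not part of any backup product):

```python
from dataclasses import dataclass


@dataclass
class Copy:
    name: str      # e.g. "X-ray workstation"
    device: str    # e.g. "workstation", "nas", "cloud"
    offsite: bool  # True if stored off-premises


def satisfies_3_2_1(copies) -> bool:
    """True if the copies meet the 3-2-1 rule: at least 3 copies,
    on at least 2 distinct devices, with at least 1 off-site."""
    return (len(copies) >= 3
            and len({c.device for c in copies}) >= 2
            and any(c.offsite for c in copies))


# The X-ray example from the text: workstation + NAS + cloud.
plan = [
    Copy("X-ray workstation", "workstation", False),
    Copy("NAS backup", "nas", False),
    Copy("Cloud backup", "cloud", True),
]
```

Dropping the cloud copy from `plan` would fail the check, which is exactly the gap a 3-2-1 strategy is designed to close.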

The Benefits of Backing Up Your Dental Practice

Why do you need that many copies, you might ask. There are some tried and true benefits that make a strong case for using a 3-2-1 strategy rather than hoping for the best with fewer copies of your data.

  1. Fast access to files. When you accidentally delete a file, you can restore it quickly from either your on-site or cloud backup. And if you need a file while you’re away from your desk, you can simply log in to your cloud backup and access it immediately.
  2. Quick recoveries from computer crashes. Keeping one copy on-site means you can quickly restore files if one of your machines crashes. You can start up another computer and get immediate access, or you can restore all of the files to a replacement computer.
  3. Reliable recoveries from damage and disaster. Floods, fires, and other disasters do happen. With a copy off-site, your data is one less thing you have to worry about in that unfortunate event. You can access your files remotely if needed and restore them completely when you are able.
  4. Safe recoveries from ransomware attacks. After hearing about so many major ransomware attacks in the news this past year, you might be surprised to know that most attacks are carried out on small to medium-sized businesses. Keeping an off-site copy in the cloud, especially if you take advantage of features like Object Lock, can better prepare you to recover from a ransomware attack.
  5. Compliance with regulatory requirements. As mentioned above, dental practices are subject to retention regulations. Using a cloud backup solution that offers AES encryption helps your practice achieve compliance.

Using NAS for Dental Practices

NAS is essentially a computer connected to a network that provides file-based data storage services to other devices on the network. The primary strength of NAS is how simple it is to set up and deploy.

NAS is frequently the next step up for a small business that is using external hard drives or direct attached storage, which can be especially vulnerable to drive failure. Moving up to NAS offers businesses like dental practices a number of benefits, including:

  • The ability to share files locally and remotely.
  • 24/7 file availability.
  • Data redundancy.
  • Integrations with cloud storage that provides a location for necessary automatic data backups.

If you’re interested in upgrading to NAS, check out our Complete NAS Guide for advice on provisioning the right NAS for your needs and getting the most out of it after you buy it.

➔ Download Our Complete NAS Guide

Hybrid Cloud Strategy for Dental Practices: NAS + Cloud Storage

Most NAS devices come with cloud storage integrations that enable businesses to adopt a hybrid cloud strategy for their data. A hybrid cloud strategy uses a private cloud and public cloud in combination. To expand on that a bit, a hybrid cloud refers to a cloud environment made up of a mixture of typically on-premises, private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them. In this case, your NAS device serves as the on-premises private cloud, as it’s dedicated to only you or your organization, and then you connect it to the public cloud.

Some cloud providers are already integrated with NAS systems. (Backblaze B2 Cloud Storage is integrated with NAS systems from Synology and QNAP, for example.) Check if your preferred NAS system is already integrated with a cloud storage provider to ensure setting up cloud backup, storage, and sync is as easy as possible.

Your NAS should come with a built-in backup manager, like Hyper Backup from Synology or Hybrid Backup Sync from QNAP. Once you download and install the appropriate backup manager app, you can configure it to send backups to your preferred cloud provider. You can also fine-tune the behavior of the backup jobs, including what gets backed up and how often.

Now, you can send backups to the cloud as a third, off-site backup and use your cloud instance to access files anywhere in the world with an internet connection.

Using an MSP for Dental Practices

Many dental practices choose to outsource some or all IT services to an MSP. Making the decision of whether or not to hire an MSP will depend on your individual circumstances and comfort level. Either way, coming to the conversation with an understanding of your backup needs and the cloud backup landscape can help.

Nate Smith, Technical Project Manager at DTC, is responsible for backing up 6,000+ endpoints on 500+ servers at more than 450 dental and doctor’s offices in the mid-Atlantic region. He explained that, due to the sheer number of objects most dentists need to restore (e.g., hundreds of thousands of X-rays), the cost of certain cloud providers can be prohibitive. “If you need something and you need it fast, Amazon Glacier will hit you hard,” he said, referring to the service’s warming fees and retrieval costs.

When seeking out an MSP, make sure to ask about the cloud provider they’re using and how they charge for storage and data transfer. And if you’re not using an MSP, compare costs from different cloud providers to make sure you’re getting the most for your investment in backing up your data.

Cloud Storage and Your Dental Practice

Whether you’re managing your data infrastructure in house with NAS or other hardware, or you’re planning to outsource your IT needs to an MSP, cloud storage should be part of your backup strategy. To recap, having a third copy of your data off-site in the cloud gives you a number of benefits, including:

  • Fast access to your files.
  • Quick recoveries from computer crashes.
  • Reliable recoveries from natural disasters and theft.
  • Protection from ransomware.
  • Compliance with regulatory requirements.

Have questions about choosing a cloud storage provider to back up your dental practice? Let us know in the comments. Ready to get started? Click here to get your first 10GB free with Backblaze B2.

The post Backup Solutions for Dentist Offices appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to use tokenization to improve data security and reduce audit scope

Post Syndicated from Tim Winston original

Tokenization of sensitive data elements is a hot topic, but you may not know what to tokenize, or even how to determine if tokenization is right for your organization’s business needs. Industries subject to financial, data security, regulatory, or privacy compliance standards are increasingly looking for tokenization solutions to minimize distribution of sensitive data, reduce risk of exposure, improve security posture, and alleviate compliance obligations. This post provides guidance to determine your requirements for tokenization, with an emphasis on the compliance lens given our experience as PCI Qualified Security Assessors (PCI QSA).

What is tokenization?

Tokenization is the process of replacing actual sensitive data elements with non-sensitive data elements that have no exploitable value for data security purposes. Security-sensitive applications use tokenization to replace sensitive data, such as personally identifiable information (PII) or protected health information (PHI), with tokens to reduce security risks.

De-tokenization returns the original data element for a provided token. Applications may require access to the original data, or an element of the original data, for decisions, analysis, or personalized messaging. To minimize the need to de-tokenize data and to reduce security exposure, tokens can retain attributes of the original data to enable processing and analysis using token values instead of the original data. Common characteristics tokens may retain from the original data are:

Format attributes

  • Length, for compatibility with the storage and reports of applications written for the original data.
  • Character set, for compatibility with the display and data validation of existing applications.
  • Preserved character positions, such as the first six and last four digits of a credit card PAN.

Analytics attributes

  • Mapping consistency, where the same data always results in the same token.
  • Sort order.

Retaining functional attributes in tokens must be implemented in ways that do not defeat the security of the tokenization process. Using attribute preservation functions can possibly reduce the security of a specific tokenization implementation. Limiting the scope and access to tokens addresses limitations introduced when using attribute retention.
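A toy sketch makes these attributes concrete. The example below derives a deterministic, format-preserving token for a 16-digit PAN by keeping the first six and last four digits and replacing the middle with digits derived from a keyed HMAC. This is an illustration of the concepts only, not a production tokenization scheme; real solutions use a token vault or a standardized format-preserving encryption mode, and the key here is a placeholder.

```python
import hashlib
import hmac


def tokenize_pan(pan: str, key: bytes) -> str:
    """Return a deterministic, format-preserving token for a PAN.

    Keeps the first 6 and last 4 digits (preserved character positions)
    and replaces the middle digits with digits derived from an HMAC of
    the full PAN, so the same PAN always maps to the same token
    (mapping consistency), with length and character set preserved.
    """
    middle_len = len(pan) - 10
    digest = hmac.new(key, pan.encode(), hashlib.sha256).hexdigest()
    # Map hex digest characters to decimal digits for the middle positions.
    middle = "".join(str(int(c, 16) % 10) for c in digest[:middle_len])
    return pan[:6] + middle + pan[-4:]
```

Note the trade-off the text describes: preserving positions and determinism shrinks the space of possible tokens, which is why attribute retention must be weighed against the security of the tokenization process.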

Why tokenize? Common use cases

I need to reduce my compliance scope

Tokens are generally not subject to compliance requirements if there is sufficient separation of the tokenization implementation from the applications using the tokens. Encrypting sensitive data may not reduce compliance obligations or scope: industry regulatory standards such as PCI DSS 3.2.1 still consider systems that store, process, or transmit encrypted cardholder data to be in scope for assessment, whereas tokenization may remove those systems from assessment scope. A common use case for PCI DSS compliance is replacing the PAN with tokens in data sent to a service provider, which keeps the service provider from being subject to PCI DSS.

I need to restrict sensitive data to only those with a “need-to-know”

Tokenization can be used to add a layer of explicit access controls to de-tokenization of individual data items, which can be used to implement and demonstrate least-privileged access to sensitive data. For instances where data may be co-mingled in a common repository such as a data lake, tokenization can help ensure that only those with the appropriate access can perform the de-tokenization process and reveal sensitive data.

I need to avoid sharing sensitive data with my service providers

Replacing sensitive data with tokens before providing it to service providers who have no access to de-tokenize data can eliminate the risk of having sensitive data within service providers’ control, and avoid having compliance requirements apply to their environments. This is common in the payment industry, where processors provide tokenization services to merchants: the processor tokenizes the cardholder data and returns a token the merchant can use to complete card purchase transactions.

I need to simplify data lake security and compliance

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale, to be used later for not-yet-determined analysis. Having multiple sources and data stored in multiple structured and unstructured formats creates complications for demonstrating data protection controls for regulatory compliance. Ideally, sensitive data should not be ingested at all; however, that is not always feasible. Where ingestion of such data is necessary, tokenization at each data source can keep compliance-subject data out of data lakes, and help avoid compliance implications. Using tokens that retain data attributes, such as data-to-token consistency (idempotence), can support many of the analytical capabilities that make it useful to store data in the data lake.

I want to allow sensitive data to be used for other purposes, such as analytics

Your organization may want to perform analytics on sensitive data for other business purposes, such as marketing metrics and reporting. By tokenizing the data, you can minimize the locations where sensitive data is allowed, and provide tokens to users and applications that need to conduct data analysis. This allows numerous applications and processes to access the token data while maintaining the security of the original sensitive data.

I want to use tokenization for threat mitigation

Using tokenization can help you mitigate threats identified in your workload threat model, depending on where and how tokenization is implemented. At the point where the sensitive data is tokenized, the sensitive data element is replaced with a non-sensitive equivalent throughout the data lifecycle, and across the data flow. Some important questions to ask are:

  • What are the in-scope compliance, regulatory, privacy, or security requirements for the data that will be tokenized?
  • When does the sensitive data need to be tokenized in order to meet security and scope reduction objectives?
  • What attack vector is being addressed for the sensitive data by tokenizing it?
  • Where is the tokenized data being hosted? Is it in a trusted environment or an untrusted environment?

For additional information on threat modeling, see the AWS security blog post How to approach threat modeling.

Tokenization or encryption consideration

Tokens can provide the ability to retain processing value of the data while still managing the data exposure risk and compliance scope. Encryption is the foundational mechanism for providing data confidentiality.

Encryption rarely results in ciphertext with a format similar to the original data, and may prevent data analysis or require consuming applications to adapt.

Your decision to use tokenization instead of encryption should be based on the following:

  • Reduction of compliance scope: As discussed above, by properly using tokenization to obfuscate sensitive data, you may be able to reduce the scope of certain framework assessments such as PCI DSS 3.2.1.
  • Format attributes: Used for compatibility with existing software and processes.
  • Analytics attributes: Used to support planned data analysis and reporting.
  • Elimination of encryption key management: A tokenization solution has one essential API (create token) and one optional API (retrieve value from token). Managing access controls for these can be simpler than managing general-purpose cryptographic key use policies. In addition, the compromise of an encryption key compromises all data encrypted by that key, both past and future, while the compromise of the token database compromises only existing tokens.

Where encryption may make more sense

Although scope reduction, data analytics, threat mitigation, and data masking for the protection of sensitive data make very powerful arguments for tokenization, we acknowledge there may be instances where encryption is the more appropriate solution. Ask yourself these questions to gain better clarity on which solution is right for your company’s use case.

  • Scalability: If you require a solution that scales to large data volumes and you can leverage encryption solutions with minimal key management overhead, such as AWS Key Management Service (AWS KMS), then encryption may be right for you.
  • Data format: If you need to secure unstructured data, then encryption may be the better option, given its flexibility across layers and formats.
  • Data sharing with third parties: If you need to share sensitive data in its original format and value with a third party, then encryption may be the appropriate solution, because it minimizes external access to your token vault for de-tokenization.

What type of tokenization solution is right for your business?

When trying to decide which tokenization solution to use, your organization should first define your business requirements and use cases.

  1. What are your own specific use cases for tokenized data, and what is your business goal? Identifying which use cases apply to your business and what the end state should be is important when determining the correct solution for your needs.
  2. What type of data does your organization want to tokenize? Understanding what data elements you want to tokenize, and what that tokenized data will be used for may impact your decision about which type of solution to use.
  3. Do the tokens need to be deterministic, the same data always producing the same token? Knowing how the data will be ingested or used by other applications and processes may rule out certain tokenization solutions.
  4. Will tokens be used internally only, or will the tokens be shared across other business units and applications? Identifying a need for shared tokens may increase the risk of token exposure and, therefore, impact your decisions about which tokenization solution to use.
  5. How long does a token need to be valid? You will need to identify a solution that can meet your use cases, internal security policies, and regulatory framework requirements.

Choosing between self-managed tokenization or tokenization as a service

Do you want to manage the tokenization within your organization, or use Tokenization as a Service (TaaS) offered by a third-party service provider? Some advantages to managing the tokenization solution with your company employees and resources are the ability to direct and prioritize the work needed to implement and maintain the solution, customizing the solution to the application’s exact needs, and building the subject matter expertise to remove a dependency on a third party. The primary advantages of a TaaS solution are that it is already complete, and the security of both tokenization and access controls are well tested. Additionally, TaaS inherently demonstrates separation of duties, because privileged access to the tokenization environment is owned by the tokenization provider.

Choosing a reversible tokenization solution

Do you have a business need to retrieve the original data from the token value? Reversible tokens can be valuable to avoid sharing sensitive data with internal or third-party service providers in payments and other financial services. Because the service providers are passed only tokens, they can avoid accepting additional security risk and compliance scope. If your company implements or allows de-tokenization, you will need to be able to demonstrate strict controls on the management and use of de-tokenization privilege. Eliminating the implementation of de-tokenization is the clearest way to demonstrate that downstream applications cannot have sensitive data. Given the security and compliance risks of converting tokenized data back into its original data format, this process should be highly monitored, and you should have appropriate alerting in place to detect each time this activity is performed.
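A minimal token vault illustrates what "reversible" means in practice. This is a deliberately simplified sketch, with an in-memory dictionary standing in for a secured token database; real vaults add encryption at rest, strict access control, monitoring, and alerting around the de-tokenization path:

```python
import secrets


class TokenVault:
    """Toy reversible token vault: tokens are random, and originals are
    stored server-side so only the vault can reverse the mapping."""

    def __init__(self):
        self._by_value = {}  # original value -> token (mapping consistency)
        self._by_token = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same data maps to the same token.
        if value in self._by_value:
            return self._by_value[value]
        token = secrets.token_hex(8)  # no mathematical link to the value
        self._by_value[value] = token
        self._by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # In production this privileged call would be tightly restricted,
        # logged, and alerted on each time it is performed.
        return self._by_token[token]
```

Because the token is random, downstream systems holding only tokens cannot recover the original data; all of the security and compliance weight falls on controlling access to `detokenize`, which is exactly why the text recommends monitoring that path closely.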

Operational considerations when deciding on a tokenization solution

While operational considerations are outside the scope of this post, they are important factors for choosing a solution. Throughput, latency, deployment architecture, resiliency, batch capability, and multi-regional support can impact the tokenization solution of choice. Integration mechanisms with identity and access control and logging architectures, for example, are important for compliance controls and evidence creation.

No matter which deployment model you choose, the tokenization solution needs to meet security standards, similar to encryption standards, and must prevent determining what the original data is from the token values.


Using tokenization solutions to replace sensitive data offers many security and compliance benefits. These benefits include lowered security risk and smaller audit scope, resulting in lower compliance costs and a reduction in regulatory data handling requirements.

Your company may want to use sensitive data in new and innovative ways, such as developing personalized offerings that use predictive analysis and consumer usage trends and patterns, fraud monitoring and minimizing financial risk based on suspicious activity analysis, or developing business intelligence to improve strategic planning and business performance. If you implement a tokenization solution, your organization can alleviate some of the regulatory burden of protecting sensitive data while implementing solutions that use obfuscated data for analytics.

On the other hand, tokenization may also add complexity to your systems and applications, as well as adding additional costs to maintain those systems and applications. If you use a third-party tokenization solution, there is a possibility of being locked into that service provider due to the specific token schema they may use, and switching between providers may be costly. It can also be challenging to integrate tokenization into all applications that use the subject data.

In this post, we have described some considerations to help you determine if tokenization is right for you, what to consider when deciding which type of tokenization solution to use, and the benefits and disadvantages of tokenization compared with encryption. When choosing a tokenization solution, it’s important for you to identify and understand all of your organizational requirements. This post is intended to generate questions your organization should answer to make the right decisions concerning tokenization.

You have many options available to tokenize your AWS workloads. After your organization has determined the type of tokenization solution to implement based on your own business requirements, explore the tokenization solution options available in AWS Marketplace. You can also build your own solution using AWS guides and blog posts. For further reading, see this blog post: Building a serverless tokenization solution to mask sensitive data.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread in the AWS Security Assurance Services forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.


Tim Winston

Tim is a Senior Assurance Consultant with AWS Security Assurance Services. He leverages more than 20 years’ experience as a security consultant and assessor to provide AWS customers with guidance on payment security and compliance. He is a co-author of the “Payment Card Industry Data Security Standard (PCI DSS) 3.2.1 on AWS”.


Kristine Harper

Kristine is a Senior Assurance Consultant and PCI DSS Qualified Security Assessor (QSA) with AWS Security Assurance Services. Her professional background includes security and compliance consulting with large fintech enterprises and government entities. In her free time, Kristine enjoys traveling, outdoor activities, spending time with family, and spoiling her pets.


Michael Guzman

Michael is an Assurance Consultant with AWS Security Assurance Services. Michael is a PCI QSA and HITRUST CCSFP, and holds several AWS certifications. His background is in financial services IT operations and administration, with over 20 years of experience in that industry. In his spare time, Michael enjoys spending time with his family, improving his golf skills, and perfecting his tri-tip recipe.

Netflix: A Culture of Learning

Post Syndicated from Netflix Technology Blog original

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Colin McFarland, Mihir Tendulkar, and Travis Brooks

This is the last post in an overview series on experimentation at Netflix. Need to catch up? Earlier posts covered the basics of A/B tests (Part 1 and Part 2), core statistical concepts (Part 3 and Part 4), how to build confidence in a decision (Part 5), and the role of Experimentation and A/B testing within the larger Data Science and Engineering organization at Netflix (Part 6).

Earlier posts in this series covered the why, what and how of A/B testing, all of which are necessary to reap the benefits of experimentation for product development. But without a little magic, these basics are still not enough.

The secret sauce that turns the raw ingredients of experimentation into supercharged product innovation is culture. There are never any shortcuts when developing and growing culture, and fostering a culture of experimentation is no exception. Building leadership buy-in for an approach to learning that emphasizes A/B testing, building trust in the results of tests, and building the technical capabilities to execute experiments at scale all take time — particularly within an organization that’s new to these ideas. But the pay-offs of using experimentation and the virtuous cycle of product development via the scientific method are well worth the effort. Our colleagues at Microsoft have shared thoughtful publications on how to Kickstart the Experimentation Flywheel and build a culture of experimentation, while their “Crawl, Walk, Run, Fly” model is a great tool for assessing the maturity of an experimentation practice.

At Netflix, we’ve been leveraging experimentation and the scientific method for decades, and are fortunate to have a mature experimentation culture. There is broad buy-in across the company, including from the C-Suite, that, whenever possible, results from A/B tests or other causal inference approaches are near-requirements for decision making. We’ve also invested in education programs to up-level company-wide understanding of how we use A/B tests as a framework for product development. In fact, most of the material from this blog series has been adapted from our internal Experimentation 101 and 201 classes, which are open to anyone at Netflix.

Netflix is organized to learn

As a company, Netflix is organized to emphasize the importance of learning from data, including from A/B tests. Our Data and Insights organization has teams that partner with all corners of the company to deliver a better experience to our members, from understanding content preferences around the globe to delivering a seamless customer support experience. We use qualitative and quantitative consumer research, analytics, experimentation, predictive modeling, and other tools to develop a deep understanding of our members. And we own the data pipelines that power everything from executive-oriented dashboards to the personalization systems that help connect each Netflix member with content that will spark joy for them. This data-driven mindset is ubiquitous at all levels of the company, and the Data and Insights organization is represented at the highest echelon of Netflix Leadership.

As discussed in Part 6, there are data scientists focused on experimentation and causal inference who collaborate with product innovation teams across Netflix. These data scientists design and execute tests to support learning agendas and contribute to decision making. By diving deep into the details of single test results, looking for patterns across tests, and exploring other data sources, these Netflix data scientists build up domain expertise about aspects of the Netflix experience and become valued partners to product managers and engineering leaders. Data scientists help shape the evolution of the Netflix product through opportunity sizing and identifying areas ripe for innovation, and frequently propose hypotheses that are subsequently tested.

We’ve also invested in a broad and flexible experimentation platform that allows our experimentation program to scale with the ambitions of the company to learn more and better serve Netflix members. Just as the Netflix product itself has evolved over the years, our approach to developing technologies to support experimentation at scale continues to evolve. In fact, we’ve been working to improve experimentation platform solutions at Netflix for more than 20 years — our first investments in tooling to support A/B tests came way back in 2001.

Early experimentation tooling at Netflix, from 2001.

Learning and experimentation are ubiquitous across Netflix

Netflix has a unique internal culture that reinforces the use of experimentation and the scientific method as a means to deliver more joy to all of our current and future members. As a company, we aim to be curious, and to truly and honestly understand our members around the world, and how we can better entertain them. We are also open minded, knowing that great ideas can come from unlikely sources. There’s no better way to learn and make great decisions than to confirm or falsify ideas and hypotheses using the power of rigorous testing. Openly and candidly sharing test results allows everyone at Netflix to develop intuition about our members and ideas for how we can deliver an ever better experience to them — and then the virtuous cycle starts again.

In fact, Netflix has so many tests running on the product at any given time that a member may be simultaneously allocated to several tests. There is not one Netflix product: at any given time, we are testing out a large number of product variants, always seeking to learn more about how we can deliver more joy to our current members and attract new members. Some tests, such as the Top 10 list, are easy for users to notice, while others, such as changes to the personalization and search systems or how Netflix encodes and delivers streaming video, are less obvious.
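Overlapping allocation like this is typically implemented by hashing a member identifier together with a per-test salt, so assignments are deterministic within a test but effectively independent across tests. Here is a minimal sketch of that idea — the hashing scheme, test names, and member id are all illustrative, not Netflix's actual implementation:

```python
import hashlib

def allocate(member_id: str, test_name: str, num_cells: int) -> int:
    """Deterministically assign a member to a cell (0 = control) for one test.

    Hashing the member id together with the test name keeps each member's
    assignment stable within a test, while remaining effectively independent
    across tests -- so one member can sit in many tests at once.
    """
    digest = hashlib.sha256(f"{test_name}:{member_id}".encode()).hexdigest()
    return int(digest, 16) % num_cells

# The same (hypothetical) member lands in independent cells across tests.
member = "member-12345"
cells = {t: allocate(member, t, 4) for t in ("top10_row", "search_ranking", "encode_ladder")}
```

Because the assignment is a pure function of the member and test identifiers, no per-member state needs to be stored, and re-computing an allocation anywhere in the stack yields the same answer.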

At Netflix, we are not afraid to test boldly, and to challenge fundamental or long-held assumptions. The Top 10 list is a great example of both: it’s a large and noticeable change that surfaces a new type of evidence on the Netflix product. Large tests like this can open up whole new areas for innovation, and are actively socialized and debated within the company (see below). On the other end of the spectrum, we also run tests on much smaller scales in order to optimize every aspect of the product. A great example is the testing we do to find just the right text copy for every aspect of the product. By the numbers, we run far more of these smaller and less noticeable tests, and we invest in end-to-end infrastructure that simplifies their execution, allowing product teams to rapidly go from hypothesis to test to roll out of the winning experience. As an example, the Shakespeare project provides an end-to-end solution for rapid text copy testing that integrates with the centralized Netflix experimentation platform. More generally, we are always on the lookout for new areas that can benefit from experimentation, or areas where additional methodology or tooling can produce new or faster learnings.

Debating tests and the importance of humility

Netflix has mature operating mechanisms to debate, make, and socialize product decisions. Netflix does not make decisions by committee or by seeking consensus. Instead, for every significant decision there is a single “Informed Captain” who is ultimately responsible for making a judgment call after digesting relevant data and input from colleagues (including dissenting perspectives). Wherever possible, A/B test results or causal inference studies are an expected input to this decision making process.

In fact, not only are test results expected for product decisions — it’s expected that decisions on investment areas for innovation and testing, test plans for major innovations, and results of major tests are all summarized in memos, socialized broadly, and actively debated. The forums where these debates take place are broadly accessible, ensuring a diverse set of viewpoints provide feedback on test designs and results, and weigh in on decisions. Invites for these forums are open to anyone who is interested, and the price of admission is reading the memo. Despite strong executive attendance, there’s a notable lack of hierarchy in these forums, as we all seek to be led by the data.

Netflix data scientists are active and valued participants in these forums. Data scientists are expected to speak for the data, both what can and what cannot be concluded from experimental results, the pros and cons of different experimental designs, and so forth. Although they are not informed captains on product decisions, data scientists, as interpreters of the data, are active contributors to key product decisions.

Product evolution via experimentation can be a humbling experience. At Netflix, we have experts in every discipline required to develop and evolve the Netflix service (product managers, UI/UX designers, data scientists, engineers of all types, experts in recommendation systems and streaming video optimization — the list goes on), who are constantly coming up with novel hypotheses for how to improve Netflix. But only a small percentage of our ideas turn out to be winners in A/B tests. That’s right: despite our broad expertise, our members let us know, through their actions in A/B tests, that most of our ideas do not improve the service. We build and test hundreds of product variants each year, but only a small percentage end up in production and rolled out to the more than 200 million Netflix members around the world.

The low win rate in our experimentation program is both humbling and empowering. It’s hard to maintain a big ego when anyone at the company can look at the data and see all the big ideas and investments that have ultimately not panned out. But nothing proves the value of decision making through experimentation like seeing ideas that all the experts were bullish on voted down by member actions in A/B tests — and seeing a minor tweak to a sign up flow turn out to be a massive revenue generator.

At Netflix, we do not view tests that do not produce a winning experience as “failures.” When our members vote down new product experiences with their actions, we still learn a lot about their preferences, what works (and does not work!) for different member cohorts, and where there may or may not be opportunities for innovation. Combining learnings from tests in a given innovation area, such as the Mobile UI experience, helps us paint a more complete picture of the types of experiences that do and do not resonate with our members, leading to new hypotheses, new tests, and, ultimately, a more joyful experience for our members. And as our member base continues to grow globally, and as consumer preferences and expectations continue to evolve, we also revisit ideas that were unsuccessful when originally tested. Sometimes there are signals from the original analysis that suggest now is a better time for that idea, or that it will provide value to some of our newer member cohorts.

Because Netflix tests all ideas, and because most ideas are not winners, our culture of experimentation democratizes ideation. Product managers are always hungry for ideas, and are open to innovative suggestions coming from anyone in the company, regardless of seniority or expertise. After all, we’ll test anything before rolling it out to the member base, and even the experts have low success rates! We’ve seen time and time again at Netflix that product suggestions large and small that arise from engineers, data scientists, even our executives, can result in unexpected wins.

(Left) Very few of our ideas are winners. (Right) Experimentation democratizes ideation. Because we test all ideas, and because most do not win, there’s an openness to product ideas coming from all corners of the business: anyone can raise their hand and make a suggestion.

A culture of experimentation allows more voices to contribute to ideation, and far, far more voices to help inform decision making. It’s a way to get the best ideas from everyone working on the product, and to ensure that the innovations that are rolled out are vetted and approved by members.

A better product for our members and an internal culture that is humble and values ideas and evidence: experimentation is a win-win proposition for Netflix.

Emerging research areas

Although Netflix has been running experiments for decades, we’ve only scratched the surface relative to what we want to learn and the capabilities we need to build to support those learning ambitions. There are open challenges and opportunities across experimentation and causal inference at Netflix: exploring and implementing new methodologies that allow us to learn faster and better; developing software solutions that support research; evolving our internal experimentation platform to better serve a growing user community and ever increasing size and throughput of experiments. And there’s a continuous focus on evolving and growing our experimentation culture through internal events and education programs, as well as external contributions. Here are a few themes that are on our radar:

Increasing velocity: beyond fixed time horizon experimentation.

This series has focused on fixed time horizon tests: sample sizes, the proportion of traffic allocated to each treatment experience, and the test duration are all fixed in advance. In principle, the data are examined only once, at the conclusion of the test. This ensures that the false positive rate (see Part 3) is not increased by peeking at the data numerous times. In practice, we’d like to be able to call tests early, or to adapt how incoming traffic is allocated as we learn incrementally about which treatments are successful and which are not, in a way that preserves the statistical properties described earlier in this series. To enable these benefits, Netflix is investing in sequential experimentation that permits valid decision making at any time, rather than only after a fixed duration has elapsed. These methods are already being used to ensure safe deployment of Netflix client applications. We are also investing in support for experimental designs that adaptively allocate traffic throughout the test towards promising treatments. The goal of both these efforts is the same: more rapid identification of experiences that benefit members.
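The cost of naive peeking is easy to demonstrate. The toy A/A simulation below (illustrative only — this is not Netflix's methodology) runs tests with no true effect and shows that checking a standard z-test after every interim batch rejects far more often than the nominal 5% rate of a single fixed-horizon analysis:

```python
import math
import random

def simulate_false_positive_rate(n_looks, n_per_look, n_sims=2000, seed=0):
    """Fraction of A/A tests (no true effect) declared 'significant'.

    With n_looks == 1 the data are examined once, at the end (fixed
    horizon). With n_looks > 1 we 'peek' after every interim batch and
    stop at the first |z| > 1.96, which inflates the false positive rate.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n = 0
        for _ in range(n_looks):
            for _ in range(n_per_look):
                sum_a += rng.gauss(0.0, 1.0)
                sum_b += rng.gauss(0.0, 1.0)
            n += n_per_look
            # Two-sample z statistic for a difference in means (known sd = 1).
            z = (sum_a / n - sum_b / n) / math.sqrt(2.0 / n)
            if abs(z) > 1.96:
                rejections += 1
                break
    return rejections / n_sims

fixed = simulate_false_positive_rate(n_looks=1, n_per_look=500)    # near the nominal 0.05
peeking = simulate_false_positive_rate(n_looks=10, n_per_look=50)  # substantially higher
```

Sequential methods address exactly this gap: they widen or adapt the decision boundary so that continuous monitoring preserves the advertised false positive rate.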

Scaling support for quasi-experimentation and causal inference.

Netflix has learned an enormous amount, and dramatically improved almost every aspect of the product, using the classic online A/B tests, or randomized controlled trials, that have been the focus of this series. But not every business question is amenable to A/B testing, whether due to an inability to randomize at the individual level, or due to factors, such as spillover effects, that may violate key assumptions for valid causal inference. In these instances, we often rely on the rigorous evaluation of quasi-experiments, where units are not assigned to a treatment or control condition by a random process. But the term “quasi-experimentation” itself covers a broad category of experimental design and methodological approaches that differ between the myriad academic backgrounds represented by the Netflix data science community. How can we synthesize best practices across domains and scale our approach to enable more colleagues to leverage quasi-experimentation?

Our early successes in this space have been driven by investments in knowledge sharing across business verticals, education, and enablement via tooling. Because quasi-experiment use cases span many domains at Netflix, identifying common patterns has been a powerful driver in developing shared libraries that scientists can use to evaluate individual quasi-experiments. And to support our continued scale, we’ve built internal tooling that coalesces data retrieval, design evaluation, analysis, and reproducible reporting, all with the goal to enable our scientists.

We expect our investments in research, tooling, and education for quasi-experiments to grow over time. In success, we will enable both scientists and their cross functional partners to learn more about how to deliver more joy to current and future Netflix members.
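As one concrete illustration, difference-in-differences is a common quasi-experimental design (chosen here as an example; the text above does not name specific methods): it compares the before/after change in a treated group against the change in a comparable untreated group, netting out trends shared by both. A minimal sketch with hypothetical numbers:

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate of a treatment effect.

    The before/after change in the treated group, minus the change in a
    comparable untreated group, nets out trends common to both groups.
    """
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )

# Hypothetical weekly viewing hours around a feature launch in one region.
effect = diff_in_diff(
    treated_pre=[2.0, 2.2, 1.8],   # launch region, before
    treated_post=[2.9, 3.1, 2.7],  # launch region, after
    control_pre=[2.1, 1.9, 2.0],   # comparison region, before
    control_post=[2.4, 2.2, 2.3],  # comparison region, after
)  # about 0.6 extra hours attributable to the launch
```

The estimate is only as good as the "parallel trends" assumption — that the two groups would have moved together absent the treatment — which is exactly the kind of design question the rigorous evaluation described above is meant to probe.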

Experimentation Platform as a Product.

We treat the Netflix Experimentation Platform as an internal product, complete with its own product manager and innovation roadmap. We aim to provide an end-to-end paved path for configuring, allocating, monitoring, reporting, storing, and analyzing A/B tests, focusing on experimentation use cases that are optimized for simplicity and testing velocity. Our goal is to make experimentation a simple and integrated part of the product lifecycle, with little effort required on the part of engineers, data scientists, or PMs to create, analyze, and act on tests, with automation available wherever the test owner wants it.

However, if the platform’s default paths don’t work for a specific use case, experimenters can leverage our democratized contribution model, or reuse pieces of the platform, to build out their own solutions. As experimenters innovate on the boundaries of what’s possible in measurement methodology, experimental design, and automation, the Experimentation Platform team partners to commoditize these innovations and make them available to the broader organization.

Three core principles guide product development for our experimentation platform:

  • Complexities and nuances of testing such as allocations and methodologies should, typically, be abstracted away from the process of running a single test, with emphasis instead placed on opinionated defaults that are sensible for a set of use cases or testing areas.
  • Manual intervention at specific steps in the test execution should, typically, be optional, with emphasis instead on test owners being able to invest their attention where they feel it adds value and leave other areas to automation.
  • Designing, executing, reporting, deciding, and learning are all different phases of the experiment lifecycle that have differing needs and users, and each stage benefits from purpose built tooling for each use.
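These principles might translate into a declarative test configuration along the following lines — a purely hypothetical sketch, not the platform's actual API — where opinionated defaults cover the common case and owners override only what they care about:

```python
from dataclasses import dataclass

@dataclass
class TestConfig:
    """Hypothetical declarative A/B test configuration (illustrative only).

    Allocation and analysis defaults are opinionated and abstracted away;
    a test owner overrides a field only where their attention adds value.
    """
    name: str
    cells: int = 2                     # control plus one treatment by default
    allocation: str = "hash"           # sensible default, rarely overridden
    primary_metric: str = "retention"  # illustrative default metric
    auto_rollout: bool = False         # a manual decision remains the default

# Common case: accept the defaults and move fast.
simple = TestConfig(name="signup_button_copy")

# Bolder test: more cells, still no automatic rollout.
custom = TestConfig(name="top10_row", cells=3)
```

The design choice embodied here is the second principle above: every knob exists, but none demands attention unless the test owner decides it should.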


Netflix has a strong culture of experimentation, and results from A/B tests, or other applications of the scientific method, are generally expected to inform decisions about how to improve our product and deliver more joy to members. To support the current and future scale of experimentation required by the growing Netflix member base and the increasing complexity of our business, Netflix has invested in culture, people, infrastructure, and internal education to make A/B testing broadly accessible across the company.

And we are continuing to evolve our culture of learning and experimentation to deliver more joy to Netflix members around the world. As our member base and business grows, smaller differences between treatment and control experiences become materially important. That’s also true for subsets of the population: with a growing member base, we can become more targeted and look to deliver positive experiences to cohorts of users, defined by geographical region, device type, etc. As our business grows and expands, we are looking for new places that could benefit from experimentation, ways to run more experiments and learn more with each, and ways to accelerate our experimentation program while making experimentation accessible to more of our colleagues.
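The link between member-base scale and detectable effect size follows from the standard two-sample power calculation: halving the minimum detectable effect quadruples the required sample, and a 10x smaller effect needs roughly 100x the members. A back-of-envelope sketch (illustrative metric and numbers, not Netflix data):

```python
import math

def sample_size_per_cell(mde, sigma, z_alpha=1.959964, z_beta=0.841621):
    """Approximate per-cell sample size for a two-sample z-test:

        n = 2 * (z_alpha + z_beta)^2 * sigma^2 / mde^2

    Defaults correspond to a 5% two-sided false positive rate (z_alpha)
    and 80% power (z_beta); mde is the minimum detectable effect.
    """
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2)

# Detecting a 10x smaller effect takes roughly 100x the sample.
n_coarse = sample_size_per_cell(mde=0.10, sigma=1.0)  # ~1,570 per cell
n_fine = sample_size_per_cell(mde=0.01, sigma=1.0)    # ~157,000 per cell
```

The same arithmetic applies within cohorts: a geographic or device-type segment is only large enough to test on once the overall member base has grown past the sample sizes its effect sizes demand.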

But the biggest opportunity is to deliver more joy to our members through the virtuous cycle of experimentation.

Interested in learning more? Explore our research site.

Interested in joining us? Explore our open roles.

Netflix: A Culture of Learning was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Merck Wins Insurance Lawsuit re NotPetya Attack

Post Syndicated from Bruce Schneier original

The insurance company Ace American has to pay for the losses:

On 6th December 2021, the New Jersey Superior Court granted partial summary judgment (attached) in favour of Merck and International Indemnity, declaring that the War or Hostile Acts exclusion was inapplicable to the dispute.

Merck suffered US$1.4 billion in business interruption losses from the NotPetya cyber attack of 2017 which were claimed against “all risks” property re/insurance policies providing coverage for losses resulting from destruction or corruption of computer data and software.

The parties disputed whether the NotPetya malware which affected Merck’s computers in 2017 was an instrument of the Russian government, so that the War or Hostile Acts exclusion would apply to the loss.

The Court noted that Merck was a sophisticated and knowledgeable party, but there was no indication that the exclusion had been negotiated since it was in standard language. The Court, therefore, applied, under New Jersey law, the doctrine of construction of insurance contracts that gives prevalence to the reasonable expectations of the insured, even in exceptional circumstances when the literal meaning of the policy is plain.

Merck argued that the attack was not “an official state action,” which I’m surprised wasn’t successfully disputed.

Slashdot thread.

Security updates for Tuesday

Post Syndicated from corbet original

Security updates have been issued by CentOS (java-11-openjdk), Debian (aide, apr, ipython, openjdk-11, qt4-x11, and strongswan), Fedora (binaryen and rust), Mageia (expat, htmldoc, libreswan, mysql-connector-c++, phpmyadmin, python-celery, python-numpy, and webkit2), openSUSE (kernel and virtualbox), Red Hat (etcd, libreswan, nodejs:14, OpenJDK 11.0.14, OpenJDK 17.0.2, and rpm), Slackware (expat), SUSE (java-1_7_1-ibm, kernel, and zxing-cpp), and Ubuntu (strongswan).
