What do they want? How will they do it? Who will get it done?

2024-07-26 Емилия Милчева

Post Syndicated from Емилия Милчева original https://www.toest.bg/kakvo-iskat-kak-shte-go-napraviat-koi-shte-go-svurshi/

Some pathogen is killing off Bulgaria's parties, and not all of them will recover from it by the autumn elections toward which the unworthy 50th parliament is heading.

Both systemic and anti-systemic parties are disintegrating, and even those that at first glance look relatively stable will be affected by the catastrophe.

Leprosy

The political crisis that is sending Bulgaria to its seventh parliamentary election in under four years is like leprosy for the parties. Their helplessness, incompetence, and unwillingness to look beyond narrow party calculations in order to carry out a deep transformation of the state can no longer be masked by declarative hypotheses delivered with great conviction, or by avoiding dialogue.

Speaking on Bulgarian National Radio (BNR), political scientist Prof. Dimitar Vatsov commented that the political forces are failing to address voters and find the right tone:

The political class has isolated itself, and that is part of the crisis we are observing. The parties are in debt to their voters.

And thankless, urgent work awaits them: restoring citizens' trust in a political system that is on the verge of delegitimization, and mobilizing public support for radical reforms in the judiciary and other systems. If Bulgarian politicians are unable to grasp the challenges, they cannot find lasting solutions to them, and are therefore unfit for the job.

Internal party conversion does not end with a change of leaders, nor (only) with a change of messaging; it requires new political content. That is exactly what MEP Radan Kanev (PP-DB, EPP) told the newspaper 24 Chasa:

I see a tendency to blame failures on communication problems and bad PR. That is the lesser trouble. Before public messaging comes the question of political content. That is, what is being offered and whether enough citizens approve of it.

So for the elections coming this autumn, even though they have not yet been scheduled, there are three main questions:

What do the parties want?
How will they do it?
Who will get it done?

Voters insist on hearing the answers from the parties they want, or are considering, to vote for. So the political forces should prepare them without hedging and evasive replies such as "Let's see the results." Whatever the results, the boundaries of political alliances must be clear in advance, as must the goals and the people who will fill the posts.

GERB is no titan

It is easy to be a giant in the kingdom of dwarves. GERB looks unshakable, but so what. Although it is the largest political force, it cannot form a government on its own, and it cannot afford that any longer: it has so many mouths to feed. Frequent elections also damage the reputation of both the party and its leader, Boyko Borisov, the only politician in our recent history to have served three terms as prime minister.

For now Borisov is keeping his composure while awaiting the outcome of the battle inside DPS, which will also determine the fate of his longtime behind-the-scenes ally and still-co-chair of DPS Delyan Peevski, sanctioned for corruption by the United States and the United Kingdom. Distancing himself from Peevski will be difficult, however, because of the established dependencies. The question is: even if Peevski is "cut out" of DPS, will Boyko Borisov dare to leave the regulators and the judiciary without Peevski's quotas?

Just a month after appointing people from DPS, who have now chosen Dogan's camp, to senior positions in the regional administrations (which are responsible for the elections), caretaker Prime Minister Dimitar Glavchev replaced them with GERB appointees. Timur Halilov of the "authentic DPS" accused him of openly joining the battle within the Movement:

These actions by Glavchev leave deep doubt in society about how independent the caretaker cabinet is and who dictates which reshuffles are made.

Where will DPS place

Whoever wins the war that has broken out between Delyan Peevski and honorary chairman Ahmed Dogan, and gains the legal right to draw up the candidate lists, will not be able to repeat DPS's success of 9 June, when it became the second-largest political force, and the Movement will take a long time to recover from the consequences.

The crack opening up in DPS may be a way out for Bulgarian Muslims to seek different political representation, provided the parties are ready to accept them.

The topic of "DPS's feudalized voters" is in fact also a slap in the face for the entire political system, which treats Bulgarian Turks and Roma as second-class people, not unlike the totalitarian regime. That is why Mihail Ivanov, an expert on minority issues and adviser to President Zhelyu Zhelev, told Dnevnik that the key to countering the "Peevski" model lies with the other parties:

Both the Turks and the Pomaks in Bulgaria needed protection from the nationalists. They did not see that protection in the other parties; they saw it in DPS. I have tried to help acquaintances of mine, Turks and Pomaks with political abilities, get in touch with the leaderships of DB and PP, but I hit a wall.

It seems that if it manages to cast off (only) Peevski, the Movement for Rights and Freedoms will become a more acceptable partner for We Continue the Change-Democratic Bulgaria. In terms of image, that is, because no substantive change in DPS is expected: whoever heads DPS will be the product of one of the circles.

PP-DB trains for coalition

For PP-DB there are at least three key questions: how independent the two want to be within their coalition and what mechanism for reaching joint decisions to adopt; what to offer voters; and what the main message of the election campaign should be.

News that the two parts of the coalition are negotiating an agreement was announced this week, but no details of the document were disclosed. The differences between PP and DB on key issues became publicly visible, which brought to the fore the need for coordination and unified positions. Although they are running as a coalition for the fourth time (they announced their alliance for the elections of 2 October 2022), until now they had never drawn up rules for their partnership.

The main message of their upcoming campaign will also matter, since until now there has been no unified campaign. Peevski can no longer be the main target, still less Borisov, as a former but also potential partner (those posters showing the two of them with the caption "What kind of prime minister do you want" deserve Golden Raspberries for ineffective PR). Even before the election campaign proper has begun, however, Delyan Peevski has already positioned himself as an opponent and "exposer" of PP-DB's corruption and has filed a complaint against them with the prosecution service. The DPS politician sanctioned for corruption asked the state prosecution, the State Agency for National Security (DANS), and the Interior Ministry to verify the authenticity of recordings in which phrases such as "unofficial cash" and "cash in bags" can be heard.

Meanwhile GERB and PP-DB are already testing a new majority through their shared position on speeding up adoption of the euro. The parliamentary Budget and Finance Committee approved the bill on introducing the euro, which will go to the plenary for first reading. In addition, on a proposal by Martin Dimitrov and Asen Vasilev of PP-DB, the Ministry of Finance will be obliged to request Bulgaria's membership of the eurozone as of 1 July 2025 if Bulgaria meets the inflation requirement this year.

Within two weeks of meeting that criterion, the Bulgarian government undertakes to request extraordinary convergence reports from the European Central Bank and the European Commission certifying that the eurozone membership criteria have been met. If that happens, the success will be credited to the next regular government, which will go down in history.

The coming vote will show whether the steps PP-DB are taking will increase their now sharply reduced parliamentary representation or shrink it further. One thing is certain, however: they will have to leave moral superiority in the campaign arsenal of Vazrazhdane, which, never having governed, can lay claim to it.

Vazrazhdane and its sister parties

Purity is all very well, but Vazrazhdane is shaking with fear over what will become of its sister party Velichie, and of others like it that have been queuing at the doors of parliament since the previous elections. Even with its parliamentary group having fallen apart, Velichie has no intention of withdrawing from politics after breaking through in the 9 June vote. Its so-called ideologue, Ivelin Mihaylov, creator of the Historical Park attraction, will invite other "voivodes", for example from VMRO, to run together in order to make Bulgaria a heaven on earth. And the formation of the unprecedented temporary parliamentary committee to investigate the schemes around Historical Park, chaired by Vazrazhdane MP Tsveta Rangelova, will only turn Mihaylov into a sympathetic victim of the status quo he claims to be fighting.

Meanwhile Kostadinov is trying to revive his ratings with the referendums citizens already know, against the euro, against NATO, and... with a project for a new Bulgarian computer font, "Runitsa".

We decided to revive this Bulgarian heritage by making a font with which you can convert Cyrillic into Runitsa and Runitsa into Cyrillic.

There you go: Kostadinov, too, has taken the historical approach. Mihaylov boasts of archery and the revival of other Proto-Bulgarian traditions, while the leader of Vazrazhdane is reviving the ancient script of the Proto-Bulgarians. So for the October elections Kostadinov and company will be defending their hunting grounds. So will There Is Such a People (ITN).

BSP seeks a leader

In the autumn BSP will elect its leader, and it is still unclear whether it will do so under the rules imposed by chair Korneliya Ninova, who has resigned, that is, by direct vote of party members, or change them so that the congress elects. What is visible to the naked eye is that gradually, just as with Peevski, Ninova is being abandoned by her loyal associates, and Pozitano 20 is already changing the course she set, moving toward the president and demonstrating a desire for dialogue and understanding.

After meetings were held with Levitsata!, Stand Up, Bulgaria, Bulgarian Social Democracy-Euroleft, and the Social Democrats political movement, BSP's new leadership issued a call for everyone who had left the party to return. Will this help the gradually vanishing BSP improve its result? Hardly. None of these political forces has recorded a significant election result.

Korneliya Ninova is right about one thing: those who left BSP are old faces, and the problem is that there are no new ideas. But there were none under her leadership either, during which BSP took on the characteristics of a national-populist formation and was marginalized.

The approaching elections will tear some parties apart, others will pull in their horns, but none will get away without upheaval. If they want it to hurt less, Bulgarian politicians should show a little honesty. And honor, too.

Bypass the AT&T Fiber Gateway w/ WAS-110 SFP+

2024-07-26 digiblur DIY

Post Syndicated from digiblur DIY original https://www.youtube.com/watch?v=H72SKt_1BGc

The pilot who flew the Wrong Way, Douglas Corrigan

2024-07-26 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=15r4dGd4Y10

Comic for 2024.07.26 – Microplastics

2024-07-26 Explosm.net

Post Syndicated from Explosm.net original https://explosm.net/comics/microplastics

New Cyanide and Happiness Comic

Olympic Sports

2024-07-26 xkcd.com

Post Syndicated from xkcd.com original https://xkcd.com/2964/

Thankfully for everyone involved, the Winter Olympics officials spotted me and managed to stop me before I got to the ski jump.

From Top Dogs to Unified Pack

2024-07-25 Rapid7

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/07/25/from-top-dogs-to-unified-pack/

Embracing a consolidated security ecosystem

Cybersecurity is as unpredictable as it is rewarding. Each day often presents a new set of challenges and responsibilities, particularly as organizations accelerate digital transformation efforts. This means you and your cyber team may find yourselves navigating a complex landscape of multi-cloud environments and evolving compliance requirements.

So how does that translate into what cyber professionals have to deal with on a daily basis?

A Day in the Life of a Security Professional

In the Trenches

The responsibility of safeguarding sensitive data creates constant pressure to stay one step ahead of many things. Teams defending environments often face high stress levels and tight deadlines. Unsurprisingly, the demand for skilled security personnel often outpaces the supply. This is where an array of tools and solutions is introduced to support those teams. And while there are many positives to be had, security teams are often overrun by an array of solutions and vendors, which increases complexity and introduces vulnerabilities into their organization’s risk posture.

Multiple Vendors Often Means More Work

Using different vendors and solutions for various security functions can help keep things fresh, but it can also be time-consuming and cumbersome. And rather than help teams, it may lead to a decrease in performance. With each platform and tool requiring its own resources, the overall efficiency of your infrastructure and processes may suffer. These performance issues can impact critical business operations and hinder productivity. For instance, by the time you receive a threat alert, the attacker could already be hard at work.

Security analysts require a streamlined work environment that enables them to understand the root cause of alerts from any source with a single click. They shouldn’t have to waste time switching between multiple tools to investigate and remediate potential threats. And when belts start tightening and resources become scarce, managing multiple vendors with different payment cycles can become frustrating.

It pays to find ways to create a security ecosystem without sacrificing the efficacy of its components. By reducing the number of disparate cyber solutions, security professionals can optimize effectiveness and efficiency, subsequently enhancing security posture and reducing their risk profile.

What are the Benefits of a Unified Security Ecosystem?

Widening visibility into your entire IT environment strengthens threat detection capabilities, allowing security teams to minimize the impact of potential cyberattacks. In fact, 41% of organizations surveyed by Gartner say consolidating security solutions improved their risk posture. If your organization is still clinging to the status quo of best-of-breed solutions, consider the following consolidation benefits when trying to gain executive buy-in.

1. Identify Systems and Applications at Risk

A robust vulnerability management program should be your first port of call to help identify any systems or applications potentially at risk. It provides your security team with critical insight into potential weaknesses in your IT infrastructure and overall network. Importantly, it will enable you to properly manage and patch vulnerabilities that pose risks to the network, protecting your organization from the possibility of a breach.

2. Safeguard an Evolving Landscape with Real-time Monitoring

Continuous scanning and testing of applications are vital components of a robust security strategy. Consolidating your security tech stack into a centralized ecosystem offers the ability to monitor your infrastructure in real-time and receive in-depth reports for better cross-team collaboration. Actionable insight gained will give you and your security team the autonomy you need to stay ahead of evolving risks and proactively address potential vulnerabilities.

3. Broaden Visibility and Contextual Understanding

Avoid leaving your security team with isolated alerts that require manual investigation and correlation. Integrating data from multiple sources, including endpoints, networks, cloud environments, and applications, offers a comprehensive view and analysis of threats across different layers of the IT environment. This holistic approach allows for better correlation of data across various vectors, uncovering complex attack patterns that might otherwise go unnoticed. Consider broadening your context with threat intelligence, which provides information about actor groups, typical targets, TTPs, and more.

How Rapid7 Can Help: Managed Threat Complete

Managed Threat Complete offers a simplified security stack that fuels your D&R program with a 24x7x365 SOC, IR, XDR technology, SIEM, SOAR, threat intelligence, and unlimited VRM in a single service. This ensures your environment is monitored round-the-clock and end-to-end by an elite SOC that works transparently with your in-house team, helping to further expand your resources.

4. Automate Threat Hunting and Distinguish Friend from Foe

In the face of ever-evolving threats, automating threat hunting becomes a crucial capability. By integrating automation within your consolidated security ecosystem, you’ll be able to quickly discern whether incoming threats are benign or malicious. Streamlined processes allow for efficient identification of potential risks, enabling you and your team to prioritize your efforts for activities that require human effort.

5. Prioritize Risk and Simplify Workflows

The sheer volume of security alerts can overwhelm even the most robust security operations. A consolidated security ecosystem mitigates this challenge by automatically grouping related alerts and prioritizing events that demand immediate attention. Unifying and visualizing activities in one place helps you identify the root causes of threats and their potential impact more rapidly. Armed with this knowledge, you can assess the scope of an incident efficiently, build a timeline of the attack, and take swift, targeted action to effectively neutralize the threat.
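To make the grouping-and-prioritizing idea concrete, here is a minimal sketch in Python. It is not how any particular product works: the `Alert` fields, the 1 to 5 severity scale, the tool names, and the choice to group by affected asset are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # hypothetical tool that raised the alert
    asset: str     # affected host or service (assumed grouping key)
    severity: int  # assumed scale: 1 (low) to 5 (critical)
    message: str


def group_and_prioritize(alerts):
    """Group alerts by asset, then rank groups by worst severity, then by size."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.asset].append(alert)
    return sorted(
        groups.items(),
        key=lambda item: (max(a.severity for a in item[1]), len(item[1])),
        reverse=True,
    )


alerts = [
    Alert("edr", "laptop-17", 2, "unsigned binary executed"),
    Alert("siem", "db-01", 5, "credential dumping pattern"),
    Alert("ndr", "db-01", 3, "unusual outbound traffic"),
    Alert("edr", "laptop-17", 2, "new scheduled task"),
]
ranked = group_and_prioritize(alerts)
for asset, related in ranked:
    print(asset, [a.message for a in related])
```

Here the two `db-01` alerts from different tools end up in one group that is ranked first because it contains the most severe alert. A real system would likely correlate on more than the asset name (time window, user, indicator), which this sketch leaves out.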

6. Swiftly Investigate with End-to-end Digital Forensics

Incident resolution demands a thorough understanding of the attack’s entry point and the ability to track down any traces left by adversaries. With a consolidated security ecosystem, you can conduct swift and comprehensive investigations using end-to-end digital forensics and review key artifacts such as event logs, registry keys, and browser history across your entire IT environment, significantly enhancing your incident response capabilities. A full view of attacker activity can help you determine the extent of the compromise, identify weaknesses in your defenses, and take appropriate remedial actions.

7. Coordinate Responses with Remediation and Policy Enforcement

Enable coordinated responses and future-proof defenses by integrating prevention technologies across your entire tech stack. Leverage communication between various security components and take decisive action against active threats in real-time. For example, an attack blocked on the network can automatically update policies on endpoints, ensuring consistent security measures across your infrastructure. This proactive approach to security ultimately reduces the risk of successful cyberattacks.
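The network-to-endpoint example above can be sketched as a small event handler. This is only an illustration under stated assumptions: `Endpoint`, `PolicyHub`, and `on_network_block` are hypothetical names, and a real deployment would push policy through each vendor's own management interface rather than an in-memory set.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    name: str
    blocked_ips: set = field(default_factory=set)  # assumed local blocklist


@dataclass
class PolicyHub:
    """Hypothetical coordinator that fans a network block out to endpoints."""
    endpoints: list = field(default_factory=list)

    def on_network_block(self, ip: str) -> int:
        """Add the blocked IP to every endpoint policy; return how many changed."""
        updated = 0
        for endpoint in self.endpoints:
            if ip not in endpoint.blocked_ips:
                endpoint.blocked_ips.add(ip)
                updated += 1
        return updated


hub = PolicyHub([Endpoint("laptop-17"), Endpoint("db-01", {"203.0.113.7"})])
updated = hub.on_network_block("203.0.113.7")  # db-01 already had this entry
print(updated, [sorted(e.blocked_ips) for e in hub.endpoints])
```

The handler is idempotent, so replaying the same block event leaves every endpoint unchanged, which matters when several detection tools report the same indicator.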

Consolidate to Mitigate

With a rapidly changing threat landscape, consolidation offers the security improvements your organization needs to tip the balance of power in its favor. Simplifying and streamlining your cybersecurity solutions begins with gaining visibility into your tech stack. This enables your team to identify where consolidation can improve its productivity and effectiveness in detecting and mitigating risk.

Conversations about death. What we do before and what we do after the loss

2024-07-25 Надежда Цекулова

Post Syndicated from Надежда Цекулова original https://www.toest.bg/razgovori-za-smurtta-kakvo-pravim-predi-i-kakvo-pravim-sled-zagubata/

In recent years I have had the feeling that every summer takes someone from me. People I am supposedly not very close to, yet whose loss tangibly narrows my world and takes away horizons that will never be accessible again.

Loss is one of the most serious challenges to mental health. And death is one of the heaviest losses we may face. Even so, death is rarely talked about here, and then mostly in whispers. Death is a suffering that Bulgarian culture (and not only Bulgarian) mostly keeps silent about.

Dr. Boyana Petkova is one of the people trying to gently break the silence. In 2012 she founded the Poppies for Mary Foundation, which supports families who have experienced perinatal loss, and in 2017 "Ida: Foundation for Palliative Care for Children", which, as its name shows, advocates for the development of palliative care for children in Bulgaria. Through her foundations Boyana backed the publication of the children's books "My Sister Maya from Heaven" and "A Book About Death". In 2020 she co-organized Bulgaria's first Death Cafe, gatherings for "conversations about death, life, and everything in between", which are still held today. Throughout, she has kept expanding her knowledge of the subject.

By profession Boyana Petkova is a pediatrician. By circumstance she is my friend. The lines that follow are as much an interview as a psychological mutual-support group. A group of two in which I hope you will feel welcome.

Over the past years you have had a series of activist initiatives aimed at bringing death out of the taboo and rather mysterious realm it occupies in our public life. This does not follow directly from your profession, so where does the interest come from?

You make me wonder what my psychoanalyst would say about that (laughs). But I think of the explanation Irvin Yalom gives. He is not a classical psychoanalyst; he has an existential orientation and is one of the established contemporary authorities on this topic. In his book "Staring at the Sun", Yalom explains that there is one type of person who cannot escape the feeling, or rather the knowledge, that everything is transient. That all the things you love will end and all people are mortal. And there are other people for whom this comes to the fore only when something more significant happens in their lives, when they experience some great loss.

Well, I am more the first type, without having any specific explanation why.

What is your sense of death's place in our community life, given all the ways you engage with the topic? You have published books, you organize Death Cafe, you have studied the topic both medically and psychologically, you have spoken in the media about these activities. What feedback do you get?

In our society death is, on the one hand, very heavily ritualized: there are specific things we do when someone dies. These rituals, by the way, vary a great deal across different parts of Bulgaria; what is common is the adherence to them. But they remain within a specific context and a specific moment. And when the ritual ends, it is as if our opportunity to express our feelings and thoughts about death has ended too. What I miss very much is what we do before that and what we do after it.

Sometimes death is very unexpected, sudden, and then it is hard to prepare. But even with such a death we perform the rituals, and after them the hole remains, the free fall remains. Other times death is something long expected, yet again no one talks about it.

What is it that is missing, in your view?

There is really no public conversation that would make the topic of death acceptable. Or it is very fragmentary. That is why, through the Death Cafe format, we try to make the topic present without being intrusive. It is there, and whoever needs it can connect with it and find people who can talk about it. Sometimes we, the Death Cafe organizers, cannot carry the conversation all that well ourselves, but in these groups there are always people who give it energy; even if we are not active, it takes off on its own.

I interrupted you; you had started talking about what happens when death is not a surprise but long expected.

When you expect a death for a long time, something happens that in English is called anticipatory grief; I am not sure whether "предварителна скръб" (literally "preliminary grief") is the correct rendering in Bulgarian. But the term refers to the process of grieving before you have actually experienced the real loss. This process prepares us in some way, but at the same time it stays locked inside us unless it is spoken aloud. So over a long period, during a long illness for example, we may lose the person bit by bit without truly having made sense of it, because in our culture we cling very tightly to the biological aspect of living. You can see it very clearly in the way we practice medicine: as long as there are vital signs, the conversation about death does not come up at all; it is taboo.

My main motivation for the conversation we are having now was precisely the strong impression left by the way my social bubble met a sudden death. Dozens of people expressed their grief on social media, but somehow no conversation arose among them. It was as if each of them was saying goodbye to the person they had all lost together. But in solitude. Is that, in your view, an effect of social media, or some result of our culture of perceiving death and mourning?

On the one hand, I think it is entirely a social media phenomenon. On the other hand, the death you are talking about affected me as well. And I have thought a lot about myself these days. I thought about why I obsessively read what other people share on the wall of a person who was not close to me, but with whom I had enough contact for him to have given me something of himself. And to know that the next time I am with this community of people, all of us will feel his absence. All of us will know that what this particular person contributed to this community is now a hole; it is gone and cannot be replaced, because no one else can take part in exactly this way in exactly this community.

That holds for every person in every context: when someone is lost to their community, no one else can take their place in exactly the same way. And when death is sudden and unexpected, and of a person in the prime of life, it is even more inadmissible; somehow your mind does not let you accept it, grasp it, make sense of it...

My explanation of why people write down their memories and read others' memories in such a situation is that through these stories we come to terms with the fact that this person will never again be able to give us anything, and we will not be able to give him anything. These posts are a kind of last attempt by the grieving to give something of themselves: their memories. This widens each mourner's connection with the one who has died, because each person has different stories, different memories. When we share them, we exchange them and give one another a little more of this person whom we have all lost.

Something similar actually happens through the rituals of farewell and remembrance. They call for the relatives of the deceased to meet on particular days, and most often these meetings prompt exactly such an exchange of memories, sometimes sad, other times even cheerful, paradoxical as that may be.

Yes, that is so. I do not know whether that was the original intent of these rituals, but it is a fact that what the grieving most often do is tell each other their living stories with the other person and, in a way, give them to one another as gifts.

I am now reminded of the last birthday of Marin Bodakov, whom we lost almost three years ago. His wife, Zornitsa Hristova, had asked their friends on social media to tell their stories about Marin. And we spent the whole day writing and reading. It became one enormous post, and I actually felt exactly that way: that other people were giving me their pieces of Marin, and I was giving them mine.

Four years ago, when you published "A Book About Death" with Marin and Zornitsa, he told me in an interview that "the Bulgarian public conversation about death does not recognize and respect it in its ordinary form, from old age, from illness". I find a certain bitter irony in the fact that Marin's own death is a vivid refutation of that thesis. Unfortunately, in recent years there have been a number of other examples of people whose "ordinary" death drew a strong response and remains in public memory thanks to their rich public legacy. Off the top of my head I think of Boyan Petrov, Kristian Takov, Adil Kadam, Dimo Stoyanov...

I do not entirely agree with Marin on that. As I mentioned at the start of the conversation, one of the main contemporary authorities on the topic is Irvin Yalom. He draws on many classical philosophers, of course, but his thesis on this question is that one way to overcome our own fear of death and our own anxiety is to become aware of the so-called ripple effect. The name comes from the concentric circles that form when you throw a pebble into a lake.

So, in the context of the question, Yalom says that one of the methods he uses in his work is to help people become aware of this effect: that we leave certain things of ourselves in others. And when a person is gone, something he gave them lives on in the others. These things live a life of their own, are passed on, develop, lead to new creation. And I think that when a person passes away, exactly this happens in the communities built around him. It is a very creative way of living through and working through both the loss and the feeling of discouragement.

It sounds very banal, but banal things have often become banal because they are true.

Но какво, ако смъртта е наистина „обикновена“, макар и това да звучи цинично? Много от нас ще се разпознаят в примерния разговор, в който възрастен близък иска да ни сподели мислите си за собствената си очаквана смърт, а ние отговаряме с „Не говори така сега“ и отместваме темата… Този страх да проведем такъв разговор ощетява ли ни?

Да, ощетява ни, защото ни лишава от възможност да вземем нещо от този близък и да му дадем нещо. Да осъществим общуване, което в един момент може да се окаже, че е било важно за нас. И това може да е безвъзвратно – да изгубим завинаги спомени или част от родовата история, или просто едно по-интензивно преживяване на отношенията с този човек.

Преодоляването обаче на страховете, които ни пречат да водим тези разговори, също често изисква допълнително усилие или подкрепа. Има ли в България къде да се потърси такава помощ? Освен Death Cafe, за което споменахме, има ли например групи за подкрепа, за взаимопомощ или дори някакви дискусионни формати?

Не мога да претендирам за изчерпателно познаване на възможностите, но ми се струва подходящо тук да отбележа, че Death Cafe е абсолютно отворен формат. Ние нямаме монопол върху него, никой не „притежава правата“ и практически всеки във всяка общност може да организира такива разговори.

Започнахме по време на пандемията от COVID-19 и до момента имаме над 20 срещи. В България са ми известни и други опити и е много интересно, че всеки има различен профил, теми и фокус. При нас например обичайно средната възраст е около 30 години, защото има млади хора. В Банско преди време беше организирана такава сбирка и средната възраст е била около 60-те. Това е съвсем различна демографска извадка. Една активистка беше организирала среща във феминистки колектив „Kоприва“, където пък средната възраст беше около 20 години. Тези коренно различни групи имат нужда от коренно различен разговор за смъртта.

Извън това има Facebook група „След загубата“. Тя е насочена към родители, изгубили дете, без значение дали бебе, или възрастен човек.

Това обаче са единични и силно ограничени инициативи. Като цяло у нас липсва развита система, която да оказва психологическа подкрепа в процеса на загуба и траур. А тя липсва, защото липсва осъзнатата нужда от тях.

Като че ли липсва и нагласата да разкажем на някой близък как бихме искали да бъдат уредени земните ни дела, ако умрем неочаквано. И това изглежда не по-малко труден разговор от този за вече преживяната загуба. Имат ли стойност такъв тип разговори за психичното ни здраве?

I think they matter far more for others than for the person expressing their wishes in such a conversation, because by the time of their death it will probably be all the same to them. But a clear statement of wishes made during one’s lifetime frees loved ones from difficult decisions and from guilt. For surviving loved ones, whether they believe in some form of life continuing after death or, on the contrary, are convinced that everything ends with death, it is usually important to honor the will of the deceased in some way. As though that person still had feelings, wishes, and preferences.

Not knowing those wishes creates feelings in them that are very hard to bear: helplessness and guilt, because, for example, we don’t know whether our loved one wanted to be buried or cremated and, if we choose cremation, whether they wanted their ashes scattered somewhere or the urn kept at home on the mantelpiece… That complicates the process of mourning and parting, and these questions, which are entirely practical once separated from the emotion, can in fact completely crush the grieving relatives and block the mourning process.

On the other hand, these conversations have a broader meaning. It is not only for the particular person or for the loved ones they are talking to; it serves to remind us that we are transient. Human life is at once very resilient, able to withstand monstrous trials, and extremely fragile. This paradox is very hard to carry, to hold in mind, which is why most people do not live with a daily awareness of it. But reminding ourselves of it now and then, bringing it into the light, actually helps us sharpen our awareness of the value of the relationships we have, be more grateful, and stay aware of the joy of living.

The CrowdStrike Outage and Market-Driven Brittleness

2024-07-25 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/the-crowdstrike-outage-and-market-driven-brittleness.html

Friday’s massive internet outage, caused by a mid-sized tech company called CrowdStrike, disrupted major airlines, hospitals, and banks. Nearly 7,000 flights were canceled. It took down 911 systems and factories, courthouses, and television stations. Tallying the total cost will take time. The outage affected more than 8.5 million Windows computers, and the cost will surely be in the billions of dollars—easily matching the most costly previous cyberattacks, such as NotPetya.

The catastrophe is yet another reminder of how brittle global internet infrastructure is. It’s complex, deeply interconnected, and filled with single points of failure. As we experienced last week, a single problem in a small piece of software can take large swaths of the internet and global economy offline.

The brittleness of modern society isn’t confined to tech. We can see it in many parts of our infrastructure, from food to electricity, from finance to transportation. This is often a result of globalization and consolidation, but not always. In information technology, brittleness also results from the fact that hundreds of companies, none of which you’ve heard of, each perform a small but essential role in keeping the internet running. CrowdStrike is one of those companies.

This brittleness is a result of market incentives. In enterprise computing—as opposed to personal computing—a company that provides computing infrastructure to enterprise networks is incentivized to be as integral as possible, to have as deep access into their customers’ networks as possible, and to run as leanly as possible.

Redundancies are unprofitable. Being slow and careful is unprofitable. Being less embedded in and less essential and having less access to the customers’ networks and machines is unprofitable—at least in the short term, by which these companies are measured. This is true for companies like CrowdStrike. It’s also true for CrowdStrike’s customers, who also didn’t have resilience, redundancy, or backup systems in place for failures such as this because they are also an expense that affects short-term profitability.

But brittleness is profitable only when everything is working. When a brittle system fails, it fails badly. The cost of failure to a company like CrowdStrike is a fraction of the cost to the global economy. And there will be a next CrowdStrike, and one after that. The market rewards short-term profit-maximizing systems, and doesn’t sufficiently penalize such companies for the impact their mistakes can have. (Stock prices depress only temporarily. Regulatory penalties are minor. Class-action lawsuits settle. Insurance blunts financial losses.) It’s not even clear that the information technology industry could exist in its current form if it had to take into account all the risks such brittleness causes.

The asymmetry of costs is largely due to our complex interdependency on so many systems and technologies, any one of which can cause major failures. Each piece of software depends on dozens of others, typically written by other engineering teams sometimes years earlier on the other side of the planet. Some software systems have not been properly designed to contain the damage caused by a bug or a hack of some key software dependency.

These failures can take many forms. The CrowdStrike failure was the result of a buggy software update. The bug didn’t get caught in testing and was rolled out to CrowdStrike’s customers worldwide. Sometimes, failures are deliberate results of a cyberattack. Other failures are just random, the result of some unforeseen dependency between different pieces of critical software systems.

Imagine a house where the drywall, flooring, fireplace, and light fixtures are all made by companies that need continuous access and whose failures would cause the house to collapse. You’d never set foot in such a structure, yet that’s how software systems are built. It’s not that 100 percent of the system relies on each company all the time, but 100 percent of the system can fail if any one of them fails. But doing better is expensive and doesn’t immediately contribute to a company’s bottom line.

Economist Ronald Coase famously described the nature of the firm—any business—as a collection of contracts. Each contract has a cost. Performing the same function in-house also has a cost. When the costs of maintaining the contract are lower than the cost of doing the thing in-house, then it makes sense to outsource: to another firm down the street or, in an era of cheap communication and coordination, to another firm on the other side of the planet. The problem is that both the financial and risk costs of outsourcing can be hidden—delayed in time and masked by complexity—and can lead to a false sense of security when companies are actually entangled by these invisible dependencies. The ability to outsource software services became easy a little over a decade ago, due to ubiquitous global network connectivity, cloud and software-as-a-service business models, and an increase in industry- and government-led certifications and box-checking exercises.

This market force has led to the current global interdependence of systems, far and wide beyond their industry and original scope. It’s why flying planes depends on software that has nothing to do with the avionics. It’s why, in our connected internet-of-things world, we can imagine a similar bad software update resulting in our cars not starting one morning or our refrigerators failing.

This is not something we can dismantle overnight. We have built a society based on complex technology that we’re utterly dependent on, with no reliable way to manage that technology. Compare the internet with ecological systems. Both are complex, but ecological systems have deep complexity rather than just surface complexity. In ecological systems, there are fewer single points of failure: If any one thing fails in a healthy natural ecosystem, there are other things that will take over. That gives them a resilience that our tech systems lack.

We need deep complexity in our technological systems, and that will require changes in the market. Right now, the market incentives in tech are to focus on how things succeed: A company like CrowdStrike provides a key service that checks off required functionality on a compliance checklist, which makes it all about the features that they will deliver when everything is working. That’s exactly backward. We want our technological infrastructure to mimic nature in the way things fail. That will give us deep complexity rather than just surface complexity, and resilience rather than brittleness.

How do we accomplish this? There are examples in the technology world, but they are piecemeal. Netflix is famous for its Chaos Monkey tool, which intentionally causes failures to force the systems (and, really, the engineers) to be more resilient. The incentives don’t line up in the short term: It makes it harder for Netflix engineers to do their jobs and more expensive for them to run their systems. Over years, this kind of testing generates more stable systems. But it requires corporate leadership with foresight and a willingness to spend in the short term for possible long-term benefits.

Last week’s update wouldn’t have been a major failure if CrowdStrike had rolled out this change incrementally: first 1 percent of their users, then 10 percent, then everyone. But that’s much more expensive, because it requires a commitment of engineer time for monitoring, debugging, and iterating. And it can take months to do correctly for complex and mission-critical software. An executive today will look at the market incentives and correctly conclude that it’s better for them to take the chance than to “waste” the time and money.
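To make the incremental rollout idea concrete, here is a minimal sketch of a staged deployment loop. It is an illustration only, not a description of how CrowdStrike or any other vendor ships updates: the stage fractions follow the 1 percent, 10 percent, everyone sequence above, while `deploy` and `healthy` are hypothetical callables that the caller would supply.

```python
import hashlib

# Cumulative share of the fleet that should have the update after each stage.
STAGES = (0.01, 0.10, 1.00)


def bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1), so a host's cohort never changes."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def hosts_for_stage(host_ids, fraction):
    """Hosts that should have the update once `fraction` of the fleet is covered."""
    return [h for h in host_ids if bucket(h) < fraction]


def staged_rollout(host_ids, deploy, healthy, stages=STAGES):
    """Deploy stage by stage and halt as soon as a health check fails.

    `deploy(host)` and `healthy(host)` are hypothetical caller-supplied callables.
    Returns (hosts updated so far, True if every stage completed).
    """
    updated = []
    for fraction in stages:
        done = set(updated)
        batch = [h for h in hosts_for_stage(host_ids, fraction) if h not in done]
        for host in batch:
            deploy(host)
            updated.append(host)
        if not all(healthy(h) for h in batch):
            return updated, False  # stop before the bad update reaches everyone
    return updated, True
```

The expense the essay describes lives in what this sketch leaves out: someone has to define `healthy`, watch each stage long enough for problems to surface, and decide when it is safe to continue.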

The usual tools of regulation and certification may be inadequate, because failure of complex systems is inherently also complex. We can’t describe the unknown unknowns involved in advance. Rather, what we need to codify are the processes by which failure testing must take place.

We know, for example, how to test whether cars fail well. The National Highway Traffic Safety Administration crashes cars to learn what happens to the people inside. But cars are relatively simple, and keeping people safe is straightforward. Software is different. It is diverse, is constantly changing, and has to continually adapt to novel circumstances. We can’t expect that a regulation that mandates a specific list of software crash tests would suffice. Again, security and resilience are achieved through the process by which we fail and fix, not through any specific checklist. Regulation has to codify that process.

Today’s internet systems are too complex to hope that if we are smart and build each piece correctly the sum total will work right. We have to deliberately break things and keep breaking them. This repeated process of breaking and fixing will make these systems reliable. And then a willingness to embrace inefficiencies will make these systems resilient. But the economic incentives point companies in the other direction, to build their systems as brittle as they can possibly get away with.

This essay was written with Barath Raghavan, and previously appeared on Lawfare.com.

What are prefixes and suffixes (sometimes) for?

2024-07-25

Post Syndicated from original https://www.toest.bg/za-kakvo-sluzhat-ponyakoga-priedstavkite-i-nastavkite/


Even if what you learned in Bulgarian language class has faded a little, you surely remember that prefixes and suffixes are important structural parts of words. Add the roots, and we have the basic building material we use to name all manner of concepts in our reality. Let us see what processes prefixes and suffixes are involved in, and why it is important to know the mechanisms of language before we bring down a heavy verdict on some word.

The main function of prefixes and suffixes is word formation.

By adding them to existing words, we form new ones. Why we need new lexemes and cannot just make do with the ones we have is, I think, clear. The world changes, and new concepts appear that we must somehow designate in our speech. Besides, the richer a language’s vocabulary, the more precisely and in more detail its speakers will be able to express themselves.

So, by adding the suffix -ов to the noun пластмаса (“plastic”), we get the adjective пластмасов (“made of plastic”) to specify what material a vessel is made of. By adding the prefix до- to the verb чета (“read”), we get a new verb, (да) дочета (“finish reading”), to say that we are finishing the book or article we started.

So far there is nothing troubling: both words are used without difficulty in Bulgarian and provoke no objections.

However, I have come across quite a few angry comments against the use of (да) закупя/закупувам instead of (да) купя/купувам (both pairs mean “to buy”).

Why are the first two verbs turning up more and more often, with people wanting to “purchase” not an apartment but something far smaller: Търся да закупя следните отвертки: тип „триъгълник“, тип Y и тип „вилка“ (“I am looking to purchase the following screwdrivers: ‘triangle’ type, Y type, and ‘fork’ type”)? One reason for the preference for закупя specifically is, in my view, that the verb купя lacks a suffix or prefix that would explicitly mark its perfective aspect. Modern Bulgarian has only about 50 such verbs. The vast majority of the remaining non-derived verbs are imperfective¹, so linguistic intuition strives to introduce a formal marker, the prefix за-, to present the action as whole and completed, and thus resorts to the already existing word (да) закупя, broadening its meaning. From there, only a very small and easy step is needed to reach the use of закупувам in the new, broader sense, since a large share of Bulgarian verbs have identical semantics and differ only in aspect, perfective/imperfective: (да) премина/преминавам; (да) отчета/отчитам; (да) стопля/стоплям.

Another reason for the preference for закупя/закупувам is that they have a more complex morphemic structure, which is a mark of refinement and polish, and these are characteristics of the literary language. It is no accident that people who use these two verbs are reproached for affectation, although their aim is probably to express themselves in a more cultured way, since купя/купувам sound trivial and rather plain to them.

Now that we have commented on (изкоментирахме) purchasing,

let us continue with the verb just used, изкоментирам, and add зарегистрирам to it.

They are formed from коментирам (“comment”) and регистрирам (“register”) respectively, representatives of the only word-formation type of verbs in our language that are biaspectual: those with the suffix -ира- (with the extended variant -изира-). The problem is that this breaks the system. All Bulgarian verbs are either perfective or imperfective, whereas the ones under discussion are, so to speak, hybrid and, depending on context, belong to one aspect or the other²:

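МВР само регистрира катастрофите. (impf.; “The Interior Ministry only registers the crashes.”)
Утре ще регистрирам колата в данъчната служба. (pf.; “Tomorrow I will register the car at the tax office.”)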

But for linguistic intuition it is not enough for context to determine the verb’s aspect, so it decides to add a prefix or suffix to make the grammatical meaning explicit:

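Интересувам се как се регистрирва такъв влекач в България. (impf.; “I would like to know how a tractor unit like this gets registered in Bulgaria.”)
Тогава японската фирма зарегистрира в патентното ведомство на САЩ названията JX20, JX25, JX25h и JX30. (pf.; “The Japanese company then registered the names JX20, JX25, JX25h, and JX30 with the US patent office.”)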

Native speakers are considerably more tolerant of the new perfective verbs (зарегистрирам is more acceptable than регистрирвам), and some of them already appear in dictionaries of Bulgarian, including БЕРОН, for example: напарфюмирам, заангажирам, проконтролирам, прекопирам. What is more, the process has gone further, and imperfectives are being formed from the perfective verbs with the help of the suffix -в-: зарегистрирвам, заангажирвам, прекопирвам.

These last words do not please the ear and are clearly marked as colloquial. For now the literary language has firmly bolted its doors against them. I cite them not to offend your aesthetic sense but because we must be able to talk about, discuss, and explain why such words exist, rather than squeamishly look away from them. I also cite them because what is acceptable in public communication is recognized through comparison, and sometimes in stark contrast, with the unacceptable. In this article specifically, I cite them because they also outline the future of verbs with the suffixes -ира- and -изира-: sooner or later they will be fully tamed by the Bulgarian aspect system, and the primary регистрирам, ангажирам, копирам will be understood as imperfective rather than biaspectual³.

To make it clearer that this process is entirely in the spirit of the Bulgarian language, I offer two parallels that show how both word formation and the aspect transformations of verbs with the suffixes -ира-, -изира- follow established patterns:

бия (impf.) → пробия (pf.) → пробивам (impf.)
контролирам (pf./impf.) → проконтролирам (pf.) → проконтролирвам (impf.)

крия (impf.) → закрия (pf.) → закривам (impf.)
ангажирам (pf./impf.) → заангажирам (pf.) → заангажирвам (impf.)
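For readers who think in code, the two-step pattern for the -ира- verbs can be written as a pair of string operations. This is only a toy sketch of the chains shown above: the function names are invented for illustration, it covers only verbs ending in -ирам, and it does not model the native chains such as бия → пробия → пробивам, whose stems change in other ways.

```python
def perfective(prefix: str, base: str) -> str:
    """Step 1, prefixation: контролирам -> проконтролирам."""
    return prefix + base


def secondary_imperfective(perfective_verb: str) -> str:
    """Step 2, suffixation with -в-: проконтролирам -> проконтролирвам."""
    if not perfective_verb.endswith("ирам"):
        raise ValueError("this sketch only handles verbs ending in -ирам")
    # Drop the ending -ам, then add -в- plus the ending -ам again.
    return perfective_verb[:-2] + "вам"


for prefix, base in [("про", "контролирам"), ("за", "ангажирам")]:
    pf = perfective(prefix, base)
    print(base, "->", pf, "->", secondary_imperfective(pf))
```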

After this wandering in the depths of aspect relations in the verb system,

we climb up into the ether with the adjective фин (“fine”). You may also have met it as финен,

a form that must be classed as incorrect. It contains one unnecessary suffix. But does that suffix really do no work at all? Imagine foreigners who are learning Bulgarian and do not draw analogies with other languages. If they come across the word фин, they will not know what part of speech it is. The form финен, however, can hint to them that it is an adjective, because they will already know some common words of this class with the suffix -ен, for example труден, гладен, правилен, летен.

Native speakers of Bulgarian who make the mistake of writing or saying финен are trying to finish shaping the word by giving it the form typical of adjectives, most of which are formed with a variety of suffixes. In general, words make their morphemic structure more complex over the course of history. Old Bulgarian, for example, attests the adjectives малъ; лъжь and лъжьнъ; радъ and радостьнъ. Today’s adjectives with the corresponding meanings have suffixes only: малък; лъжлив, лъжовен; радостен.

I myself will be glad (радостна) if this article has managed to provoke you and get you to look from a different angle at the mistakes and words that irritate you (well, at least at some of them). Try to take them more philosophically, because one day some of them will be normalized and codified. Because of the logic of language development.

¹ Non-derived words are those not formed from another word in the language. For verbs, this in most cases means that they contain no prefix or suffix. This large group of imperfective verbs also includes those formed from nominals by adding a stem vowel (димя, редя). Граматика на съвременния български книжовен език. Т. 2. Морфология. София: Издателство на БАН, p. 260.

² Verbs with the suffixes -ира- and -изира- were borrowed relatively recently from Western European languages, which lack the category of aspect, and so they have not yet been brought under the Bulgarian language system.

³ Граматика на съвременния…, p. 268. Confirmation of this reinterpretation can already be found: in the older academic explanatory dictionary Речник на българския език the three verbs are defined as biaspectual, while in БЕРОН they are imperfective.

Language (език, which also means “tongue”) can be tasty off the plate too: the Bulgarian language we have spoken since childhood and to which we swear our love around May 24. In essence it is a means of communication, and to serve us well it changes constantly. Let us look at it in its dynamics and try to understand what is happening and why, what the driving mechanisms are, and how they relate to social processes. And since the task is not an easy one, we will do it gradually, in portions.

How ATPCO enables governed self-service data access to accelerate innovation with Amazon DataZone

2024-07-25 Brian Olsen

Post Syndicated from Brian Olsen original https://aws.amazon.com/blogs/big-data/how-atpco-enables-governed-self-service-data-access-to-accelerate-innovation-with-amazon-datazone/

This blog post is co-written with Raj Samineni from ATPCO.

In today’s data-driven world, companies across industries recognize the immense value of data in making decisions, driving innovation, and building new products to serve their customers. However, many organizations face challenges in enabling their employees to discover, get access to, and use data easily with the right governance controls. The significant barriers along the analytics journey constrain their ability to innovate faster and make quick decisions.

ATPCO is the backbone of modern airline retailing, enabling airlines and third-party channels to deliver the right offers to customers at the right time. ATPCO’s reach is impressive, with its fare data covering over 89% of global flight schedules. The company collaborates with more than 440 airlines and 132 channels, managing and processing over 350 million fares in its database at any given time. ATPCO’s vision is to be the platform driving innovation in airline retailing while remaining a trusted partner to the airline ecosystem. ATPCO aims to empower data-driven decision-making by making high quality data discoverable by every business unit, with the appropriate governance on who can access what.

In this post, using one of ATPCO’s use cases, we show you how ATPCO uses AWS services, including Amazon DataZone, to make data discoverable by data consumers across different business units so that they can innovate faster. We encourage you to read Amazon DataZone concepts and terminologies first to become familiar with the terms used in this post.

Use case

One of ATPCO’s use cases is to help airlines understand what products, including fares and ancillaries (like premium seat preference), are being offered and sold across channels and customer segments. To support this need, ATPCO wants to derive insights around product performance by using three different data sources:

Airline Ticketing data – 1 billion airline ticket sales data processed through ATPCO
ATPCO pricing data – 87% of worldwide airline offers are powered through ATPCO pricing data. ATPCO is the industry leader in providing pricing and merchandising content for airlines, global distribution systems (GDSs), online travel agencies (OTAs), and other sales channels for consumers to visually understand differences between various offers.
De-identified customer master data – ATPCO customer master data that has been de-identified for sensitive internal analysis and compliance.

To generate insights that will then be shared with airlines as a data product, an ATPCO analyst needs to be able to find the right data related to this topic, get access to the data sets, and then use them in a SQL client (like Amazon Athena) to start forming hypotheses and relationships.

Before Amazon DataZone, ATPCO analysts needed to find potential data assets by talking with colleagues; there wasn’t an easy way to discover data assets across the company. This slowed down their pace of innovation because it added time to the analytics journey.

Solution

To address the challenge, ATPCO sought inspiration from a modern data mesh architecture. Instead of a central data platform team with a data warehouse or data lake serving as the clearinghouse of all data across the company, a data mesh architecture encourages distributed ownership of data by data producers who publish and curate their data as products, which can then be discovered, requested, and used by data consumers.

Amazon DataZone provides rich functionality to help a data platform team distribute ownership of tasks so that these teams can choose to operate less like gatekeepers. In Amazon DataZone, data owners can publish their data and its business catalog (metadata) to ATPCO’s DataZone domain. Data consumers can then search for relevant data assets using these human-friendly metadata terms. Instead of access requests from data consumers going to ATPCO’s data platform team, they now go to the publisher or a delegated reviewer to evaluate and approve. When data consumers use the data, they do so in their own AWS accounts, which allocates their consumption costs to the right cost center instead of a central pool. Amazon DataZone also avoids duplicating data, which saves on cost and reduces compliance tracking. Amazon DataZone takes care of all of the plumbing, using familiar AWS services such as AWS Identity and Access Management (IAM), AWS Glue, AWS Lake Formation, and AWS Resource Access Manager (AWS RAM) in a way that is fully inspectable by a customer.

The following diagram provides an overview of the solution using Amazon DataZone and other AWS services, following a fully distributed AWS account model, where data sets like airline ticket sales, ticket pricing, and de-identified customer data in this use case are stored in different member accounts in AWS Organizations.

Implementation

Now, we’ll walk through how ATPCO implemented their solution to solve the challenges of analysts discovering, getting access to, and using data quickly to help their airline customers.

There are four parts to this implementation:

Set up account governance and identity management.
Create and configure an Amazon DataZone domain.
Publish data assets.
Consume data assets as part of analyzing data to generate insights.

Part 1: Set up account governance and identity management

Before you start, compare your current cloud environment, including data architecture, to ATPCO’s environment. We’ve simplified this environment to the following components for the purpose of this blog post:

ATPCO uses an organization to create and govern AWS accounts.
ATPCO has existing data lake resources set up in multiple accounts, each owned by different data-producing teams. Having separate accounts helps control access, limits the blast radius if things go wrong, and helps allocate and control cost and usage.
In each of their data-producing accounts, ATPCO has a common data lake stack: an Amazon Simple Storage Service (Amazon S3) bucket for data storage, AWS Glue crawler and catalog for updating and storing technical metadata, and AWS Lake Formation (in hybrid access mode) for managing data access permissions.
ATPCO created two new AWS accounts: one to own the Amazon DataZone domain and another for a consumer team to use for analytics with Amazon Athena.
ATPCO enabled AWS IAM Identity Center and connected their identity provider (IdP) for authentication.

We’ll assume that you have a similar setup, though you might choose differently to suit your unique needs.

Part 2: Create and configure an Amazon DataZone domain

After your cloud environment is set up, the steps in Part 2 will help you create and configure an Amazon DataZone domain. A domain helps you organize your data, people, and their collaborative projects, and includes a unique business data catalog and web portal that publishers and consumers will use to share, collaborate, and use data. For ATPCO, their data platform team created and configured their domain.

Step 2.1: Create an Amazon DataZone domain

Persona: Domain administrator

Go to the Amazon DataZone console in your domain account. If you use AWS IAM Identity Center for corporate workforce identity authentication, then select the AWS Region in which your Identity Center instance is deployed. Choose Create domain.

Enter a name and description.
Leave Customize encryption settings (advanced) cleared.
Leave the radio button selected for Create and use a new role. AWS creates an IAM role in your account on your behalf with the necessary IAM permissions for accessing Amazon DataZone APIs.
Leave clear the quick setup option for Set-up this account for data consumption and publishing because we don’t plan to publish or consume data in our domain account.
Skip Add new tag for now. You can always come back later to edit the domain and add tags.
Choose Create Domain.

After a domain is created, you will see a domain detail page similar to the following. Notice that IAM Identity Center is disabled by default.
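For teams that prefer to script this step, the console flow above corresponds roughly to the DataZone `CreateDomain` API. The following is a minimal sketch, not ATPCO’s actual tooling: it assumes a boto3-style `datazone` client is passed in, and the helper names and every argument value are placeholders. Unlike the console’s Create and use a new role option, the API call expects the ARN of an execution role that already exists.

```python
def build_create_domain_request(name, description, execution_role_arn):
    """Keyword arguments for the DataZone CreateDomain call.

    No kmsKeyIdentifier is passed, mirroring the cleared
    "Customize encryption settings" box in the console steps.
    """
    return {
        "name": name,
        "description": description,
        "domainExecutionRole": execution_role_arn,
    }


def create_domain(datazone_client, name, description, execution_role_arn):
    """Create the domain and return its identifier.

    `datazone_client` is assumed to be boto3.client("datazone"), created in
    the domain account and in the Region of your Identity Center instance.
    """
    request = build_create_domain_request(name, description, execution_role_arn)
    response = datazone_client.create_domain(**request)
    return response["id"]
```

Keeping the request builder separate from the call makes it easy to review or unit test the parameters without touching an AWS account.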

Step 2.2: Enable IAM Identity Center for your Amazon DataZone domain and add a group

Persona: Domain administrator

By default, your Amazon DataZone domain, its APIs, and its unique web portal are accessible by IAM principals in this AWS account with the necessary datazone IAM permissions. ATPCO wanted its corporate employees to be able to use Amazon DataZone with their corporate single sign-on (SSO) credentials without needing secondary federation to IAM roles. AWS IAM Identity Center is the AWS cross-service solution for passing identity provider credentials. You can skip this step if you plan to use IAM principals directly for accessing Amazon DataZone.

Navigate to your Amazon DataZone domain’s detail page and choose Enable IAM Identity Center.

Scroll down to the User management section and select Enable users in IAM Identity Center. When you do, User and group assignment method options appear below. Turn on Require assignments. This means that you need to explicitly allow (add) users and groups to access your domain. Choose Update domain.

Now let’s add a group to the domain to provide its members with access. Back on your domain’s detail page, scroll to the bottom and choose the User management tab. Choose Add, and select Add SSO Groups from the drop-down.

Enter the first letters of the group name and select it from the options. After you’ve added the desired groups, choose Add group(s).
You can confirm that the groups are added successfully on the domain’s detail page, under the User management tab by selecting SSO Users and then SSO Groups from the drop-down.
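The same two actions, enabling IAM Identity Center with explicit assignments and then adding groups, can be sketched against the DataZone API. This is an illustrative outline under stated assumptions: the client is a boto3-style `datazone` client, the `MANUAL` user assignment value is assumed to be the API counterpart of the console’s Require assignments toggle, and the group identifiers are assumed to be IAM Identity Center group IDs.

```python
def build_enable_sso_request(domain_id, require_assignments=True):
    """Keyword arguments for UpdateDomain that switch on IAM Identity Center.

    MANUAL is assumed to match "Require assignments" in the console;
    AUTOMATIC would not require explicit user or group assignment.
    """
    return {
        "identifier": domain_id,
        "singleSignOn": {
            "type": "IAM_IDC",
            "userAssignment": "MANUAL" if require_assignments else "AUTOMATIC",
        },
    }


def enable_sso_and_add_groups(datazone_client, domain_id, group_ids):
    """Enable SSO on the domain, then grant each listed group access to it."""
    datazone_client.update_domain(**build_enable_sso_request(domain_id))
    for group_id in group_ids:
        datazone_client.create_group_profile(
            domainIdentifier=domain_id,
            groupIdentifier=group_id,
        )
```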

Step 2.3: Associate AWS accounts with the domain for segregated data publishing and consumption

Personas: Domain administrator and AWS account owners

Amazon DataZone supports a distributed AWS account structure, where data assets are segregated from data consumption (such as Amazon Athena usage), and data assets are in their own accounts (owned by their respective data owners). We call these associated accounts. Amazon DataZone and the other AWS services it orchestrates take care of the cross-account data sharing. To make this work, domain and account owners need to perform a one-time account association: the domain needs to be shared with the account, and the account owner needs to configure it for use with Amazon DataZone. For ATPCO, there are four desired associated accounts, three of which are the accounts with data assets stored in Amazon S3 and cataloged in AWS Glue (airline ticketing data, pricing data, and de-identified customer data), and a fourth account that is used for an analyst’s consumption.

The first part of associating an account is to share the Amazon DataZone domain with the desired accounts (Amazon DataZone uses AWS RAM to create the resource policy for you). In ATPCO’s case, their data platform team manages the domain, so a team member does these steps.

To do this in the Amazon DataZone console, sign in to the domain account and navigate to the domain detail page, and then scroll down and choose the Associated Accounts tab. Choose Request association.
Enter the AWS account ID of the first account to be associated.
Choose Add another account and repeat step one for the remaining accounts to be associated. For ATPCO, there were four to-be associated accounts.
When complete, choose Request Association.
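Because the console relies on AWS RAM for this sharing, the request can be pictured as a RAM resource share of the domain with the target accounts. The sketch below is a rough approximation rather than a documented equivalent of the Request association button: it assumes a boto3-style `ram` client, a known domain ARN, and a made-up share name, and it leaves the permission set at the RAM default. Whether a hand-built share behaves exactly like the one the console creates is an assumption to verify before relying on it.

```python
def build_domain_share_request(domain_arn, account_ids, share_name="datazone-domain-share"):
    """Keyword arguments for the AWS RAM CreateResourceShare call.

    `share_name` is a hypothetical label; `account_ids` are the 12-digit IDs
    of the accounts to be associated with the domain.
    """
    return {
        "name": share_name,
        "resourceArns": [domain_arn],
        "principals": list(account_ids),
    }


def request_associations(ram_client, domain_arn, account_ids):
    """Share the domain with every account in one request and return the share ARN."""
    request = build_domain_share_request(domain_arn, account_ids)
    response = ram_client.create_resource_share(**request)
    return response["resourceShare"]["resourceShareArn"]
```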

The second part of associating an account is for the account owner to then configure their account for use by Amazon DataZone. Essentially, this process means that the account owner is allowing Amazon DataZone to perform actions in the account, like granting access to Amazon DataZone projects after a subscription request is approved.

Sign in to the associated account and go to the Amazon DataZone console in the same Region as the domain. On the Amazon DataZone home page, choose View requests.
Select the name of the inviting Amazon DataZone domain and choose Review request.

Choose the Amazon DataZone blueprint you want to enable. We select Data Lake in this example because ATPCO’s use case has data in Amazon S3 and consumption through Amazon Athena.

Leave the defaults as-is in the Permissions and resources section. The Glue Manage Access role allows Amazon DataZone to use IAM and Lake Formation to manage IAM roles and permissions to data lake resources after you approve a subscription request in Amazon DataZone. The Provisioning role allows Amazon DataZone to create S3 buckets and AWS Glue databases and tables in your account when you allow users to create Amazon DataZone projects and environments. The Amazon S3 bucket for data lake setting is where you specify which S3 bucket Amazon DataZone uses when users store data with your account.

Choose Accept & configure association. This will take you to the associated domains table for this associated account, showing which domains the account is associated with. Repeat this process for other to-be associated accounts.

After the associations are configured by accounts, you will see the status reflected in the Associated accounts tab of the domain detail page.

Step 2.4: Set up environment profiles in the domain

Persona: Domain administrator

The final step to prepare the domain is making the associated AWS accounts usable by Amazon DataZone domain users. You do this with an environment profile, which helps less technical users get started publishing or consuming data. It’s like a template, with pre-defined technical details like blueprint type, AWS account ID, and Region. ATPCO’s data platform team set up an environment profile for each associated account.

To do this in the Amazon DataZone console, the data platform team member signs in to the domain account, navigates to the domain detail page, and chooses Open data portal in the upper right to go to the web-based Amazon DataZone portal.

Choose Select project in the upper left next to the DataZone icon and select Create Project. Enter a name, like Domain Administration, and choose Create. This will take you to your new project page.
In the Domain Administration project page, choose the Environments tab, and then choose Environment profiles in the navigation pane. Select Create environment profile.
1. Enter a name, such as Sales – Data lake blueprint.
2. Select the Domain Administration project as owner, and the DefaultDataLake as the blueprint.
3. Select the AWS account with sales data as well as the preferred Region for new resources, such as AWS Glue and Athena consumption.
4. Leave All projects and Any database selected.
5. Finalize your selection by choosing Create Environment Profile.

Repeat this step for each of your associated accounts. As a result, Amazon DataZone users will be able to create environments in their projects to use AWS resources in specific AWS accounts for publishing or consumption.

Part 3: Publish assets

With Part 2 complete, the domain is ready for publishers to sign in and start publishing the first data assets to the business data catalog so that potential data consumers find relevant assets to help them with their analyses. We’ll focus on how ATPCO published their first data asset for internal analysis—sales data from their airline customers. ATPCO already had the data extracted, transformed, and loaded in a staged S3 bucket and cataloged with AWS Glue.

Step 3.1: Create a project

Persona: Data publisher

Amazon DataZone projects enable a group of users to collaborate with data. In this part of the ATPCO use case, the project is used to publish sales data as an asset in the project. By tying the eventual data asset to a project (rather than a user), the asset will have long-lived ownership beyond the tenure of any single employee or group of employees.

As a data publisher, obtain the URL of the domain’s data portal from your domain administrator, navigate to the sign-in page, and authenticate with IAM or SSO. After you’re signed in to the data portal, choose Create Project, enter a name (such as Sales Data Assets), and choose Create.
If you want to add teammates to the project, choose Add Members. On the Project members page, choose Add Members, search for the relevant IAM or SSO principals, and select a role for them in the project. Owners have full permissions in the project, while contributors are not able to edit or delete the project or control membership. Choose Add Members to complete the membership changes.

Step 3.2: Create an environment

Persona: Data publisher

Projects can be comprised of several environments. Amazon DataZone environments are collections of configured resources (for example, an S3 bucket, an AWS Glue database, or an Athena workgroup). They can be useful if you want to manage stages of data production for the same essential data products with separate AWS resources, such as raw, filtered, processed, and curated data stages.

While signed in to the data portal and in the Sales Data Assets project, choose the Environments tab, and then select Create Environment. Enter a name, such as Processed, referencing the processed stage of the underlying data.
Select the Sales – Data lake blueprint environment profile the domain administrator created in Part 2.
Choose Create Environment. Notice that you don’t need any technical details about the AWS account or resources! The creation process might take several minutes while Amazon DataZone sets up Lake Formation, Glue, and Athena.

Step 3.3: Create a new data source and run an ingestion job

Persona: Data publisher

In this use case, ATPCO has cataloged their data using AWS Glue, which Amazon DataZone can use as a data source. An Amazon DataZone data source (for AWS Glue) is a representation of one or more AWS Glue databases, with the option to set table selection criteria based on table names. Similar to how AWS Glue crawlers scan for new data and metadata, you can run an Amazon DataZone ingestion job against an Amazon DataZone data source (again, AWS Glue) to pull all of the matching tables and technical metadata (such as column headers) as the foundation for one or more data assets. An ingestion job can be run manually or automatically on a schedule.

While signed in to the data portal and in the Sales Data Assets project, choose the Data tab, and then select Data sources. Choose Create Data Source, and enter a name for your data source, such as Processed Sales data in Glue, select AWS Glue as the type, and choose Next.
Select the Processed environment from Step 3.2. In the database name box, enter a value or select from the suggested AWS Glue databases that Amazon DataZone identified in the AWS account. You can add additional criteria and another AWS Glue database.
For Publishing settings, select No. This allows you to review and enrich the suggested assets before publishing them to the business data catalog.
For Metadata generation methods, keep this box selected. Amazon DataZone will provide you with recommended business names for each data asset and its technical schema, so that you publish an asset that’s easier for consumers to find.
Clear Data quality unless you have already set up AWS Glue data quality. Choose Next.
For Run preference, select to run on demand. You can come back later to run this ingestion job automatically on a schedule. Choose Next.
Review the selections and choose Create.

To run the ingestion job for the first time, choose Run in the upper right corner. This will start the job. The run time is dependent on the quantity of databases, tables, and columns in your data source. You can refresh the status by choosing Refresh.

Step 3.4: Review, curate, and publish assets

Persona: Data publisher

After the ingestion job is complete, the matching AWS Glue tables will be added to the project’s inventory. You can then review the asset, including automated metadata generated by Amazon DataZone, add additional metadata, and publish the asset.

While signed in to the data portal and in the Sales Data Assets project, go to the Data tab, and select Inventory. You can review each of the data assets generated by the ingestion job. Let’s select the first result. In the asset detail page, you can edit the asset’s name and description to make it easier to find, especially in a list of search results.
You can edit the Read Me section and add rich descriptions for the asset, with markdown support. This can help reduce the questions consumers message the publisher with for clarification.
You can edit the technical schema (columns), including adding business names and descriptions. If you enabled automated metadata generation, then you’ll see recommendations here that you can accept or reject.
After you are done enriching the asset, you can choose Publish to make it searchable in the business data catalog.

Have the data publisher for each asset follow Part 3. For ATPCO, this means two additional teams followed these steps to get pricing and de-identified customer data into the data catalog.

Part 4: Consume assets as part of analyzing data to generate insights

Now that the business data catalog has three published data assets, data consumers will find available data to start their analysis. In this final part, an ATPCO data analyst can find the assets they need, obtain approved access, and analyze the data in Athena, forming the precursor of a data product that ATPCO can then make available to their customer (such as an airline).

Step 4.1: Discover and find data assets in the catalog

Persona: Data consumer

As a data consumer, obtain the URL of the domain’s data portal from your domain administrator, navigate to the sign-in page, and authenticate with IAM or SSO. In the data portal, enter text to find data assets that match what you need to complete your analysis. In the ATPCO example, the analyst started by entering ticketing data. This returned the sales asset published above because the description noted that the data was related to “sales, including tickets and ancillaries (like premium seat selection preferences).”

The data consumer reviews the detail page of the sales asset, including the description and human-friendly terms in the schema, and confirms that it’s of use to the analysis. They then choose Subscribe. The data consumer is prompted to select a project for the subscription request; if they need to create one, they follow the same instructions as in Step 3.1, naming it Product analysis project. They enter a short justification of the request and choose Subscribe to send the request to the data publisher.

Repeat this search-and-subscribe process for each of the data assets needed for the analysis. In the ATPCO use case, this meant searching for and subscribing to pricing and customer data.

While waiting for the subscription requests to be approved, the data consumer creates an Amazon DataZone environment in the Product analysis project, similar to Step 3.2. The data consumer selects an environment profile for their consumption AWS account and the data lake blueprint.

Step 4.2: Review and approve subscription request

Persona: Data publisher

The next time that a member of the Sales Data Assets project signs in to the Amazon DataZone data portal, they will see a notification of the subscription request. Select that notification or navigate in the Amazon DataZone data portal to the project. Choose the Data tab and Incoming requests and then the Requested tab to find the request. Review the request and decide to either Approve or Reject, while providing a disposition reason for future reference.

Step 4.3: Analyze data

Persona: Data consumer

Now that the data consumer has subscribed to all three data assets needed (by repeating steps 4.1-4.2 for each asset), the data consumer navigates to the Product analysis project in the Amazon DataZone data portal. The data consumer can verify that the project has data asset subscriptions by choosing the Data tab and Subscribed data.

Because the project has an environment with the data lake blueprint enabled in their consumption AWS account, the data consumer will see an icon in the right-side tab called Query Data: Amazon Athena. By selecting this icon, they’re taken to the Amazon Athena console.

In the Amazon Athena console, the data consumer sees the data assets their DataZone project is subscribed to (from steps 4.1-4.2). They use the Amazon Athena query editor to query the subscribed data.

Conclusion

In this post, we walked you through an ATPCO use case to demonstrate how Amazon DataZone allows users across an organization to easily discover relevant data products using business terms. Users can then request access to data and build products and insights faster. By providing self-service access to data with the right governance guardrails, Amazon DataZone helps companies tap into the full potential of their data products to drive innovation and data-driven decision making. If you’re looking for a way to unlock the full potential of your data and democratize it across your organization, then Amazon DataZone can help you transform your business by making data-driven insights more accessible and productive.

To learn more about Amazon DataZone and how to get started, refer to the Getting started guide. See the YouTube playlist for some of the latest demos of Amazon DataZone and short descriptions of the capabilities available.

About the Author

Brian Olsen is a Senior Technical Product Manager with Amazon DataZone. His 15-year technology career in research science and product has revolved around helping customers use data to make better decisions. Outside of work, he enjoys learning new adventurous hobbies, with the most recent being paragliding.

Mitesh Patel is a Principal Solutions Architect at AWS. His passion is helping customers harness the power of analytics, machine learning, and AI to drive business growth. He engages with customers to create innovative solutions on AWS.

Raj Samineni is the Director of Data Engineering at ATPCO, leading the creation of advanced cloud-based data platforms. His work ensures robust, scalable solutions that support the airline industry’s strategic transformational objectives. By leveraging machine learning and AI, Raj drives innovation and data culture, positioning ATPCO at the forefront of technological advancement.

Sonal Panda is a Senior Solutions Architect at AWS with over 20 years of experience in architecting and developing intricate systems, primarily in the financial industry. Her expertise lies in generative AI and in application modernization that uses microservices and serverless architectures to drive innovation and efficiency.

Manage Amazon Redshift provisioned clusters with Terraform

2024-07-25 Amit Ghodke

Post Syndicated from Amit Ghodke original https://aws.amazon.com/blogs/big-data/manage-amazon-redshift-provisioned-clusters-with-terraform/

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it straightforward and cost-effective to analyze all your data using standard SQL and your existing extract, transform, and load (ETL); business intelligence (BI); and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day and power analytics workloads such as BI, predictive analytics, and real-time streaming analytics.

HashiCorp Terraform is an infrastructure as code (IaC) tool that lets you define cloud resources in human-readable configuration files that you can version, reuse, and share. You can then use a consistent workflow to provision and manage your infrastructure throughout its lifecycle.

In this post, we demonstrate how to use Terraform to manage common Redshift cluster operations, such as:

Creating a new provisioned Redshift cluster using Terraform code and adding an AWS Identity and Access Management (IAM) role to it
Scheduling pause, resume, and resize operations for the Redshift cluster

Solution overview

The following diagram illustrates the solution architecture for provisioning a Redshift cluster using Terraform.

In addition to Amazon Redshift, the solution uses the following AWS services:

Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 750 instances and choice of the latest processors, storage, networking, operating system (OS), and purchase model to help you best match the needs of your workload. For this post, we use an m5.xlarge instance with the Windows Server 2022 Datacenter Edition. The choice of instance type and Windows OS is flexible; you can choose a configuration that suits your use case.
IAM allows you to securely manage identities and access to AWS services and resources. We use IAM roles and policies to securely access services and perform relevant operations. An IAM role is an AWS identity that you can assume to gain temporary access to AWS services and resources. Each IAM role has a set of permissions defined by IAM policies. These policies determine the actions and resources the role can access.
AWS Secrets Manager allows you to securely store the user name and password needed to log in to Amazon Redshift.

In this post, we demonstrate how to set up an environment that connects AWS and Terraform. The following are the high-level tasks involved:

Set up an EC2 instance with Windows OS in AWS.
Install Terraform on the instance.
Configure your environment variables (Windows OS).
Define an IAM policy to have minimum access to perform activities on a Redshift cluster, including pause, resume, and resize.
Establish an IAM role using the policy you created.
Create a provisioned Redshift cluster using Terraform code.
Attach the IAM role you created to the Redshift cluster.
Write the Terraform code to schedule cluster operations like pause, resume, and resize.

Prerequisites

To complete the activities described in this post, you need an AWS account and administrator privileges on the account to use the key AWS services and create the necessary IAM roles.

Create an EC2 instance

We begin with creating an EC2 instance. Complete the following steps to create a Windows OS EC2 instance:

On the Amazon EC2 console, choose Launch Instance.
Choose a Windows Server Amazon Machine Image (AMI) that suits your requirements.
Select an appropriate instance type for your use case.
Configure the instance details:
1. Choose the VPC and subnet where you want to launch the instance.
2. Enable Auto-assign Public IP.
3. For Add storage, configure the desired storage options for your instance.
4. Add any necessary tags to the instance.
For Configure security group, select or create a security group that allows the necessary inbound and outbound traffic to your instance.
Review the instance configuration and choose Launch to start the instance creation process.
For Select an existing key pair or create a new key pair, choose an existing key pair or create a new one.
Choose Launch instance.
When the instance is running, you can connect to it using the Remote Desktop Protocol (RDP) and the administrator password obtained from the Get Windows password option.

Install Terraform on the EC2 instance

Install Terraform on the Windows EC2 instance using the following steps:

RDP into the EC2 instance you created.
Install Terraform on the EC2 instance.

You need to update the environment variables to point to the directory where the Terraform executable is available.

Under System Properties, on the Advanced tab, choose Environment Variables.

Choose the path variable.

Choose New and enter the path where Terraform is installed. For this post, it’s in the C:\ directory.

Confirm Terraform is installed by entering the following command:

terraform -v

Optionally, you can use an editor like Visual Studio Code (VS Code) and add the Terraform extension to it.

Create a user for accessing AWS through code (AWS CLI and Terraform)

Next, we create an administrator user in IAM, which performs the operations on AWS through Terraform and the AWS Command Line Interface (AWS CLI). Complete the following steps:

Create a new IAM user.
On the IAM console, download and save the access key ID and secret access key.

Install the AWS CLI.
Launch the AWS CLI and run aws configure and pass the access key ID, secret access key, and default AWS Region.

This keeps the AWS access key ID and secret access key out of the Terraform code, so they aren’t visible in plain text and can’t be shared accidentally when the code is committed to a code repository.

Create a user for accessing Redshift through code (Terraform)

Because we’re creating a Redshift cluster and running subsequent operations on it, the code needs the cluster’s administrator user name and password (these are different from the administrator user we created earlier in IAM). To do this securely, we use Secrets Manager to store the user name and password. We write code in Terraform to access these credentials during the cluster create operation. Complete the following steps:

On the Secrets Manager console, choose Secrets in the navigation pane.
Choose Store a new secret.

For Secret type, select Credentials for Amazon Redshift data warehouse.
Enter your credentials.

Set up Terraform

Complete the following steps to set up Terraform:

Create a folder or directory for storing all your Terraform code.
Open the VS Code editor and browse to your folder.
Choose New File and enter a name for the file using the .tf extension.

Now we’re ready to start writing our code starting with defining providers. The providers definition is a way for Terraform to get the necessary APIs to interact with AWS.

Configure a provider for Terraform:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.53.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "us-east-1"
}
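The provider block above contains no credentials. To our understanding, the AWS provider falls back to the shared configuration written by aws configure. If the instance holds more than one set of credentials, a sketch like the following makes the choice explicit. The profile name terraform-admin is a hypothetical example and isn't created anywhere in this post.

```hcl
# Sketch only: pin the provider to a named AWS CLI profile.
# "terraform-admin" is a hypothetical profile name; you could create it with
#   aws configure --profile terraform-admin
# Use this in place of the provider block above, not in addition to it.
provider "aws" {
  region  = "us-east-1"
  profile = "terraform-admin"
}
```

Either way, the access key stays in the AWS CLI configuration rather than in the .tf file.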

Access the admin credentials for the Amazon Redshift admin user:

data "aws_secretsmanager_secret_version" "creds" {
  # Fill in the name you gave to your secret
  secret_id = "terraform-creds"
}

/* json decode to parse the secret */
locals {
  terraform-creds = jsondecode(
    data.aws_secretsmanager_secret_version.creds.secret_string
  )
}

Create a Redshift cluster

To create a Redshift cluster, use the aws_redshift_cluster resource:

# Create an encrypted Amazon Redshift cluster
resource "aws_redshift_cluster" "dw_cluster" {
  cluster_identifier        = "tf-example-redshift-cluster"
  database_name             = "dev"
  master_username           = local.terraform-creds.username
  master_password           = local.terraform-creds.password
  node_type                 = "ra3.xlplus"
  cluster_type              = "multi-node"
  publicly_accessible       = false
  number_of_nodes           = 2
  encrypted                 = true
  # KMS key ARN read from Secrets Manager; the local is defined in the full listing under Test the solution
  kms_key_id                = local.RedshiftClusterEncryptionKeySecret.arn
  enhanced_vpc_routing      = true
  cluster_subnet_group_name = "<<your-cluster-subnet-groupname>>"
}

In this example, we create a two-node Redshift cluster called tf-example-redshift-cluster using the ra3.xlplus node type. We use the credentials from Secrets Manager and jsondecode to access these values. This makes sure the user name and password aren’t passed in plain text.
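The cluster definition expects the name of an existing cluster subnet group. If you prefer to manage that group in Terraform as well, the following is a minimal sketch. The resource name, group name, and subnet IDs are placeholders we made up for illustration.

```hcl
# Sketch only: a Redshift cluster subnet group managed by Terraform.
# The subnet IDs are placeholders; replace them with subnets in your own VPC.
resource "aws_redshift_subnet_group" "dw_subnet_group" {
  name        = "tf-example-redshift-subnet-group" # hypothetical name
  description = "Subnets for tf-example-redshift-cluster"
  subnet_ids  = ["subnet-aaaaaaaa", "subnet-bbbbbbbb"]
}

# In aws_redshift_cluster.dw_cluster you could then reference the group
# instead of typing its name:
#   cluster_subnet_group_name = aws_redshift_subnet_group.dw_subnet_group.name
```

Referencing the resource rather than a literal name also lets Terraform create the subnet group before the cluster.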

Add an IAM role to the cluster

Because we didn’t associate an IAM role during cluster creation, we do so now with the following code:

resource "aws_redshift_cluster_iam_roles" "cluster_iam_role" {
  cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
  iam_role_arns      = ["arn:aws:iam::yourawsaccountId:role/service-role/yourIAMrolename"]
}
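The code above assumes the IAM role already exists and that you paste in its ARN. As an alternative, the role itself can be defined in Terraform. The following is a sketch under our own assumptions: the role name and the choice of the AmazonS3ReadOnlyAccess managed policy are illustrative, not part of the original setup.

```hcl
# Sketch only: create the cluster's IAM role in Terraform instead of pasting an ARN.
data "aws_iam_policy_document" "redshift_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["redshift.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "redshift_data_access" {
  name               = "tf-example-redshift-data-access" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.redshift_assume_role.json
}

# Illustrative permission: read-only access to Amazon S3, for example for COPY.
resource "aws_iam_role_policy_attachment" "redshift_s3_read" {
  role       = aws_iam_role.redshift_data_access.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

# In aws_redshift_cluster_iam_roles.cluster_iam_role, replace the literal ARN with:
#   iam_role_arns = [aws_iam_role.redshift_data_access.arn]
```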

Enable Redshift cluster operations

Performing operations on the Redshift cluster such as resize, pause, and resume on a schedule offers a more practical use of these operations. Therefore, we create two policies: a trust policy that allows the Amazon Redshift scheduler service to assume a role, and a permissions policy that allows the cluster pause, resume, and resize operations. Then we create a role that has both policies attached to it.

You can also perform these steps directly on the console and then reference the resulting resources in Terraform code. The following example demonstrates the code snippets to create the policies and a role, and then to attach the policy to the role.

Create the trust policy document for the Amazon Redshift scheduler and the role that uses it:

#define policy document to establish the Trust Relationship between the role and the entity (Redshift scheduler)
data "aws_iam_policy_document" "assume_role_scheduling" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["scheduler.redshift.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

#create a role that has the above trust relationship attached to it, so that it can invoke the redshift scheduling service
resource "aws_iam_role" "scheduling_role" {
  name               = "redshift_scheduled_action_role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_scheduling.json
}

Create a policy document and policy for Amazon Redshift operations:

/*define the policy document for other redshift operations*/
data "aws_iam_policy_document" "redshift_operations_policy_definition" {
  statement {
    effect = "Allow"
    actions = [
      "redshift:PauseCluster",
      "redshift:ResumeCluster",
      "redshift:ResizeCluster",
    ]
    resources = ["arn:aws:redshift:*:youraccountid:cluster:*"]
  }
}

/*create the policy and add the above data (json) to the policy*/
resource "aws_iam_policy" "scheduling_actions_policy" {
  name   = "redshift_scheduled_action_policy"
  policy = data.aws_iam_policy_document.redshift_operations_policy_definition.json
}

Attach the policy to the IAM role:

/*connect the policy and the role*/
resource "aws_iam_role_policy_attachment" "role_policy_attach" {
  policy_arn = aws_iam_policy.scheduling_actions_policy.arn
  role       = aws_iam_role.scheduling_role.name
}

Pause the Redshift cluster:

#pause a cluster
resource "aws_redshift_scheduled_action" "pause_operation" {
  name     = "tf-redshift-scheduled-action-pause"
  schedule = "cron(00 22 * * ? *)"
  iam_role = aws_iam_role.scheduling_role.arn
  target_action {
    pause_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
    }
  }
}

In the preceding example, we created a scheduled action called tf-redshift-scheduled-action-pause that pauses the cluster at 10:00 PM every day as a cost-saving action.
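The schedule string follows the Amazon Redshift cron format, which to our understanding is cron(Minutes Hours Day-of-month Month Day-of-week Year) and is evaluated in UTC. As a variation, the following sketch pauses the cluster only on weekday evenings. The resource name and action name are hypothetical.

```hcl
# Sketch only: pause on weekday evenings instead of every day.
# Use this in place of pause_operation, not alongside it.
resource "aws_redshift_scheduled_action" "pause_weekdays" {
  name        = "tf-redshift-scheduled-action-pause-weekdays" # hypothetical name
  description = "Pause the cluster at 22:00 UTC, Monday through Friday"
  schedule    = "cron(0 22 ? * MON-FRI *)"
  iam_role    = aws_iam_role.scheduling_role.arn
  enable      = true

  target_action {
    pause_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
    }
  }
}
```

If your business hours are defined in a local time zone, convert them to UTC before writing the expression.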

Resume the Redshift cluster:

#resume a cluster
resource "aws_redshift_scheduled_action" "resume_operation" {
  name     = "tf-redshift-scheduled-action-resume"
  schedule = "cron(15 07 * * ? *)"
  iam_role = aws_iam_role.scheduling_role.arn
  target_action {
    resume_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
    }
  }
}

In the preceding example, we created a scheduled action called tf-redshift-scheduled-action-resume that resumes the cluster at 7:15 AM every day in time for business operations to start using the Redshift cluster.

Resize the Redshift cluster:

#resize a cluster
resource "aws_redshift_scheduled_action" "resize_operation" {
  name     = "tf-redshift-scheduled-action-resize"
  schedule = "cron(15 14 * * ? *)"
  iam_role = aws_iam_role.scheduling_role.arn
  target_action {
    resize_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
      cluster_type       = "multi-node"
      node_type          = "ra3.xlplus"
      number_of_nodes    = 4    /* increase the number of nodes using the resize operation */
      classic            = true /* default is elastic resize; set classic = true to use classic resize */
    }
  }
}

In the preceding example, we created a scheduled action called tf-redshift-scheduled-action-resize that increases the nodes from 2 to 4. You can do other operations like change the node type as well. By default, elastic resize will be used, but if you want to use classic resize, you have to pass the parameter classic = true as shown in the preceding code. This can be a scheduled action to anticipate the needs of peak periods and resize appropriately for that duration. You can then downsize using similar code during non-peak times.
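For the downsizing case, a matching scheduled action might look like the following sketch. It reuses the same role and cluster references as the resize example. The resource name, action name, and the 20:00 schedule are illustrative assumptions.

```hcl
# Sketch only: scale back from 4 nodes to 2 after the peak window.
resource "aws_redshift_scheduled_action" "resize_down_operation" {
  name     = "tf-redshift-scheduled-action-resize-down" # hypothetical name
  schedule = "cron(00 20 * * ? *)"                      # illustrative time; pick your own off-peak hour
  iam_role = aws_iam_role.scheduling_role.arn

  target_action {
    resize_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
      cluster_type       = "multi-node"
      node_type          = "ra3.xlplus"
      number_of_nodes    = 2    # return to the original cluster size
      classic            = true # kept consistent with the scale-up example; omit to use elastic resize
    }
  }
}
```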

Test the solution

We apply the following code to test the solution. Change the resource details accordingly, such as account ID and Region name.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.53.0"
    }
  }
}

# Configure the AWS Provider
provider "aws" {
  region = "us-east-1"
}

# access secrets stored in secret manager
data "aws_secretsmanager_secret_version" "creds" {
  # Fill in the name you gave to your secret
  secret_id = "terraform-creds"
}

/*json decode to parse the secret*/
locals {
  terraform-creds = jsondecode(
    data.aws_secretsmanager_secret_version.creds.secret_string
  )
}

#Store the arn of the KMS key to be used for encrypting the redshift cluster

data "aws_secretsmanager_secret_version" "encryptioncreds" {
  secret_id = "RedshiftClusterEncryptionKeySecret"
}
locals {
  RedshiftClusterEncryptionKeySecret = jsondecode(
    data.aws_secretsmanager_secret_version.encryptioncreds.secret_string
  )
}

# Create an encrypted Amazon Redshift cluster
resource "aws_redshift_cluster" "dw_cluster" {
  cluster_identifier = "tf-example-redshift-cluster"
  database_name      = "dev"
  master_username    = local.terraform-creds.username
  master_password    = local.terraform-creds.password
  node_type          = "ra3.xlplus"
  cluster_type       = "multi-node"
  publicly_accessible = false
  number_of_nodes    = 2
  encrypted         = true
  kms_key_id        = local.RedshiftClusterEncryptionKeySecret.arn
  enhanced_vpc_routing = true
  cluster_subnet_group_name="redshiftclustersubnetgroup-yuu4sywme0bk"
}

#add IAM Role to the Redshift cluster

resource "aws_redshift_cluster_iam_roles" "cluster_iam_role" {
  cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
  iam_role_arns      = ["arn:aws:iam::youraccountid:role/service-role/yourrolename"]
}

#for audit logging please create an S3 bucket which has read write privileges for Redshift service, this example does not include S3 bucket creation.

resource "aws_redshift_logging" "redshiftauditlogging" {
  cluster_identifier   = aws_redshift_cluster.dw_cluster.cluster_identifier
  log_destination_type = "s3"
  bucket_name          = "your-s3-bucket-name"
}

#to do operations like pause, resume, resize on a schedule we need to first create a role that has permissions to perform these operations on the cluster

#define policy document to establish the Trust Relationship between the role and the entity (Redshift scheduler)

data "aws_iam_policy_document" "assume_role_scheduling" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["scheduler.redshift.amazonaws.com"]
    }

    actions = ["sts:AssumeRole"]
  }
}

#create a role that has the above trust relationship attached to it, so that it can invoke the redshift scheduling service
resource "aws_iam_role" "scheduling_role" {
  name               = "redshift_scheduled_action_role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_scheduling.json
}

/*define the policy document for other redshift operations*/

data "aws_iam_policy_document" "redshift_operations_policy_definition" {
  statement {
    effect = "Allow"
    actions = [
      "redshift:PauseCluster",
      "redshift:ResumeCluster",
      "redshift:ResizeCluster",
    ]

    resources =  ["arn:aws:redshift:*:youraccountid:cluster:*"]
  }
}

/*create the policy and add the above data (json) to the policy*/

resource "aws_iam_policy" "scheduling_actions_policy" {
  name   = "redshift_scheduled_action_policy"
  policy = data.aws_iam_policy_document.redshift_operations_policy_definition.json
}

/*connect the policy and the role*/

resource "aws_iam_role_policy_attachment" "role_policy_attach" {
  policy_arn = aws_iam_policy.scheduling_actions_policy.arn
  role       = aws_iam_role.scheduling_role.name
}

#pause a cluster

resource "aws_redshift_scheduled_action" "pause_operation" {
  name     = "tf-redshift-scheduled-action-pause"
  schedule = "cron(00 14 * * ? *)"
  iam_role = aws_iam_role.scheduling_role.arn
  target_action {
    pause_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
    }
  }
}

#resume a cluster

resource "aws_redshift_scheduled_action" "resume_operation" {
  name     = "tf-redshift-scheduled-action-resume"
  schedule = "cron(15 14 * * ? *)"
  iam_role = aws_iam_role.scheduling_role.arn
  target_action {
    resume_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
    }
  }
}

#resize a cluster

resource "aws_redshift_scheduled_action" "resize_operation" {
  name     = "tf-redshift-scheduled-action-resize"
  schedule = "cron(15 14 * * ? *)"
  iam_role = aws_iam_role.scheduling_role.arn
  target_action {
    resize_cluster {
      cluster_identifier = aws_redshift_cluster.dw_cluster.cluster_identifier
      cluster_type = "multi-node"
      node_type = "ra3.xlplus"
      number_of_nodes = 4 /*increase the number of nodes using resize operation*/
      classic = true /*default behavior is elastic resize; set this boolean to true to use classic resize*/
    }
  }
}
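
Before applying the configuration, it can help to sanity-check the schedule expressions. The following is a minimal Python sketch, not part of the Terraform configuration; it assumes the cron() fields are ordered as minutes, hours, day of month, month, day of week, and year, and the helper names are hypothetical. It flags scheduled actions that share an identical schedule, such as the resume and resize actions in the listing above.

```python
import re

# Assumed field order for cron(...) schedule expressions.
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_cron_schedule(expression: str) -> dict:
    """Split a cron(...) schedule expression into named fields."""
    match = re.fullmatch(r"cron\((.+)\)", expression.strip())
    if match is None:
        raise ValueError(f"not a cron() expression: {expression!r}")
    parts = match.group(1).split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


def find_schedule_collisions(schedules: dict) -> list:
    """Return pairs of action names (sorted) whose parsed schedules are identical."""
    names = sorted(schedules)
    parsed = {name: parse_cron_schedule(schedules[name]) for name in names}
    return [
        (first, second)
        for i, first in enumerate(names)
        for second in names[i + 1:]
        if parsed[first] == parsed[second]
    ]


# Schedules copied from the scheduled actions defined above.
SCHEDULES = {
    "pause": "cron(00 14 * * ? *)",
    "resume": "cron(15 14 * * ? *)",
    "resize": "cron(15 14 * * ? *)",
}

print(parse_cron_schedule(SCHEDULES["pause"]))
print("same schedule:", find_schedule_collisions(SCHEDULES))
```

If two actions are reported as sharing a schedule, consider staggering them so that one operation has time to finish before the next one starts.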

Run terraform plan to see a list of changes that will be made, as shown in the following screenshot.

After you have reviewed the changes, use terraform apply to create the resources you defined.

You will be asked to enter yes or no before Terraform starts creating the resources.

You can confirm that the cluster is being created on the Amazon Redshift console.

After the cluster is created, the IAM roles and schedules for pause, resume, and resize operations are added, as shown in the following screenshot.

You can also view these scheduled operations on the Amazon Redshift console.

Clean up

If you ran terraform apply and deployed resources such as the Redshift cluster, IAM roles, or any of the other associated resources, run terraform destroy to tear them down and avoid incurring further charges on your AWS account.

Conclusion

Terraform offers a powerful and flexible solution for managing your infrastructure as code using a declarative approach, with a cloud-agnostic nature, resource orchestration capabilities, and strong community support. This post provided a comprehensive guide to using Terraform to deploy a Redshift cluster and perform important operations such as resize, resume, and pause on the cluster. Embracing IaC and using the right tools, such as Workflow Studio, VS Code, and Terraform, will enable you to build scalable and maintainable distributed applications, and automate processes.

About the Authors

Amit Ghodke is an Analytics Specialist Solutions Architect based out of Austin. He has worked with databases, data warehouses and analytical applications for the past 16 years. He loves to help customers implement analytical solutions at scale to derive maximum business value.

Ritesh Kumar Sinha is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.

TRAIL OF TEARS: America’s GREATEST Regret?

2024-07-25 Geographics

Post Syndicated from Geographics original https://www.youtube.com/watch?v=3bHfEd2KfXw

Migrate workloads from AWS Data Pipeline

2024-07-25 Noritaka Sekiyama

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/migrate-workloads-from-aws-data-pipeline/

AWS Data Pipeline helps customers automate the movement and transformation of data. With Data Pipeline, customers can define data-driven workflows, so that tasks can be dependent on the successful completion of previous tasks. Launched in 2012, Data Pipeline predates several popular Amazon Web Services (AWS) offerings for orchestrating data pipelines such as AWS Glue, AWS Step Functions, and Amazon Managed Workflows for Apache Airflow (Amazon MWAA).

Data Pipeline has been a foundational service for getting customers off the ground with their extract, transform, load (ETL) and infrastructure provisioning use cases. Some customers want a deeper level of control and specificity than is possible using Data Pipeline. With the recent advancements in the data industry, customers are looking for a more feature-rich platform to modernize their data pipelines and get them ready for data and machine learning (ML) innovation. This post explains how to migrate from Data Pipeline to alternate AWS services to serve the growing needs of data practitioners. The option you choose depends on your current workload on Data Pipeline. You can migrate typical use cases of Data Pipeline to AWS Glue, Step Functions, or Amazon MWAA.

Note that you will need to modify the configurations and code in the examples provided in this post based on your requirements. Before starting any production workloads after migration, you need to test your new workflows to ensure no disruption to production systems.

Migrating workloads to AWS Glue

AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources. It includes tooling for authoring, running jobs, and orchestrating workflows. With AWS Glue, you can discover and connect to hundreds of different data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor ETL pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

We recommend migrating your Data Pipeline workload to AWS Glue when:

You’re looking for a serverless data integration service that supports various data sources, authoring interfaces including visual editors and notebooks, and advanced data management capabilities such as data quality and sensitive data detection.
Your workload can be migrated to AWS Glue workflows, jobs (in Python or Apache Spark) and crawlers (for example, your existing pipeline is built on top of Apache Spark).
You need a single platform that can handle all aspects of your data pipeline, including ingestion, processing, transfer, integrity testing, and quality checks.
Your existing pipeline was created from a pre-defined template on the AWS Management Console for Data Pipeline, such as exporting a DynamoDB table to Amazon S3, or importing DynamoDB backup data from S3, and you’re looking for the same template.
Your workload doesn’t depend on a specific Hadoop ecosystem application such as Apache Hive.
Your workload doesn’t require orchestrating on-premises servers, user-managed Amazon Elastic Compute Cloud (Amazon EC2) instances, or a user-managed Amazon EMR cluster.

Example: Migrate EmrActivity on EmrCluster to export DynamoDB tables to S3

One of the most common workloads on Data Pipeline is to back up Amazon DynamoDB tables to Amazon Simple Storage Service (Amazon S3). Data Pipeline has a pre-defined template named Export DynamoDB table to S3 to export DynamoDB table data to a given S3 bucket.

The template uses EmrActivity (named TableBackupActivity) which runs on EmrCluster (named EmrClusterForBackup) and backs up data on DynamoDBDataNode to S3DataNode.

You can migrate these pipelines to AWS Glue because it natively supports reading from DynamoDB.

To define an AWS Glue job for the preceding use case:

Open the AWS Glue console.
Choose ETL jobs.
Choose Visual ETL.
For Sources, select Amazon DynamoDB.
On the node Data source - DynamoDB, for DynamoDB source, select Choose the DynamoDB table directly, then select your source DynamoDB table from the menu.
For Connection options, enter dynamodb.s3.bucket and dynamodb.s3.prefix.
Choose + (plus) to add a new node.
For Targets, select Amazon S3.
On the node Data target - S3 bucket, for Format, select your preferred format, for example, Parquet.
For S3 Target location, enter your destination S3 path.
On the Job details tab, select an IAM role. If you don’t have an IAM role, follow Configuring IAM permissions for AWS Glue.
Choose Save and Run.

Your AWS Glue job has been successfully created and started.

You might notice that there is no property to manage the read I/O rate. This is because the default DynamoDB reader used in AWS Glue Studio does not scan the source DynamoDB table. Instead, it uses the DynamoDB export feature.
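
The export-based read can also be expressed in a job script. The following is a minimal sketch that only builds the connection options; the option names are the ones we recall from the AWS Glue DynamoDB connection documentation, so verify them against the current documentation. The table ARN, bucket, and prefix are hypothetical placeholders, and the helper name is ours.

```python
def build_dynamodb_export_options(table_arn: str, bucket: str, prefix: str) -> dict:
    """Build connection options for reading a DynamoDB table through the export connector."""
    if not table_arn.startswith("arn:"):
        raise ValueError("table_arn must be a full ARN")
    return {
        "dynamodb.export": "ddb",        # read through DynamoDB export instead of a table scan
        "dynamodb.tableArn": table_arn,
        "dynamodb.s3.bucket": bucket,    # staging location for the exported files
        "dynamodb.s3.prefix": prefix,
        "dynamodb.unnestDDBJson": True,  # flatten the DynamoDB JSON structure
    }


# Hypothetical values for illustration only.
options = build_dynamodb_export_options(
    table_arn="arn:aws:dynamodb:us-east-1:111122223333:table/my-table",
    bucket="my-export-staging-bucket",
    prefix="ddb-export/",
)
print(options)

# Inside an AWS Glue job (not runnable locally), the options would be passed like this:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="dynamodb",
#     connection_options=options,
# )
```

Because the export runs against a point-in-time snapshot, the job does not consume read capacity on the source table, which is why no read rate setting appears.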

Example: Migrate EmrActivity on EmrCluster to import DynamoDB from S3

Another common workload on Data Pipeline is to restore DynamoDB tables using backup data on Amazon S3. Data Pipeline has a pre-defined template named Import DynamoDB backup data from S3 to import DynamoDB table data from a given S3 bucket.

The template uses EmrActivity (named TableLoadActivity) which runs on EmrCluster (named EmrClusterForLoad) and loads data from S3DataNode to DynamoDBDataNode.

You can migrate these pipelines to AWS Glue because it natively supports writing to DynamoDB.

As a prerequisite, create a destination DynamoDB table and catalog it in the AWS Glue Data Catalog using an AWS Glue crawler, the AWS Glue console, or the API. Then complete the following steps to define an AWS Glue job:

Open the AWS Glue console.
Choose ETL jobs.
Choose Visual ETL.
For Sources, select Amazon S3.
On the node Data source - S3 bucket, for S3 URL, enter your S3 path.
Choose + (plus) to add a new node.
For Targets, select AWS Glue Data Catalog.
On the node Data target - Data Catalog, for Database, select your destination database on Data Catalog.
For Table, select your destination table on Data Catalog.
On the Job details tab, select an IAM role. If you don’t have an IAM role, follow Configuring IAM permissions for AWS Glue.
Choose Save and Run.

Your AWS Glue job has been successfully created and started.

Migrating workloads to Step Functions

AWS Step Functions is a serverless orchestration service that lets you build workflows for your business-critical applications. With Step Functions, you use a visual editor to build workflows and integrate directly with over 11,000 actions for over 250 AWS services, including AWS Lambda, Amazon EMR, DynamoDB, and more. You can use Step Functions for orchestrating data processing pipelines, handling errors, and working with the throttling limits on the underlying AWS services. You can create workflows that process and publish machine learning models, orchestrate micro-services, as well as control AWS services, such as AWS Glue, to create ETL workflows. You also can create long-running, automated workflows for applications that require human interaction.

We recommend migrating your Data Pipeline workload to Step Functions when:

You’re looking for a serverless, highly available workflow orchestration service.
You’re looking for a cost-effective solution that charges at single-task granularity.
Your workloads are orchestrating tasks for multiple AWS services, such as Amazon EMR, AWS Lambda, AWS Glue, or DynamoDB.
You’re looking for a low-code solution that comes with a drag-and-drop visual designer for workflow creation and doesn’t require learning new programming concepts.
You’re looking for a service that provides integrations with over 250 AWS services covering over 11,000 actions out-of-the-box, as well as allowing integrations with custom non-AWS services and activities.
Your workload requires orchestrating on-premises servers, user-managed EC2 instances, or a user-managed EMR cluster.

Both Data Pipeline and Step Functions use JSON format to define workflows. This allows you to store your workflows in source control, manage versions, control access, and automate with continuous integration and continuous delivery (CI/CD). Step Functions uses a syntax called Amazon States Language, which is fully based on JSON and allows a seamless transition between the textual and visual representations of the workflow.

With Step Functions, you can choose the same version of Amazon EMR that you’re currently using in Data Pipeline.

For migrating activities on Data Pipeline managed resources, you can use the AWS SDK service integration on Step Functions to automate resource provisioning and cleanup. For migrating activities on on-premises servers, user-managed EC2 instances, or a user-managed EMR cluster, you can install the SSM Agent on the instance and then initiate the command through AWS Systems Manager Run Command from Step Functions. You can also initiate the state machine from a schedule defined in Amazon EventBridge.
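
As an illustration of the Run Command approach, the following Python sketch builds a one-state Amazon States Language definition that calls the Systems Manager SendCommand API through the AWS SDK service integration. The instance ID, shell command, and function name are hypothetical, and the retry settings are arbitrary; treat it as a starting point under those assumptions rather than a complete workflow.

```python
import json


def build_run_command_state_machine(instance_id: str, commands: list) -> dict:
    """Return a state machine definition that runs shell commands through SSM Run Command."""
    return {
        "Comment": "Run a shell command on a user-managed instance",
        "StartAt": "RunShellCommand",
        "States": {
            "RunShellCommand": {
                "Type": "Task",
                # AWS SDK service integration for the Systems Manager SendCommand API.
                "Resource": "arn:aws:states:::aws-sdk:ssm:sendCommand",
                "Parameters": {
                    "DocumentName": "AWS-RunShellScript",
                    "InstanceIds": [instance_id],
                    "Parameters": {"commands": commands},
                },
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 30,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            }
        },
    }


# Hypothetical instance ID and command for illustration only.
definition = build_run_command_state_machine(
    "i-0123456789abcdef0", ["/opt/etl/run_daily_job.sh"]
)
print(json.dumps(definition, indent=2))
```

Note that SendCommand returns as soon as the command is accepted, so a workflow that must wait for the command to finish would need an additional polling or callback step.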

Example: Migrate HadoopActivity on EmrCluster

To migrate HadoopActivity on EmrCluster on Data Pipeline to Step Functions:

Open the AWS Step Functions console.
Choose State machines.
Choose Create state machine.
In the Choose a template wizard, search for emr, select Manage an EMR job, and choose Select.

For Choose how to use this template, select Build on it.
Choose Use template.

For the Create an EMR cluster state, configure API Parameters (the EMR release label, EMR capacity, IAM role, and so on) based on the existing EmrCluster node configuration on Data Pipeline.

For the Run first step state, configure API Parameters (the JAR file and arguments) based on the existing HadoopActivity node configuration on Data Pipeline.
If you have further activities configured on the existing HadoopActivity, repeat the previous step for each of them.
Choose Create.

Your state machine has been successfully configured. Learn more in Manage an Amazon EMR Job.
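
For readers who prefer to work with the definition as text, the following Python sketch generates a simplified state machine of the same shape: create a cluster, run one step, and terminate the cluster. It is not the console template itself. The cluster name, release label, instance types, roles, JAR path, and arguments are hypothetical placeholders that you would replace with values from your EmrCluster and HadoopActivity definitions.

```python
import json


def build_emr_state_machine(release_label: str, jar: str, args: list) -> dict:
    """Return a definition that creates a cluster, runs one step, and terminates the cluster."""
    return {
        "StartAt": "CreateCluster",
        "States": {
            "CreateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "migrated-hadoop-activity",
                    "ReleaseLabel": release_label,
                    "Applications": [{"Name": "Hadoop"}],
                    "ServiceRole": "EMR_DefaultRole",
                    "JobFlowRole": "EMR_EC2_DefaultRole",
                    "Instances": {
                        "KeepJobFlowAliveWhenNoSteps": True,
                        "InstanceGroups": [
                            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
                        ],
                    },
                },
                "ResultPath": "$.cluster",  # keeps the cluster ID for the later states
                "Next": "RunHadoopStep",
            },
            "RunHadoopStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId.$": "$.cluster.ClusterId",
                    "Step": {
                        "Name": "hadoop-activity",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {"Jar": jar, "Args": args},
                    },
                },
                "ResultPath": "$.step",
                "Next": "TerminateCluster",
            },
            "TerminateCluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster",
                "Parameters": {"ClusterId.$": "$.cluster.ClusterId"},
                "End": True,
            },
        },
    }


def undefined_transitions(definition: dict) -> list:
    """Return Next targets that do not name a state in the definition."""
    states = definition["States"]
    return [s["Next"] for s in states.values() if "Next" in s and s["Next"] not in states]


# Hypothetical release label, JAR location, and arguments.
definition = build_emr_state_machine(
    "emr-6.1.0", "s3://my-bucket/jars/my-job.jar", ["--input", "s3://my-bucket/in/"]
)
print(json.dumps(definition, indent=2))
```

The undefined_transitions helper is a small local check that every transition points at a state that exists, which catches renaming mistakes before you paste the JSON into the console.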

Migrating workloads to Amazon MWAA

Amazon MWAA is a managed orchestration service for Apache Airflow that lets you use the Apache Airflow platform to set up and operate end-to-end data pipelines in the cloud at scale. Apache Airflow is an open-source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as workflows. Apache Airflow brings in new concepts like executors, pools, and SLAs that provide you with superior data orchestration capabilities. With Amazon MWAA, you can use Airflow and the Python programming language to create workflows without having to manage the underlying infrastructure for scalability, availability, and security. Amazon MWAA automatically scales its workflow runtime capacity to meet your needs and is integrated with AWS security services to help provide you with fast and secure access to your data.

We recommend migrating your Data Pipeline workloads to Amazon MWAA when:

You’re looking for a managed, highly available service to orchestrate workflows written in Python.
You want to transition to a fully managed, widely adopted open source technology—Apache Airflow—for maximum portability.
You require a single platform that can handle all aspects of your data pipeline, including ingestion, processing, transfer, integrity testing, and quality checks.
You’re looking for a service designed for data pipeline orchestration with features such as rich UI for observability, restarts for failed workflows, backfills, retries for tasks, and lineage support with OpenLineage.
You’re looking for a service that comes with more than 1,000 pre-built operators and sensors, covering AWS as well as non-AWS services.
Your workload requires orchestrating on-premises servers, user-managed EC2 instances, or a user-managed EMR cluster.

Amazon MWAA workflows are defined as directed acyclic graphs (DAGs) using Python, so you can also treat them as source code. Airflow’s extensible Python framework enables you to build workflows connecting with virtually any technology. It comes with a rich user interface for viewing and monitoring workflows and can be easily integrated with version control systems to automate the CI/CD process. With Amazon MWAA, you can choose the same version of Amazon EMR that you’re currently using in Data Pipeline.

Example: Migrate HadoopActivity on EmrCluster

If you don’t have an existing Amazon MWAA environment, complete the following steps to create one:

Create an AWS CloudFormation template on your computer by copying the template from the quick start guide into a local text file.
On the CloudFormation console, choose Stacks in the navigation pane.
Choose Create stack with the option With new resources (standard).
Choose Upload a template file and select the local template file.
Choose Next.
Complete the setup steps, entering a name for the environment, and leave the rest of the parameters as default.
On the last step, acknowledge that resources will be created and choose Submit.

The creation can take 20–30 minutes, until the status of the stack changes to CREATE_COMPLETE. The resource that will take the most time is the Airflow environment. While it’s being created, you can continue with the following steps, until you’re required to open the Airflow UI.

An Airflow workflow is based on a DAG, which is defined by a Python file that programmatically specifies the different tasks involved and their interdependencies. Complete the following steps to create the DAG:

Create a local file named emr_dag.py using a text editor with the following snippet, and configure the EMR-related parameters based on the existing Data Pipeline definition:

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import (
    EmrCreateJobFlowOperator,
    EmrAddStepsOperator,
)
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor
from airflow.utils.dates import days_ago
from datetime import timedelta
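import os

# Derive the DAG ID from this file's name (emr_dag.py -> emr_dag).
DAG_ID = os.path.basename(__file__).replace(".py", "")

# EMR step to run; replace the JAR and arguments with those from your HadoopActivity definition.
SPARK_STEPS = [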
    {
        'Name': 'calculate_pi',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['spark-example', 'SparkPi', '10'],
        },
    }
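]

# Cluster configuration; align the release label, instance types, and roles with your EmrCluster definition.
JOB_FLOW_OVERRIDES = {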
    'Name': 'my-demo-cluster',
    'ReleaseLabel': 'emr-6.1.0',
    'Applications': [
        {
            'Name': 'Spark'
        },
    ],
    'Instances': {
        'InstanceGroups': [
            {
                'Name': "Master nodes",
                'Market': 'ON_DEMAND',
                'InstanceRole': 'MASTER',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 1,
            },
            {
                'Name': "Slave nodes",
                'Market': 'ON_DEMAND',
                'InstanceRole': 'CORE',
                'InstanceType': 'm5.xlarge',
                'InstanceCount': 2,
            }
        ],
        'KeepJobFlowAliveWhenNoSteps': False,
        'TerminationProtected': False,
    },
    'VisibleToAllUsers': True,
    'JobFlowRole': 'EMR_EC2_DefaultRole',
    'ServiceRole': 'EMR_DefaultRole'
}

with DAG(
    dag_id=DAG_ID,
    start_date=days_ago(1),
    schedule_interval='@once',
    dagrun_timeout=timedelta(hours=2),
    catchup=False,
    tags=['emr'],
) as dag:
    cluster_creator = EmrCreateJobFlowOperator(
        task_id='create_job_flow',
        job_flow_overrides=JOB_FLOW_OVERRIDES,
        aws_conn_id='aws_default',
    )
    step_adder = EmrAddStepsOperator(
        task_id='add_steps',
        job_flow_id=cluster_creator.output,
        aws_conn_id='aws_default',
        steps=SPARK_STEPS,
    )
    step_checker = EmrStepSensor(
        task_id='watch_step',
        job_flow_id=cluster_creator.output,
        step_id="{{ task_instance.xcom_pull(task_ids='add_steps')[0] }}",
        aws_conn_id='aws_default',
    )

    # Create the cluster, submit the step, then wait for the step to finish.
    cluster_creator >> step_adder >> step_checker

Defining the schedule in Amazon MWAA is as simple as updating the schedule_interval parameter for the DAG. For example, to run the DAG daily, set schedule_interval='@daily'.
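
If your existing pipeline uses a fixed period, you can translate it into a value for schedule_interval, which also accepts a datetime.timedelta. The following sketch assumes the period is written as a count followed by a unit, such as "1 day" or "15 minutes"; the function name is hypothetical. Monthly periods are rejected because they have no fixed length, so a cron expression or preset is a better fit for those.

```python
from datetime import timedelta

# Maps a singular unit name to the matching timedelta keyword argument.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}


def period_to_timedelta(period: str) -> timedelta:
    """Convert a period string such as '1 day' or '15 minutes' to a timedelta."""
    count_text, unit_text = period.strip().lower().split()
    unit = unit_text.rstrip("s")  # accept both '1 hour' and '2 hours'
    if unit not in _UNITS:
        raise ValueError(f"unsupported period unit: {unit_text!r}")
    count = int(count_text)
    if count <= 0:
        raise ValueError("period count must be positive")
    return timedelta(**{_UNITS[unit]: count})


print(period_to_timedelta("1 day"))
print(period_to_timedelta("15 minutes"))
```

The result can then be passed to the DAG, for example schedule_interval=period_to_timedelta("1 day").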

Now, you create a workflow that invokes the Amazon EMR step you just created:

On the Amazon S3 console, locate the bucket created by the CloudFormation template, which will have a name starting with the name of the stack followed by -environmentbucket- (for example, myairflowstack-environmentbucket-ap1qks3nvvr4).
Inside that bucket, create a folder called dags, and inside that folder, upload the DAG file emr_dag.py that you created in the previous section.
On the Amazon MWAA console, navigate to the environment you deployed with the CloudFormation stack.

If the status is not yet Available, wait until it reaches that state. It shouldn’t take longer than 30 minutes after you deployed the CloudFormation stack.

Choose the environment link on the table to see the environment details.

It’s configured to pick up DAGs from the bucket and folder you used in the previous steps. Airflow will monitor that folder for changes.

Choose Open Airflow UI to open a new tab accessing the Airflow UI, using the integrated IAM security to sign you in.

If there are issues with the DAG file you created, it will display an error on top of the page indicating the lines affected. In that case, review the steps and upload again. After a few seconds, it will parse it and update or remove the error banner.
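
To shorten the feedback loop, you can check the file for syntax errors locally before uploading it. The following sketch uses only the Python standard library; the function name is hypothetical. It catches syntax errors only, not missing Airflow providers or errors that surface when the DAG is imported in the environment.

```python
import ast
from typing import Optional


def find_syntax_error(source: str, filename: str = "emr_dag.py") -> Optional[str]:
    """Return a short message for the first syntax error in source, or None if it parses."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


# A valid snippet parses cleanly; one with an unclosed parenthesis is reported.
print(find_syntax_error("schedule_interval = '@daily'\n"))
print(find_syntax_error("steps = (1, 2\n"))
```

To check the real file, read it first, for example find_syntax_error(open("emr_dag.py").read()).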

Clean up

After you migrate your existing Data Pipeline workload and verify that the migration was successful, delete your pipelines in Data Pipeline to stop further runs and billing.

Conclusion

In this blog post, we outlined a few alternate AWS services for migrating your existing Data Pipeline workloads. You can migrate to AWS Glue to run and orchestrate Apache Spark applications, AWS Step Functions to orchestrate workflows involving various other AWS services, or Amazon MWAA to help manage workflow orchestration using Apache Airflow. By migrating, you will be able to run your workloads with a broader range of data integration functionalities. If you have additional questions, post in the comments or read about migration examples in our documentation.

About the authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team and AWS Data Pipeline team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his road bike.

Vaibhav Porwal is a Senior Software Development Engineer on the AWS Glue and AWS Data Pipeline team. He is working on solving problems in the orchestration space by building low-cost, repeatable, scalable workflow systems that enable customers to create their ETL pipelines seamlessly.

Sriram Ramarathnam is a Software Development Manager on the AWS Glue and AWS Data Pipeline team. His team works on solving challenging distributed systems problems for data integration across AWS serverless and serverfull compute offerings.

Matt Su is a Senior Product Manager on the AWS Glue team and AWS Data Pipeline team. He enjoys helping customers uncover insights and make better decisions using their data with AWS Analytics services. In his spare time, he enjoys skiing and gardening.

Transition from Amazon CloudSearch to Amazon OpenSearch Service

2024-07-25 Arvind Mahesh

Post Syndicated from Arvind Mahesh original https://aws.amazon.com/blogs/big-data/transition-from-amazon-cloudsearch-to-amazon-opensearch-service/

At AWS, we are constantly innovating and evolving our services to meet the ever-changing needs of our customers. In this post, we want to help you understand the differences between Amazon CloudSearch and Amazon OpenSearch Service, and how you can transition to OpenSearch Service.

Comparing Amazon CloudSearch and Amazon OpenSearch Service

CloudSearch is a fully managed service in the cloud that makes it straightforward to set up, manage, and scale a search solution for your website or application. With CloudSearch, you can search large collections of data such as webpages, document files, forum posts, or product information. You can quickly add search capabilities without having to become a search expert or worry about hardware provisioning, setup, and maintenance. As your volume of data and traffic fluctuates, CloudSearch scales to meet your needs. CloudSearch is internally powered by a customized version of Apache Solr, and supports features such as full-text search, Boolean search, prefix search, term boosting, faceting, hit highlighting, and auto-complete suggestions.

OpenSearch Service is a managed service that makes it seamless to deploy, operate, and scale OpenSearch, a popular open source search and analytics engine. OpenSearch provides best-in-class search capabilities, providing you with all the search features of CloudSearch plus a vector engine supporting semantic search on vector embeddings, and support for both dense and sparse vectors. In addition, with OpenSearch Service, you get advanced security with fine-grained access control, the ability to store and analyze log data for observability and security, along with dashboarding and alerting. You’ll have all of CloudSearch’s capabilities and more.

With OpenSearch Serverless, you get improved, out-of-the-box, hands-free operation. Like CloudSearch, OpenSearch Serverless lets you deploy and use OpenSearch through a REST endpoint. You send your documents to OpenSearch Serverless, which indexes them for search using the OpenSearch REST API. If you want deeper control over your infrastructure for cost and latency optimization, you can choose OpenSearch Service’s managed clusters deployment option. With managed clusters, you get granular control over the instances you would like to use, indexing and data-sharding strategy, and more. OpenSearch Service brings with it the flexibility and extensibility of open source, provides powerful querying and analytics capabilities, and enables cost-effective scalability for growing workloads, with high availability and durability. For more information on the capabilities and benefits of using OpenSearch Service, see Amazon OpenSearch Service.

Transitioning to OpenSearch Service

When transitioning from CloudSearch to OpenSearch Service, you need to re-ingest and index your data into OpenSearch Service. Because OpenSearch Service uses a REST API, numerous methods exist for indexing documents. You can use standard clients like curl or any programming language that can send HTTP requests. To further simplify the process of interacting with it, OpenSearch Service has clients for many programming languages. We recommend that you use Amazon OpenSearch Ingestion to ingest data. OpenSearch Ingestion is a fully managed data collector built within OpenSearch Service that can route data to an OpenSearch Service domain or an OpenSearch Serverless collection. OpenSearch Ingestion can ingest data from a wide variety of sources, such as Amazon Simple Storage Service (Amazon S3) buckets and HTTP endpoints, and has a rich ecosystem of built-in processors to take care of your most complex data transformation needs. OpenSearch Ingestion is serverless in nature and will scale automatically to meet the requirements of your most demanding workloads, helping you focus on your business logic while abstracting away the complexity of managing complex data pipelines for your ingestion use cases. For more information about how to ingest a document into an OpenSearch Serverless collection or a managed cluster using OpenSearch ingestion, see Getting started with Amazon OpenSearch Ingestion. For detailed information on using OpenSearch Ingestion to ingest data into OpenSearch Service, refer to Amazon OpenSearch Ingestion.
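
If you already have documents in the CloudSearch batch format and want to index them directly through the REST API, the conversion is mostly mechanical. The following sketch assumes each batch entry has a type of add or delete, an id, and (for add) a fields object, and it produces a newline-delimited body for the OpenSearch _bulk API. The index name, sample documents, and function name are hypothetical.

```python
import json


def cloudsearch_batch_to_bulk(batch: list, index: str) -> str:
    """Convert a CloudSearch document batch into an OpenSearch _bulk request body."""
    lines = []
    for operation in batch:
        if operation["type"] == "add":
            # Action line followed by the document source line.
            lines.append(json.dumps({"index": {"_index": index, "_id": operation["id"]}}))
            lines.append(json.dumps(operation["fields"]))
        elif operation["type"] == "delete":
            lines.append(json.dumps({"delete": {"_index": index, "_id": operation["id"]}}))
        else:
            raise ValueError(f"unknown operation type: {operation['type']!r}")
    # The _bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical batch for illustration only.
batch = [
    {"type": "add", "id": "doc-1", "fields": {"title": "First document", "year": 2024}},
    {"type": "delete", "id": "doc-2"},
]
print(cloudsearch_batch_to_bulk(batch, "my-index"), end="")
```

The resulting body would be sent to the _bulk endpoint of your domain or collection as a signed request. Field types and analyzers still need to be mapped separately because the two services define index schemas differently.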

Summary

AWS continues to support CloudSearch and continues to invest in security and availability improvements. However, with the advancements in OpenSearch, we recommend that you explore OpenSearch Service to get the latest search capabilities and to meet the rapid evolution of search experience users have come to expect in the machine learning age.

About the Authors

Arvind Mahesh is a Senior Manager-Product at Amazon Web Services for Amazon OpenSearch Service. He has close to two decades of technology experience across a variety of domains such as Analytics, Search, Cloud, Network Security, and Telecom.

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

How to migrate your AWS CodeCommit repository to another Git provider

2024-07-25 Rodney Bozo

Post Syndicated from Rodney Bozo original https://aws.amazon.com/blogs/devops/how-to-migrate-your-aws-codecommit-repository-to-another-git-provider/

Customers can migrate their AWS CodeCommit Git repositories to other Git providers using several methods, such as cloning the repository, mirroring, or migrating specific branches. This blog describes a basic use case to mirror a repository to a generic provider, and links to instructions for mirroring to more specific providers. Your exact steps could vary depending on the type or complexity of your repository, and the decisions made on what and how you want to migrate. This post only describes how to migrate Git repository data, and does not describe exporting other data from CodeCommit such as pull requests.

Pre-requisites

Before you can migrate your CodeCommit repository to another provider, make sure that you have the necessary credentials and permissions to both the AWS Management Console and the other provider’s account. For migrating to GitHub and GitLab, use CodeCommit static credentials as described in HTTPS users using Git credentials. If you choose to use the generic migration process described below, any type of CodeCommit credentials can be used. To learn more about setting up AWS CodeCommit access control, see Setting up for AWS CodeCommit.
In the AWS CodeCommit console, select the clone URL for the repository you will migrate. The correct clone URL (HTTPS, SSH, or HTTPS (GRC)) depends on which credential type and network protocol you have chosen to use.

Figure 1: Clone repositories

Migrating your AWS CodeCommit repository to a GitLab repository

Using the CodeCommit clone URL in combination with the HTTPS Git repository credentials, follow the guidance in GitLab’s documentation for importing source code from a repository by URL.

Migrating your AWS CodeCommit repository to a GitHub repository

Using the CodeCommit clone URL in combination with the HTTPS Git repository credentials, follow the guidance in GitHub’s documentation for importing source code.

Generic migration to a different repository provider

1. Clone the AWS CodeCommit Repository
Clone the repository from AWS CodeCommit to your local machine using Git. If you’re using HTTPS, you can do this by running the following command:

git clone --mirror https://your-aws-repository-url your-aws-repository

Replace your-aws-repository-url with the URL of your AWS CodeCommit repository.
Replace your-aws-repository with a name for this repository. Example:

git clone --mirror https://git-codecommit.us-east-2.amazonaws.com/v1/repos/MyDemoRepo my-demo-repo

2. Set up new remote repository

Navigate to the directory of your cloned AWS CodeCommit repository. Then, add the repository URL from the new repository provider as a remote:

git remote add <provider name> <provider-repository-url>

Replace <provider name> with the provider name of your choice. (Example: gitlab)
Replace <provider-repository-url> with the URL of your new repository provider’s repository.

3. Push your local repository to the new remote repository

This will push all branches and tags to your new repository provider’s repository. The provider name must match the provider name from step 2.

git push <provider name> --mirror

Notes:

The remote repository should be empty.
The remote repository may have protected branches not allowing force push. In this case, navigate to your new repository provider and disable branch protections to allow force push.

4. Verify the Migration

Once the push is complete, verify that all files, branches, and tags have been successfully migrated to the new repository provider. You can do this by browsing your repository online or by cloning it to another location and checking it locally.
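
One way to check branches and tags without a full clone is to compare the refs that each remote advertises. The following Python sketch wraps git ls-remote and compares the results; the function names and the sample output are hypothetical, and it assumes git is installed and that you have credentials for both remotes.

```python
import subprocess


def parse_ls_remote(output: str) -> dict:
    """Parse `git ls-remote` output into a mapping of ref name to commit ID."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        commit_id, ref = line.split(None, 1)
        refs[ref.strip()] = commit_id
    return refs


def diff_refs(source: dict, target: dict) -> dict:
    """Report refs that are missing, extra, or pointing at different commits in target."""
    shared = set(source) & set(target)
    return {
        "missing": sorted(set(source) - set(target)),
        "extra": sorted(set(target) - set(source)),
        "mismatched": sorted(ref for ref in shared if source[ref] != target[ref]),
    }


def list_remote_refs(url: str) -> dict:
    """Run `git ls-remote` against a repository URL and parse the result."""
    result = subprocess.run(
        ["git", "ls-remote", url], check=True, capture_output=True, text=True
    )
    return parse_ls_remote(result.stdout)


# Hypothetical ls-remote output, used here so the example runs without network access.
old = parse_ls_remote("1111111\trefs/heads/main\n2222222\trefs/tags/v1.0\n")
new = parse_ls_remote("1111111\trefs/heads/main\n")
print(diff_refs(old, new))
```

For a real comparison, call list_remote_refs with the CodeCommit clone URL and the new provider's URL, and pass both results to diff_refs. An empty report means every advertised ref matches, although some providers add refs of their own that will show up as extra.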

5. Update Remote URLs

If you plan to continue working with the migrated repository locally, you may want to update the remote URL to point to the new provider’s repository instead of AWS CodeCommit. You can do this using the following command:

git remote set-url origin <provider-repository-url>

Replace <provider-repository-url> with the URL of your new repository provider’s repository.
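To confirm that the change took effect, the URL can be read back with `git remote get-url`. The helper name `repoint_origin` below is hypothetical and only wraps the two Git commands; it is meant to be run inside an existing clone.

```shell
# Sketch: repoint `origin` and print the URL Git will now use for it.
# `repoint_origin` is a hypothetical helper name, not a Git command.
repoint_origin() {
  git remote set-url origin "$1" &&
  git remote get-url origin
}

# Usage (placeholder URL):
#   repoint_origin "<provider-repository-url>"
```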

6. Update CI/CD Pipelines and fix protected branches

If you have CI/CD pipelines set up that interact with your repository, such as GitLab, GitHub or AWS CodePipeline, update their configuration to reflect the new repository URL. If you removed branch protections in Step 3, you may want to restore them on your main branch.

7. Inform Your Team

If you’re migrating a repository that others are working on, be sure to inform your team about the migration and provide them with the new repository URL.

8. Delete the now-migrated AWS CodeCommit repository

This action cannot be undone. Navigate back to the AWS CodeCommit console and delete the repository that you have migrated using the “Delete Repository” button.

Figure 2: Delete repositories

Conclusion

This post described a few methods to migrate your existing AWS CodeCommit repository to another Git provider. After migration, you have the option to continue to use your current AWS CodeCommit repository, but doing so will likely require a regular sync operation between AWS CodeCommit and the new repository provider. For more information about repository migration, please see the following resources:

Migrate a repository incrementally – This guide is written to migrate a repository to CodeCommit incrementally but can also be used for other Git providers.

How to migrate from AWS Cloud9 to AWS IDE Toolkits or AWS CloudShell

2024-07-25 Rodney Bozo

Post Syndicated from Rodney Bozo original https://aws.amazon.com/blogs/devops/how-to-migrate-from-aws-cloud9-to-aws-ide-toolkits-or-aws-cloudshell/

Building with AWS requires you to interact with and manipulate your AWS resources, whether it’s to manage infrastructure, deploy applications, or troubleshoot issues, and many AWS customers use AWS Cloud9 to do so today. However, developers want the ability to work with AWS resources within their own Integrated Development Environment (IDE) because it allows them to streamline their workflows and leverage familiar tools. Other customers still want the security and flexibility of working with their resources in the AWS Management Console, but with quicker access and portability across different pages. In this blog post, we will discuss two solutions, the AWS IDE Toolkits and AWS CloudShell, and why you may want to migrate from AWS Cloud9 to one of these solutions.

Overview

The AWS IDE Toolkits are a set of open-source plugins that integrate AWS services directly into popular IDEs like Visual Studio Code, IntelliJ, and PyCharm. With these toolkits, you can manage AWS resources, deploy applications, and debug code without leaving your familiar development environment. Key features of the AWS IDE Toolkits include seamless access to AWS services, resource exploration and management, local debugging capabilities, and integration with AWS deployment tools like AWS CloudFormation and AWS SAM. The AWS IDE Toolkits save you the hassle of deploying and managing an AWS Cloud9 EC2 instance in your account and allow you to interact with AWS services in the context of your IDE’s source code.

AWS CloudShell is a browser-based shell available directly in the AWS Management Console that provides a pre-authenticated and pre-configured environment for interacting with AWS resources. The AWS CLI is pre-installed in the AWS CloudShell environment, eliminating the need for you to install and configure the AWS CLI locally and making it easier to interact with AWS resources from anywhere. You can use AWS CloudShell to check or adjust a configuration file, make a quick fix to a production environment, or even experiment with new AWS services or features. Best of all, usage of AWS CloudShell is free. CloudShell’s accessibility from anywhere in the AWS Management Console makes it an ideal alternative when you want to interact with AWS resources from the command line over the web because you are limited in doing so on your local desktop.

Getting started

If you’re interested in leveraging the AWS IDE Toolkits, the onboarding process is straightforward. In many popular IDEs like Visual Studio Code, you can simply install the AWS Toolkits extension from the IDE’s extension marketplace and authenticate with your AWS credentials to begin taking advantage of all of the AWS Toolkits features. For more detailed information about installation, you can see the onboarding steps for each supported IDE. To begin using AWS CloudShell, simply click the CloudShell icon in the AWS Management Console and follow the prompts to launch your shell environment. CloudShell leverages the credentials from your AWS Management Console sessions to provide a pre-authenticated shell environment. You can also explore detailed user guides and sample use cases to help you get familiar with the tool.

Figure 1: Click on the AWS CloudShell icon

Summary

Both the AWS IDE Toolkits and AWS CloudShell offer powerful capabilities for interacting with AWS resources. Whether you prefer working within your local IDE or a web-based terminal directly in the AWS Management Console, these solutions provide a seamless and efficient way to manage your AWS infrastructure and applications. Take the time to explore these options and see how they can enhance your development workflows. Finally, don’t forget to delete your AWS Cloud9 EC2 instances once you migrate to avoid incurring unnecessary future costs.
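The cleanup mentioned above can also be done from the AWS CLI, for example inside CloudShell. This is a hedged sketch rather than a prescribed procedure: it assumes the AWS CLI is installed and authenticated, and `cloud9_cleanup` and `DRY_RUN` are hypothetical names used only for this example. For EC2-based environments, deleting the Cloud9 environment is expected to terminate the instance attached to it as well.

```shell
# Sketch: delete one Cloud9 environment by ID with the AWS CLI.
# `cloud9_cleanup` and DRY_RUN are hypothetical names for this example.
cloud9_cleanup() {
  env_id="$1"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command, so nothing is deleted by accident.
    echo "aws cloud9 delete-environment --environment-id $env_id"
  else
    aws cloud9 delete-environment --environment-id "$env_id"
  fi
}

# To find environment IDs first:
#   aws cloud9 list-environments
# To actually delete (the ID is a placeholder):
#   DRY_RUN=0 cloud9_cleanup "<environment-id>"
```

The dry-run default is a design choice for this sketch, because deleting an environment cannot be reversed.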

[$] What became of getrandom() in the vDSO

2024-07-25 corbet

Post Syndicated from corbet original https://lwn.net/Articles/983186/

In the previous episode of the
vgetrandom() story, Jason Donenfeld had put together a version of
the getrandom()
system call that ran in user space, significantly improving performance for
applications that need a lot of random data while retaining all of the
guarantees provided by the system call. At that time, it seemed that a
consensus had built around the implementation and that it was headed toward
the mainline in that form. A few milliseconds after that article was
posted, though, a Linus-Torvalds-shaped obstacle appeared in its path.
That obstacle has been overcome and this work has now been merged for the
6.11 kernel, but its form has changed somewhat.

Get it Right in CAMERA! Capturing my BEST Photos in Iceland EVER!

2024-07-25 Matt Granger

Post Syndicated from Matt Granger original https://www.youtube.com/watch?v=mEVL5v3kdZU

[$] More informative kernel panics for Fedora

2024-07-25 daroc

Post Syndicated from daroc original https://lwn.net/Articles/982398/

On July 12, Jocelyn Falempe
proposed a change to the configuration options that Fedora sets for its
kernels, in order to make kernel panics easier to report.
Falempe would like to enable the kernel’s recently added
DRM-panic feature, which adds
a graphical crash screen that is reminiscent of the infamous
Windows “blue screen of death” for kernel panics. The feature introduces a few
tradeoffs, including currently limited driver support, so the proposal spawned a
good deal of discussion.

What Do Teachers, Parents, and Students Have in Common? The Need to Back Up

2024-07-25 Yev

Post Syndicated from Yev original https://www.backblaze.com/blog/what-do-teachers-parents-and-students-have-in-common-the-need-to-back-up/

Our world has moved more online, and blackboards have taken a backseat to laptops in classrooms and lecture halls alike. From homework to lesson planning and grading, to communicating with students and parents, educators and students rely heavily on their computers and sync drives like Google Drive, Dropbox, or OneDrive. And, when it comes to digital resources, there’s always a risk of data loss, which, when it strikes, can wipe out hours or days of work.

This is where backup solutions come into play. In this blog post, we will explore the benefits of computer backup for both students and educators, highlight the importance of choosing an affordable and reliable backup solution, and give you some talking points to help others in your educational community understand the importance of backing up.

Risks of data loss

Data loss can happen for a variety of reasons, including hardware failure, accidental deletion, theft, or cyber attacks.

Imagine this scenario: You’ve spent hours building a detailed lesson plan, preparing engaging multimedia presentations, and grading student assignments. Suddenly, your computer crashes, and you can’t get it to turn back on. Or, you lose your USB drive that has years of work, including lesson plans. Neither situation is great, and in both the result is that all of your hard work is gone in an instant.

Data loss is an issue for anyone, but for educators, the consequences can affect not only your work but also your students’ learning experience. And the same scenario is true for students—working on a research project last minute only to have a blue screen of death five minutes before the deadline can be a frustrating turn of events—and one that affects your grade long-term.

The 3-2-1 backup rule

The good news about avoiding data loss is that there are some established best practices that can give you a great place to start. The most fundamental of these, the 3-2-1 backup rule, says you should have three copies of your data on two types of storage media with one copy stored off-site.

Sync is not backup

Sync services like Google Drive, OneDrive, and Dropbox are great, but they are not the same thing as a true backup. Sync drives are designed to keep all versions consistent with each other, which makes them vulnerable to things like accidental deletion and ransomware attacks. While some may have limited version history or “backup,” those features typically cover only a limited period (e.g., 30 days), or lack some of the key capabilities that schools need to maintain compliance with data protection standards.

Cloud backup services are using a different tool for a different job—you want your synced files to change, whereas you want your backup to be a fixed point in time you can restore if you need to. That’s not to say that you won’t have your backup files constantly up-to-date, like you do with an automatic backup solution, just that you’ll be able to restore a file, or all of your files, to whatever time you choose.

And, if you think the difference is just splitting hairs, studies show that 58% of organizations that experienced data loss last year had some amount of unrecovered data. And, in that same pool of survey takers, 84% of organizations were relying on cloud sync services.

Benefits of backing up

Protection against data loss: The primary benefit of a backup solution is the protection it offers against data loss. By regularly backing up your files, you can ensure that your important documents are safe and can be restored quickly in case of any mishap, or even if you forget your laptop at home.
Enhanced productivity: With a reliable backup system in place, you can focus on what you’re working on without worrying about it getting lost. This peace of mind allows you to work more efficiently and creatively, knowing that your files are secure.
Compliance and accountability: While not at the top of many students’ minds, many educators know that educational institutions have policies and regulations regarding data storage and protection. Having a robust backup solution helps teachers, professors, and the organizations they work for stay compliant with these regulations.
Cost savings: Investing in a backup solution can save you money in the long run. Data recovery services can be expensive after the fact, and the time lost in trying to recreate lost work can be even more costly. An affordable backup solution provides a safety net that prevents these potential expenses.

Students: Back up your data regularly

Often, students are given space on cloud drives or required to submit assignments through learning management systems like Blackboard. But, even with cloud drives, many tools don’t account for adequate backups. When it comes down to it, students are responsible for turning in their work on time.

Getting a backup in place protects you and all the effort you’re putting into your coursework, and you can try it for free to see if it’s right for you.

Educators and faculty members often drive change

Collaboration is one of the hallmarks of an educational environment, and educators and students are often just as responsible for driving change as administrators are. Whatever role you take in your educational community, there are many ways you can help others understand why backup is so crucial, and how to choose the tool that’s right for you.

Choosing an affordable and reliable backup solution

When selecting a backup solution, affordability and reliability are key factors to consider. Here are some decision criteria you can share with others to help in choosing a backup solution:

Assess your needs. Determine the amount of data you need to back up and how frequently it changes. This will help you choose a solution that meets your specific requirements without overpaying for unnecessary features.
Cloud vs. local backup. Cloud-based backups offer the advantage of remote access and easy storage in a geographically separate location, while local backups (such as external hard drives) can provide faster recovery times. Both methods have a place in a solid 3-2-1 backup strategy.
Ease of use. Look for a backup solution that is user-friendly and doesn’t require extensive technical knowledge. The easier it is to use, the more likely you are to maintain regular backups.
Security features. Ensure that the backup solution you choose has robust security features, such as encryption, to protect your data from unauthorized access and cyber threats.
Cost-effective plans. Many backup service providers offer tiered pricing plans based on storage needs and features, or on the number of devices you need protected. Backblaze Computer Backup, for example, starts at $9 per computer per month for unlimited backup, with discounts for one-year or two-year plans.

Share resources to facilitate discussion

If you don’t have a robust backup strategy in place through your IT department or district, send them this article or this Texas A&M case study and recommend that they get started with a backup strategy.

Save your data (and yourself!): Think about backups ahead of time

Remember, regardless of how you’re creating data, the question isn’t if you will experience data loss, but when. Be prepared and make backup a priority—if not in your organization, then definitely in your personal tech choices.

The post What Do Teachers, Parents, and Students Have in Common? The Need to Back Up appeared first on Backblaze Blog | Cloud Storage & Cloud Backup