Кои са най-важните проблеми за американците в изборната година? Част 2: Имиграция, здравеопазване, външна политика

2024-09-13 Йоанна Елми

Post Syndicated from Йоанна Елми original https://www.toest.bg/koi-sa-nay-vajnite-problemi-za-amerikantsite-v-izbornata-godina-chast-2/

<< Към първа част

Имигранти, които ядат кучета и котки. Ако преди два месеца, когато излезе първата част на този анализ, някой можеше да предскаже как ще започнем втората… вероятно щях да му повярвам, защото все пак е изборна година, а никой не прави по-добро изборно шоу от САЩ.

„В Спрингфийлд [Охайо – б.а.] ядат кучетата. Хората, които са дошли там, ядат котките. Ядат – ядат домашните любимци на хората, които живеят там“, заяви по време на вторите кандидатпрезидентски дебати бившият президент Доналд Тръмп. И това бързо се превърна в меме.

I better go superviral for this… pic.twitter.com/00Laf5QUrx

— Fearghas Kelly (@FearghasKelly) September 11, 2024

Изказванията на Тръмп, комбинирани с песен от анимацията „Семейство Симпсън“, които също живеят във въображаем Спрингфийлд

Майк Деуайн, републиканският губернатор на щата Охайо, заяви по време на пресконференция, че „имигрантите от Хаити, които са тук, ако говорите с представители на местната власт, ако питате хората, те също се справят добре. Ходят на работа. Някои от тях не говорят английски, имаме известни проблеми по отношение на здравеопазването, но много от тях полагат усилия да се осигурят и да получат достъп до здравни услуги“. Деуайн отчита, че са възможни проблеми, когато голям брой хора се преместят от друга държава, където здравеопазването е на много по-ниско ниво, но набляга на „възможността“ за възникване на такива проблеми. В рамките на обръщение към общността властите в Спрингфийлд заявиха, че не са открили никакви доказателства за вреди, нанесени на животни от страна на имигрантите. Слуховете са тръгнали от история отпреди месец, извън Спрингфийлд, с извършител, който вече е арестуван.

„Подобни слухове отвличат вниманието от реалните проблеми, като тези с липсата на жилища, ресурси в училищата и издъхващата здравна система“, каза кметът на града Роб Рю.

Страшните мигранти или добрата стара дезинформация?

Изказванията на местните власти не попречиха нито на Тръмп, нито на неговия кандидат за вицепрезидент Джей Ди Ванс да повторят опорката в ефир пред милиони. Ванс заяви, че градската управа няма как да знае за какво става въпрос, защото не е „на терен“, и че самият той е получил множество сигнали от избиратели за гъски, които са улавяни в езерата, за да бъдат колени, както и за изядени домашни любимци.

Въпреки изобилието от медийно отразяване на историята през последните дни Ванс обвини медиите, че не говорят за проблема. Тези твърдения той направи пред журналистката от CNN Кейтлан Колинс, която в отговор го попита: „Ако някой ви се обади и ви каже, че е видял Голямата стъпка, това не значи, че е видял Голямата стъпка. Имате отговорност като кандидат, както и бившият президент, да не разпространявате невярна информация, нали?“

“If someone calls your office and says they saw Bigfoot, that doesn’t mean they saw Bigfoot. You have a sense of responsibility as a running mate, and he certainly does as the candidate, to not promote false information, right?” pic.twitter.com/sUfbsgtMm8

— Kaitlan Collins (@kaitlancollins) September 11, 2024

Източникът на въпросната информация е обичайната изолирана екосистема от алт-райт и консервативно манипулирано съдържание. Разследване на Daily Dot разкрива първоизточника: ареста на Алекис Телия Ферел, живееща в град Кантън, Охайо, която „смачква главата на котката си и я изяжда пред очите на множество свидетели в жилищен квартал“ на 16 август. Ферел е регистрирана гласоподавателка в Охайо от шест години насам и няма никакви данни да е от Хаити.

Базираният в Малайзия коментатор Майлс Ченг разпространява полицейски записи от ареста сред своите милион последователи в X. „Наркотици?“ пита неизвестен потребител? „По-лошо. Хаитянци“, отговаря Ченг от своя виртуален „терен“. Напълно безпочвената информация се разпространява мълниеносно в социалната мрежа (откакто Х бе придобит от Илън Мъск, манипулативно съдържание от всякакъв вид циркулира почти безпрепятствено). „Тази да не е от Хаити? Те правят така тука“, пише друг анонимен потребител. „Чувам такива истории за кучета и котки, които ги ядат нелегалните.“ Подобни безпочвени твърдения се споделят от множество алтернативни и десни инфлуенсъри, като Чарли Кърк. Сред тях е и снимка на мигрант, държащ гъска, а фотографията всъщност е направена в град Калъмбъс, Охайо.

Въпреки че полицията в Спрингфийлд излезе с официална позиция, че не е получавала никакви сигнали за изчезнали животни, както и че в други форуми като Reddit потребители и местни не сигнализират за подобни случаи, невярната информация паразитира върху адаптирането на новодoшлите мигранти от Хаити – между 15 000 и 20 000 в град с население от 60 000 души.

Добре познати теми

Имиграцията е сред основните проблеми, които вълнуват американците. Колко зависи от това за кого гласуват и какви медии четат и гледат? Пресни данни от 9 септември 2024 г. поставят икономиката на първо място както за демократите, така и за републиканците. Оттук нататък обаче картината се променя. Ако имиграцията е на второ място за републиканците, следвана от престъпността и външната политика, то за демократите най-важните проблеми след икономиката са здравеопазването и назначенията във Върховния съд.

От януари насам „престъпленията на мигрантите“ са основна тема на коментаторските сегменти в консервативната Fox News. „Мигрантите престъпници убиват американците“, „Президентът Джо Байдън е инструмент за убийства“, „ВЪЛНА ОТ МИГРАНТИ ПРЕСТЪПНИЦИ ЗАЛИВА АМЕРИКА“ – това са стандартни заглавия в делнична вечер.

Обсебеността от мигрантите престъпници не е нещо ново. Tя е сред основните опорни точки на алтернативните медии в САЩ, както и на дясната Fox News, а още по време на първата си президентска кампания през 2016 г. Доналд Тръмп заяви, че „Мексико изпраща убийци и изнасилвачи“ в страната. Таблоиди като New York Post генерират съдържание, подобно на българските ПИК и БЛИЦ, залагайки на криминални хроники и сензационни заглавия с непропорционално отразяване на инциденти и престъпления, извършени от цветнокожи и имигранти. Множество експерти, от учени до журналисти, посочват, че крайнодясната медийна екосистема функционира като изолиран балон, в който събития се навързват в алтернативна реалност с помощта на манипулация и невярна информация.

Затова няма никакво значение, че според статистиката престъпленията срещу собствеността (кражби, палежи, обири) са много по-чести от престъпленията срещу личността (убийства, изнасилвания). Нито че честотата на престъпленията като цяло е спаднала драстично от 90-те години насам. Нито че за имигрантите е много по-малко вероятно да извършат престъпление в сравнение с родените в САЩ. Защото тук не става въпрос за факти и данни, а за история, стара като света: „те“ – различните, неудобните, цветнокожите – идват, пазете се. И ако през първите десетилетия от създаването на САЩ на мушка са били католиците, след това източноевропейците, после китайците и японците и т.н., то днес е ред на хаитяните.

Здравеопазването: идея за план

По време на дебата във вторник бившият президент Тръмп отново се закани да отмени и замени Закона за защита на пациентите и достъп до здравеопазване, познат като Obamacare. В рамките на изключително сложната система на здравеопазване в САЩ законът цели да предостави достъпно осигуряване на възможно най-много американци. За да се изговорят всички детайли, плюсове и минуси около него, няма да стигнат и два пълни материала. Но според независими експерти на този етап плюсовете са повече от минусите, поне за средностатистическите американци. Важно е също да се каже, че законът е обект на ежегодни промени и допълнения.

Попитан с какво ще замени закона, който дава на хиляди американци достъп до осигуряване, Тръмп отговори, че има идея за план. По време на първия си мандат бившият президент не успя нито да предложи конструктивни промени в законодателството, нито да го отмени въпреки няколкото си опита. В междинните избори през 2018 г. демократите поставиха здравеопазването в центъра на кампанията и така си върнаха контрола над Камарата на представителите. През 2019 г. Тръмп се закани, че ако спечели изборите, ще разкрие план, който „направо ще отвее Obamacare“. Преди да изгуби изборите през 2020 г., каза, че „след две седмици одобряваме план за здравеопазване, пълен и завършен план за здравно осигуряване“.

САЩ харчи около 18% от БВП за здравеопазване, но американците умират по-млади и боледуват повече от граждани на други страни със сходни доходи и стандарт на живот. Освен това САЩ е страната с най-висок процент на починали по предотвратими причини. Страната е единствената сред развитите, която няма изградена система за универсално здравно осигуряване.

В допълнение на тези черни статистики са най-ниска очаквана продължителност на живота при раждане в сравнение с повечето развити страни, най-голям дял хора с множество хронични заболявания, процент на затлъстяване двойно над средните стойности на ОИСР, най-висока смъртност при родилки и новородени отново в сравнителен план, както и значителна честота на самоубийствата. Дали предложенията на Камала Харис за подобряване на Obamacare ще подобрят тези стойности, няма как да знаем – мерките могат да се оценят само след като се приложат.

Независимо от партийната принадлежност американците подкрепят мерки, вече предложени от Харис, като замразяването на цените на инсулина. Това продължава опитите на администрацията на Джо Байдън за сваляне на цените на ключови животоспасяващи лекарства. Концепциите за план, от друга страна, си остават просто липса на решение за много сериозни проблеми.

Външна политика

Няколко часа след дебата Тръмп обяви, че според всички проучвания той е спечелил. Статистическите въпросници, които разчитат на представителна и методологически точна извадка, обаче сочат, че победителят е Харис. Данните, цитирани от Тръмп, са от въпросници онлайн, които позволяват на неограничен брой потребители да кликват и отговарят, независимо на каква възраст са и къде живеят. След дебата Харис води пред Тръмп с около 5%.

До ноември обаче има още много време поне ако го измерваме в предизборни дни. Двата външнополитически конфликта – в Палестина и руската инвазия в Украйна – нямат потенциал да решат изборите, но със сигурност могат да им повлияят, особено ако има ключово развитие в една или друга посока. Други важни елементи са външната политика спрямо Китай и бедственото изтегляне от Афганистан; тук можем да причислим и имиграцията, доколкото двустранните отношения със страните от Латинска Америка могат да влияят.

Политиката на Тръмп в тази област не се различава от първия му мандат: опасна игра с диктатори, славословене на авторитарни лидери като Виктор Орбан, изолационизъм и враждебност спрямо Китай и латиноамериканските имигранти – републиканският слон има какво да изпочупи в стъкларския магазин на света, много по-крехък, отколкото в периода 2016 – 2020 г. Тръмп се закани, че ще прекрати войната в Украйна, още преди да встъпи в длъжност. Но и тук, както в случая със здравеопазването, не са ясни конкретните мерки, които той смята да предприеме.

Харис напомни на гласоподавателите, че митата на Тръмп им струват скъпо – както финансово, така и по отношение на сигурността (администрацията на Байдън запази митата на Тръмп, а Харис също подкрепя определени мита като икономическа мярка). Предателство спрямо Украйна би означавало оттегляне от основни разбирания за суверенитет и териториална цялост, на които е изграден светът след Втората световна война. И това предателство може да има катастрофални последствия. В една от най-умелите си реплики Харис се обърна директно към огромната полска диаспора в щата Пенсилвания (където се намира Филаделфия, градът на провеждане на дебата): „Защо не кажеш на 800-те хиляди американци от полски произход тук, в Пенсилвания, колко бързо ще се предадеш в името на това да те покровителстват, и как, мислиш, би изглеждало едно приятелство с диктатор, който ще те изяде за закуска?“

Войната в Газа не беше основна тема по време на дебатите. Както и с Украйна, Тръмп се закани да реши конфликта бързо, без обаче да обясни как. Бившият президент избегна въпрос как би преговарял с премиера на Израел Бенямин Нетаняху и с „Хамас“, за да се стигне до прекратяване на огъня и на избиването на цивилни, както и до освобождаване на израелските заложници. Тръмп е отявлен поддръжник на Нетаняху. Камала Харис подчерта подкрепата на САЩ за Израел, но и каза, че създаването на независима палестинска държава (т.нар. двустранно решение) е надежден път към сигурност. Откритата ѝ заявка, че войната трябва да приключи, може да се интерпретира негативно в Израел.

Ден след дебата Тръмп заяви, че повече няма да участва в дебати, както и че дебатът е бил нагласен в полза на Харис, макар да го е спечелил според всички анкети и да се е справил добре. Какво друго всъщност остава да се каже? Дебатът приключи, но борбата тепърва предстои.

Големият взрив и корабът на Тезей

2024-09-13 Емилия Милчева

Post Syndicated from Емилия Милчева original https://www.toest.bg/golemiyat-vzriv-i-korabut-na-tezey/

Големият взрив и корабът на Тезей

Българската политика не е заразена от вируса на разцеплението, а от трансформация и макар процесът да започна от най-старите партии на статуквото БСП и ДПС, ще засегне всички политически сили – до една. Факторите за този рестарт се натрупваха: политическата криза, войната в Украйна, демографската криза и смяната на поколенията. Средата се променяше, променяха се и избирателите, но политиците на върха, вкопчени в своите тронове, изглеждаха непоклатими.

До днес.

Снимката

На пръв поглед изглежда като преконфигуриране на политическото пространство, но всъщност са процеси на разруха. Изхвърлянето на Корнелия Нинова като партиен лидер, така както самата тя изхвърли мнозина свои противници от партията, е само епизод от сериала „Носферату“ на „Позитано“ 20. БСП успя да се превърне в кавър на „Възраждане“, а съюзяването с бивши функционери, приети отново във VIP каютите на кораба майка, ще увеличи единствено битката за по-добрите места на горната палуба.

БСП има опит с вътрешнопартийните преврати. Така беше свален Тодор Живков, впоследствие Жан Виденов – след един денонощен инфарктен конгрес на БСП. А Корнелия Нинова направо беше изхвърлена от партията без шанс за завръщане, което потвърди и съдът. Дали това отваря път за коалиция с ГЕРБ, след като Нинова винаги е поддържала ролята на БСП като опозиция? Лидерът на ГЕРБ Бойко Борисов пръв обяви евентуална коалиция след 27 октомври.

Естествените партньори са тези, които бяхме в „сглобката“ [ГЕРБ, ПП–ДБ и ДПС – б.а.], плюс БСП и ИТН.

Временният председател на БСП Атанас Зафиров се измъкна от категоричен отговор – „ние в този пазарлък не участваме“, „имаме си колективни органи“ и т.н. Само преди три месеца беше казал, че БСП няма да участва в кабинет с ГЕРБ–СДС и ДПС „под никаква форма“.

Никой от ДПС-тата не коментира „естествените партньори“ – нито от фракцията на естествения партньор Пеевски, нито от „автентичното“ ДПС, скупчено около почетния председател Доган. Но вече е ясно как двете части ще се явят на избори – „ДПС – Ново начало“ (Пеевски) и „Алианс за права и свободи“ (Доган).

„Наблюдаваме агонията на динозаврите.“ Така социоложката Боряна Димитрова от „Алфа Рисърч“ определи по БНР случващото се в БСП и ДПС.

Това са единствените две партии, които присъстват в българския парламент от началото на Прехода. Сега наблюдаваме разпада на един модел, който специално за ДПС, а и в много моменти от историята на БСП беше антидемократичен и авторитарен, оплетен в доста икономически обвързаности.

Корабът

В класическия философски парадокс за кораба на Тезей се поставя въпросът дали един обект остава същият, ако всички негови части постепенно се заменят. Ще бъде ли пак същият кораб, ако оригиналните дъски до една бъдат подменени поетапно, докато той все още плава, пита Плутарх. Тоест парадоксът подсказва, че системата се самовзривява, когато промените в нейните части станат толкова значителни, че нарушат основната ѝ идентичност.

Най-старите системни партии – БСП и ДПС – се срутиха, защото отдавна са изгубили връзка със своята изначална мисия/кауза, в политическата история те ще бъдат запомнени като „майка“ и „баща“ на клептократичния модел, надграден от ГЕРБ. Лявото не е идентификация за БСП, нито либерализмът за ДПС, чиято авторитарна конструкция е несъвместима с въпросната политическа философия с основна цел свободата.

Социалният популизъм на БСП вече трудно прилъгва избиратели, освен носталгиците по социализма, а и те намаляват. Докато управляваше, партията следваше не принципите и целите на истинската лява политика, а на обогатяване на елита си и на олигархичните си структури. Този модел е до голяма степен наследен от БКП, управлявала еднопартийната система с авторитарен режим, обслужвайки комунистическата върхушка. Голямото неравенство в България, за което толкова много обичат да говорят политиците от „Позитано“ 20, е резултат от модела на Прехода, зададен от функционери на същата тази партия и от неолибералните политики, които прегърна. Тези „другари“ заедно с високопоставени служители на ДС имаха достъп до вътрешна информация, връзки и ресурси, които им позволиха да сключат съмнителни сделки, придобивайки държавни активи на много ниски цени.

Принос за този модел има и ДПС с ролята си на „балансьор“ между различните кланове и проводник на интересите на групировки, създадени от хора на бившите служби. И ако ДПС случи на избиратели, които не се разбягаха поради спойката на етноса, при БСП отливът беше на вълни и те се отдръпваха след всяко управление на т.нар. социалисти.

Първата беше след националната катастрофа 1996–1997 г., когато фалираха 16 банки, бюджетът претърпя осем актуализации за две години, а хиперинфлацията доведе до масово обедняване и икономически колапс. Следващата беше след Тройната коалиция, когато се роди ГЕРБ и БСП никога не се върна на власт освен с отделни министри. През този период БСП се превърна в националпопулистка и консервативна политическа сила, радикализира проруската си ориентация, а резултатът е, че разпадът ще продължи до пълното ѝ маргинализиране.

Такъв край очаква и ДПС, макар и не толкова бързо. Скоростта зависи от готовността на останалите политически сили да преодолеят предубежденията си за представители в листите на други етноси, различни от българския. Докато този вот остава сравнително монолитен, той ще гарантира политическо представителство за клановете, които ДПС представлява, не за избирателите, които нямат илюзии за порциите, раздавани на техен гръб. Но всесилното и мълчаливо ДПС, което познаваме, приключи.

Капанът на тази амбивалентност, каквато представляват интересите на управляващите кланове и партийната идеология, щракна и за БСП, и за ДПС. Партии, управлявани от милионери, които биват избирани от бедняци.

Първите преяли с власт се пръснаха.

Последствията

Разрухата в двете системни партии ще прекрои възможните коалиции. Отношението към ДПС 1 и ДПС 2 има потенциала да се превърне в червена линия за политическите сили. Ако и двете фракции на ДПС получат парламентарно представителство – напълно възможно при ниска избирателна активност, – всяка от тях би участвала в различни управленски сдружения. Например ИТН може да се комбинира с „Ново начало“, но не и с „Алианс за права и свободи“, а БСП и широката ѝ коалиция вероятно биха избрали „автентичността“. ПП–ДБ декларират неколкократно, че ще разграждат модела „Пеевски“, макар да бяха го привлекли като партньор в конституционно (и не само) мнозинство.

Кого ще предпочете ГЕРБ, чиито политици са така предпазливи (както и всички останали впрочем) за конфликта в Движението за права и свободи? Несъмнено Борисов ще изчака, макар че развръзката едва ли ще настъпи с изборите, а с предстоящия избор на главен прокурор, който Висшият съдебен съвет възнамерява да направи на 16 януари. Не е имало избор на главен прокурор, чието име да не е известно най-напред в един тесен политически кръг. Изборът на Никола Филчев е бил „мотивиран“ от председателя на 38-мия парламент Йордан Соколов, което потвърждава преди години по bTV председателката на Съюза на съдиите Нели Куцкова.

За избора на настоящия [към 2006 г. – б.р.] главен прокурор Никола Филчев имаше политически натиск върху някои от членовете на Висшия съдебен съвет.

Следващият главен прокурор Борис Велчев се издигна на поста, след като бе съветник на президента Георги Първанов. За Сотир Цацаров паметна ще остане фразата на един бивш вече градски прокурор към Бойко Борисов: „Не ми се подсмихвай, ти си го избра!“ Иван Гешев, някога наблюдаващ прокурор на делото КТБ, се издигна до главен, след като така и не разпозна Пеевски в документите за фалита на четвъртата по големина банка в България.

Разпадът в ДПС и БСП прави непредсказуеми резултатите от изборите на членове на ВСС от парламентарната квота, за да може нов ВСС да определи следващия главен прокурор. В края на юли Народното събрание удължи на 6 месеца срока, в който парламентът и органите на съдебната власт започват процедура за нов избор на членове на ВСС. Този срок изтича в края на януари. А според хронограмата на кадровиците на съдебната власт окончателният избор на нов главен прокурор трябва да е на 16 януари. Юристи отчитат, че шансовете да бъде спряна процедурата са нищожни.

Бързат клановете, бърза Борисов, бърза Пеевски. Колко му е в мътната вода на блатото да се улови един нов филчеввелчевцацаровгешев.

Скоро нищо няма да е същото. Идва времето и на ГЕРБ.

По буквите: Георги Господинов

2024-09-13 Зорница Христова

Post Syndicated from Зорница Христова original https://www.toest.bg/po-bukvite-georgi-gospodinov/

„Градинарят и смъртта“ от Георги Господинов

По буквите: Георги Господинов

Пловдив: изд. „Жанет 45“, 2024

Имам една любима картина на финския художник Хуго Симберг – „Градината на Смъртта“. В нея Смъртта кротко полива подредени върху висока леха крехки растения. Самата Смърт също е крехка, вглъбена, грижовна. Не знаем дали е мъж, или жена – бял скелет с черна пелерина, тъмнозелена лейка, едната ръка опряна на лехата, зад която се подава тънка бяла пета. В съседство виси бяла кърпа. На заден план – още лехи, още смърти. И все тъй нежни.

Свикнали сме да мислим, че градината е обратното на смъртта. Да градинарстваш значи да заставаш на страната на живота – през август хортензиите ми издържат само ден, ако не ги полея. Разбира се, природата няма много нужда от подобно съучастничество; тя прекрасно се справя и без нас. Градинарят обаче се грижи за конкретния живот, не за живота като цяло. Пази го от суша и от врагове, пресажда го, помага му да се прероди.

„Градинарят и смъртта“ е такава книга. Така се случи, че познавам много книги за смъртта – някои прекалено отблизо. „Годината на магическото мислене“ на Джоун Дидиън я преведох; тя описва в детайли какво чувства човек през първата година след опустошителна загуба. Такъв е и „Дневник на скръбта“ на Ролан Барт. И двете са болезнено точни; могат да ви накарат да бъдете мили с близките си, ако са още живи, и да се стреснете от образа си в огледалото, ако не са. Или да усетите несподелимото като все пак споделено.

Това, което изцяло отсъства и от двете книги, е образът на човека, който си е отишъл. Този образ присъства като отпечатък, разранен отпечатък. Други книги – всички биографии, да речем – се опитват да направят обратното: да възстановят отделния живот като разказ, в който смъртта е само (макар и финален) епизод.

„Градинарят и смъртта“ на Георги Господинов не е нито биография, нито дневник на скръбта – и все пак някак е и двете. Тя е мрежа за улавяне на живота, конкретния живот, който ни се изплъзва. Мрежа от истории. В романа „В памет на паметта“ Мария Степанова говори за една особена привилегия, срещу която никой още не се е сетил да се бори – привилегията на онези, чийто живот е интересен, онези, които запомняме. И това е дарът, който писателят може да направи за своите близки – да ги запази в истории, защото паметта е устроена като литературата.

„Градинарят и смъртта“ е портрет, който се състои от истории – подредени в привидно безредната, асоциативна вселена на паметта. Истории за бащата и истории за отпечатъка, за раната от неговото отсъствие.

Тук книгата се родее със сборника „Бащите не си отиват“ на издателство ICU, и с книги като „Наивно изкуство“ на Марин Бодаков. Включва се в улавянето на бащиния образ чрез истории на сборника, подхваща теми, които звучат в стихосбирката –

Каква супа е обичал баща ми като дете, дояждал ли си е чинията, какво е криел под голямата възглавница, на коя дума от приказките е заспивал, наистина ли се е плашил от бомбардировките, никога няма да науча това, моето отечество е безвъзвратно загубено.

Бих казала, че за първи път у нас едно поколение писатели (и не само) осмисля на глас смъртта и загубата – и отваря пространство за разговор към другите. „Градинарят и смъртта“ е част от този опит за братство пред бездната. И това е сила, не недостатък на книгата.

Кога, ако не в осиротяването на света, имаме нужда от общност, от действителния смисъл на „аз сме“?

Именно общото е и начинът да избегнеш дилемата при всяко говорене за мъртвите. Обичта те заставя да кажеш „или добро, или нищо“ – но паметта не понася плоски образи, едностранчиви герои, семейни светци. Без сянка образът се изтрива – или по-лошо, бива подменен и този, когото уж сме издърпали от подземното царство, се оказва някой друг, благообразно усмихнат… двойник. „Градинарят и смъртта“ решава тази дилема, като разказва за сянката – но я скрива в полето от сенки, в поколенческото, в общото. Бащите ни пускаха шамарената фабрика, казва книгата. Или „бащите ни отсъстваха“.

Книгата черпи от утехата на общото – но и умее да я дава, милостива е към читателя, който търси изречения за пластир върху раната. Някои са прастари коренища, за които човечеството се лови от векове, като ботаническото безсмъртие и Сенека. Други са нови. В трети се усещат – и предават – терапевтични модели. По-важното е, че споделяйки, книгата разширява отвореното от други книги пространство за споделяне, пуска в себе си читателя да разкаже и своята история, историята на своята мъка и на своите мъртви. И да каже наум – като автора в една среща от книгата – „и аз“.

Активните дарители на „Тоест“ получават постоянна отстъпка в размер на 20% от коричната цена на всички заглавия от каталозите на „Жанет 45“, „Лист“, ICU и няколко други български издателства в рамките на партньорската програма Читателски клуб „Тоест“. За повече информация прочетете на toest.bg/club.

В емблематичната си колонка, започната още през 2008 г. във в-к „Култура“, Марин Бодаков ни представяше нови литературни заглавия и питаше с какво точно тези книги ни променят. Вярваме, че е важно тази рубрика да продължи. От човек до човек, с нова книга в ръка.

Docker Raises Prices Up to 80 Percent and More

2024-09-13 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/docker-raises-prices-up-to-80-percent-and-more/

Docker Pro annual pricing is increaseing by 80 percent while some of the benefits are being adjusted to the subscription level

The post Docker Raises Prices Up to 80 Percent and More appeared first on ServeTheHome.

Craters

2024-09-13 xkcd.com

Post Syndicated from xkcd.com original https://xkcd.com/2985/

It's annoying that the Nastapoka Arc isn't a meteor impact crater, but I truly believe that--with enough time, effort, and determination--we could make it one.

The Demise of Long Henry

2024-09-13 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=0npyrE6qBmE

Amazon RDS for MySQL zero-ETL integration with Amazon Redshift, now generally available, enables near real-time analytics

2024-09-13 Matheus Guimaraes

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/amazon-rds-for-mysql-zero-etl-integration-with-amazon-redshift-now-generally-available-enables-near-real-time-analytics/

Zero-ETL integrations help unify your data across applications and data sources for holistic insights and breaking data silos. They provide a fully managed, no-code, near real-time solution for making petabytes of transactional data available in Amazon Redshift within seconds of data being written into Amazon Relational Database Service (Amazon RDS) for MySQL. This eliminates the need to create your own ETL jobs simplifying data ingestion, reducing your operational overhead and potentially lowering your overall data processing costs. Last year, we announced the general availability of zero-ETL integration with Amazon Redshift for Amazon Aurora MySQL-Compatible Edition as well as the availability in preview of Aurora PostgreSQL-Compatible Edition, Amazon DynamoDB, and RDS for MySQL.

I am happy to announce that Amazon RDS for MySQL zero-ETL with Amazon Redshift is now generally available. This release also includes new features such as data filtering, support for multiple integrations, and the ability to configure zero-ETL integrations in your AWS CloudFormation template.

In this post, I’ll show how you can get started with data filtering and consolidating your data across multiple databases and data warehouses. For a step-by-step walkthrough on how to set up zero-ETL integrations, see this blog post for a description of how to set one up for Aurora MySQL-Compatible, which offers a very similar experience.

Data filtering
Most companies, no matter the size, can benefit from adding filtering to their ETL jobs. A typical use case is to reduce data processing and storage costs by selecting only the subset of data needed to replicate from their production databases. Another is to exclude personally identifiable information (PII) from a report’s dataset. For example, a business in healthcare might want to exclude sensitive patient information when replicating data to build aggregate reports analyzing recent patient cases. Similarly, an e-commerce store may want to make customer spending patterns available to their marketing department, but exclude any identifying information. Conversely, there are certain cases when you might not want to use filtering, such as when making data available to fraud detection teams that need all the data in near real time to make inferences. These are just a few examples, so I encourage you to experiment and discover different use cases that might apply to your organization.

There are two ways to enable filtering in your zero-ETL integrations: when you first create the integration or by modifying an existing integration. Either way, you will find this option on the “Source” step of the zero-ETL creation wizard.

You apply filters by entering filter expressions that can be used to either include or exclude databases or tables from the dataset in the format of database*.table*. You can add multiple expressions and they will be evaluated in order from left to right.

If you’re modifying an existing integration, the new filtering rules will apply from that point in time on after you confirm your changes and Amazon Redshift will drop tables that are no longer part of the filter.

If you want to dive deeper, I recommend you read this blog post, which goes in depth into how you can set up data filters for Amazon Aurora zero-ETL integrations since the steps and concepts are very similar.

Create multiple zero-ETL integrations from a single database
You are now also able to configure up integrations from a single RDS for MySQL database to up to 5 Amazon Redshift data warehouses. The only requirement is that you must wait for the first integration to finish setting up successfully before adding others.

This allows you to share transactional data with different teams while providing them ownership over their own data warehouses for their specific use cases. For example, you can also use this in conjunction with data filtering to fan out different sets of data to development, staging, and production Amazon Redshift clusters from the same Amazon RDS production database.

Another interesting scenario where this could be really useful is consolidation of Amazon Redshift clusters by using zero-ETL to replicate to different warehouses. You could also use Amazon Redshift materialized views to explore your data, power your Amazon Quicksight dashboards, share data, train jobs in Amazon SageMaker, and more.

Conclusion
RDS for MySQL zero-ETL integrations with Amazon Redshift allows you to replicate data for near real-time analytics without needing to build and manage complex data pipelines. It is generally available today with the ability to add filter expressions to include or exclude databases and tables from the replicated data sets. You can now also set up multiple integrations from the same source RDS for MySQL database to different Amazon Redshift warehouses or create integrations from different sources to consolidate data into one data warehouse.

This zero-ETL integration is available for RDS for MySQL versions 8.0.32 and later, Amazon Redshift Serverless, and Amazon Redshift RA3 instance types in supported AWS Regions.

In addition to using the AWS Management Console, you can also set up a zero-ETL integration via the AWS Command Line Interface (AWS CLI) and by using an AWS SDK such as boto3, the official AWS SDK for Python.

See the documentation to learn more about working with zero-ETL integrations.

— Matheus Guimaraes

VirtualBox 7.1.0 released

2024-09-13 jzb

Post Syndicated from jzb original https://lwn.net/Articles/990125/

Version
7.1.0 of the VirtualBox virtualization system has been
released. Changes include a major GUI update, a new Network Address
Translation (NAT) engine with IPv6 support, shared clipboard support on Wayland, and more.

Harness Zero Copy data sharing from Salesforce Data Cloud to Amazon Redshift for Unified Analytics – Part 2

2024-09-12 Rajkumar Irudayaraj

Post Syndicated from Rajkumar Irudayaraj original https://aws.amazon.com/blogs/big-data/harness-zero-copy-data-sharing-from-salesforce-data-cloud-to-amazon-redshift-for-unified-analytics-part-2/

In the era of digital transformation and data-driven decision making, organizations must rapidly harness insights from their data to deliver exceptional customer experiences and gain competitive advantage. Salesforce and Amazon have collaborated to help customers unlock value from unified data and accelerate time to insights with bidirectional Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift.

In the Part 1 of this series, we discussed how to configure data sharing between Salesforce Data Cloud and customers’ AWS accounts in the same AWS Region. In this post, we discuss the architecture and implementation details of cross-Region data sharing between Salesforce Data Cloud and customers’ AWS accounts.

Solution overview

Salesforce Data Cloud provides a point-and-click experience to share data with a customer’s AWS account. On the AWS Lake Formation console, you can accept the datashare, create the resource link, mount Salesforce Data Cloud objects as data catalog views, and grant permissions to query the live and unified data in Amazon Redshift. Cross-Region data sharing between Salesforce Data Cloud and a customer’s AWS accounts is supported for two deployment scenarios: Amazon Redshift Serverless and Redshift provisioned clusters (RA3).

Cross-Region data sharing with Redshift Serverless

The following architecture diagram depicts the steps for setting up a cross-Region datashare between a Data Cloud instance in US-WEST-2 with Redshift Serverless in US-EAST-1.

Cross-Region data sharing set up consists of the following steps:

The Data Cloud admin identifies the objects to be shared and creates a Data Share in the data cloud provisioned in the US-WEST-2
The Data Cloud admin links the Data Share with the Amazon Redshift Data Share target. This creates an AWS Glue Data Catalog view and a cross-account Lake Formation resource share using the AWS Resource Access Manager (RAM) with the customer’s AWS account in US-WEST-2.
The customer’s Lake Formation admin accepts the datashare invitation in US-WEST-2 from the Lake Formation console and grants default (select and describe) permissions to an AWS Identity and Access Management (IAM) principal.
The Lake Formation admin switches to US-EAST-1 and creates a resource link pointing to the shared database in the US-WEST-2 Region.
The IAM principal can log in to the Amazon Redshift query editor in US-EAST-1 and creates an external schema referencing the datashare resource link. The data can be queried through these external tables.

Cross-Region data sharing with a Redshift provisioned cluster

Cross-Region data sharing across Salesforce Data Cloud and a Redshift provisioned cluster requires additional steps on top of the Serverless set up. Based on the Amazon Redshift Spectrum considerations, the provisioned cluster and the Amazon Simple Storage Service (Amazon S3) bucket must be in the same Region for Redshift external tables. The following architecture depicts a design pattern and steps to share data with Redshift provisioned clusters.

Steps 1–5 in the set up remain the same across Redshift Serverless and provisioned cluster cross-Region sharing. Encryption must be enabled on both Redshift Serverless and the provisioned cluster. Listed below are the additional steps:

Create a table from datashare data with the CREATE TABLE AS SELECT Create a datashare in Redshift serverless and grant access to the Redshift provisioned cluster.
Create a database in the Redshift provisioned cluster and grant access to the target IAM principals. The datashare is ready for query.

The new table needs to be refreshed periodically to get the latest data from the shared Data Cloud objects with this solution.

Considerations when using data sharing in Amazon Redshift

For a comprehensive list of considerations and limitations of data sharing, refer to Considerations when using data sharing in Amazon Redshift. Some of the important ones for Zero Copy data sharing includes:

Data sharing is supported for all provisioned RA3 instance types (ra3.16xlarge, ra3.4xlarge, and ra3.xlplus) and Redshift Serverless. It isn’t supported for clusters with DC and DS node types.
For cross-account and cross-Region data sharing, both the producer and consumer clusters and serverless namespaces must be encrypted. However, they don’t need to share the same encryption key.
Data Catalog multi-engine views are generally available in commercial Regions where Lake Formation, the Data Catalog, Amazon Redshift, and Amazon Athena are available.
Cross-Region sharing is available in all LakeFormation supported regions.

Prerequisites

The prerequisites remain the same across same-Region and cross-Region data sharing, which are required before proceeding with the setup.

Configure cross-Region data sharing

The steps to create a datashare, create a datashare target, link the datashare target to the datashare, and accept the datashare in Lake Formation remain the same across same-Region and cross-Region data sharing. Refer to Part 1 of this series to complete the setup.

Cross-Region data sharing with Redshift Serverless

If you’re using Redshift Serverless, complete the following steps:

On the Lake Formation console, choose Databases in the navigation pane.
Choose Create database.
Under Database details¸ select Resource link.
For Resource link name, enter a name for the resource link.
For Shared database’s region, choose the Data Catalog view source Region.
The Shared database and Shared database’s owner ID fields are populated manually from the database metadata.
Choose Create to complete the setup.

The resource link appears on the Databases page on the Lake Formation console, as shown in the following screenshot.

Launch Redshift Query Editor v2 for the Redshift Serverless workspace The cross-region data share tables are auto-mounted and appear under awsdatacatalog. To query, run the following command and create an external schema. Specify the resource link as the Data Catalog database, the Redshift Serverless Region, and the AWS account ID.
```
CREATE external SCHEMA cross_region_data_share --<<SCHEMA_NAME>>
FROM DATA CATALOG DATABASE 'cross-region-data-share' --<<RESOURCE_LINK_NAME>>
REGION 'us-east-1' --<TARGET_REGION>
IAM_ROLE 'SESSION' CATALOG_ID '<<aws_account_id>>'; --<<REDSHIFT AWS ACCOUNT ID>>
```
Refresh the schemas to view the external schema created in the dev database
Run the show tables command to check the shared objects under the external database:
```
SHOW TABLES FROM SCHEMA dev.cross_region_data_share --<<schema name>>
```

Query the datashare as shown in the following screenshot.

SELECT * FROM dev.cross_region_data_share.churn_modellingcsv_tableaus3_dlm; --<<change schema name & table name>>

Cross-Region data sharing with Redshift provisioned cluster

This section is a continuation of the previous section with additional steps needed for data sharing to work when the consumer is a provisioned Redshift cluster. Refer to Sharing data in Amazon Redshift and Sharing datashares for a deeper understanding of concepts and the implementation steps.

Create a new schema and table in the Redshift Serverless in the consumer Region:

CREATE SCHEMA customer360_data_share;
CREATE TABLE customer360_data_share. customer_churn as
SELECT * from dev.cross_region_data_share.churn_modellingcsv_tableaus3__dlm;

Get the namespace for the Redshift Serverless (producer) and Redshift provisioned cluster (consumer) by running the following query in each cluster:
```
select current_namespace
```

Create a datashare in the Redshift Serverless (producer) and grant usage to the Redshift provisioned cluster (consumer). Set the datashare, schema, and table names to the appropriate values, and set the namespace to the consumer namespace.

CREATE DATASHARE customer360_redshift_data_share;
ALTER DATASHARE customer360_redshift_data_share ADD SCHEMA customer360_data_share;
ALTER DATASHARE customer360_redshift_data_share ADD TABLE customer360_data_share.customer_churn; 
GRANT USAGE ON DATASHARE customer360_redshift_data_share 
TO NAMESPACE '5709a006-6ac3-4a0c-a609-d740640d3080'; --<<Data Share Consumer Namespace>>

Log in as a superuser in the Redshift provisioned cluster, create a database from the datashare, and grant permissions. Refer to managing permissions for Amazon Redshift datashare for detailed guidance.

The datashare is now ready for query.

You can periodically refresh the table you created to get the latest data from the data cloud based on your business requirement.

Conclusion

Zero Copy data sharing between Salesforce Data Cloud and Amazon Redshift represents a significant advancement in how organizations can use their customer 360 data. By eliminating the need for data movement, this approach offers real-time insights, reduced costs, and enhanced security. As businesses continue to prioritize data-driven decision-making, Zero Copy data sharing will play a crucial role in unlocking the full potential of customer data across platforms.

This integration empowers organizations to break down data silos, accelerate analytics, and drive more agile customer-centric strategies. To learn more, refer to the following resources:

About the Authors

Rajkumar Irudayaraj is a Senior Product Director at Salesforce with over 20 years of experience in data platforms and services, with a passion for delivering data-powered experiences to customers.

Sriram Sethuraman is a Senior Manager in Salesforce Data Cloud product management. He has been building products for over 9 years using big data technologies. In his current role at Salesforce, Sriram works on Zero Copy integration with major data lake partners and helps customers deliver value with their data strategies.

Jason Berkowitz is a Senior Product Manager with AWS Lake Formation. He comes from a background in machine learning and data lake architectures. He helps customers become data-driven.

Ravi Bhattiprolu is a Senior Partner Solutions Architect at AWS. Ravi works with strategic ISV partners, Salesforce and Tableau, to deliver innovative and well-architected products and solutions that help joint customers achieve their business and technical objectives.

Avijit Goswami is a Principal Solutions Architect at AWS specialized in data and analytics. He supports AWS strategic customers in building high-performing, secure, and scalable data lake solutions on AWS using AWS managed services and open source solutions. Outside of his work, Avijit likes to travel, hike, watch sports, and listen to music.

Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.

Michael Chess is a Technical Product Manager at AWS Lake Formation. He focuses on improving data permissions across the data lake. He is passionate about enabling customers to build and optimize their data lakes to meet stringent security requirements.

Mike Patterson is a Senior Customer Solutions Manager in the Strategic ISV segment at AWS. He has partnered with Salesforce Data Cloud to align business objectives with innovative AWS solutions to achieve impactful customer experiences. In his spare time, he enjoys spending time with his family, sports, and outdoor activities.

The AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

2024-09-12 Sandeep Adwankar

Post Syndicated from Sandeep Adwankar original https://aws.amazon.com/blogs/big-data/the-aws-glue-data-catalog-now-supports-storage-optimization-of-apache-iceberg-tables/

The AWS Glue Data Catalog now enhances managed table optimization of Apache Iceberg tables by automatically removing data files that are no longer needed. Along with the Glue Data Catalog’s automated compaction feature, these storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance.

Iceberg creates a new version called a snapshot for every change to the data in the table. Iceberg has features like time travel and rollback that allow you to query data lake snapshots or roll back to previous versions. As more table changes are made, more data files are created. In addition, any failures during writing to Iceberg tables will create data files that aren’t referenced in snapshots, also known as orphan files. Time travel features, though useful, may conflict with regulations like GDPR that require permanent data deletion. Because time travel allows accessing data through historical snapshots, additional safeguards are needed to maintain compliance with data privacy laws. To control storage costs and comply with regulations, many organizations have created custom data pipelines that periodically expire snapshots in a table that are no longer needed and remove orphan files. However, building these custom pipelines is time-consuming and expensive.

With this launch, you can enable Glue Data Catalog table optimization to include snapshot and orphan data management along with compaction. You can enable this by providing configurations such as a default retention period and maximum days to keep orphan files. The Glue Data Catalog monitors tables daily, removes snapshots from table metadata, and removes the data files and orphan files that are no longer needed. The Glue Data Catalog honors retention policies for Iceberg branches and tags referencing snapshots. You can now get an always-optimized Amazon Simple Storage Service (Amazon S3) layout by automatically removing expired snapshots and orphan files. You can view the history of data, manifest, manifest lists, and orphan files deleted from the table optimization tab on the AWS Glue Data Catalog console.

In this post, we show how to enable managed retention and orphan file deletion on an Apache Iceberg table for storage optimization.

Solution overview

For this post, we use a table called customer in the iceberg_blog_db database, where data is added continuously by a streaming application—around 10,000 records (file size less than 100 KB) every 10 minutes, which includes change data capture (CDC) as well. The customer table data and metadata are stored in the S3 bucket. Because the data is updated and deleted as part of CDC, new snapshots are created for every change to the data in the table.

Managed compaction is enabled on this table for query optimization, which results in new snapshots being created when compaction rewrites several small files into a few compacted files, leaving the old small files in storage. This results in data and metadata in Amazon S3 growing at a rapid pace, which can become cost-prohibitive.

Snapshots are timestamped versions of an iceberg table. Snapshot retention configurations allow customers to enforce how long to retain snapshots and how many snapshots to retain. Configuring a snapshot retention optimizer can help manage storage overhead by removing older, unnecessary snapshots and their underlying files.

Orphan files are files that are no longer referenced by the Iceberg table metadata. These files can accumulate over time, especially after operations like table deletions or failed ETL jobs. Enabling orphan file deletion allows AWS Glue to periodically identify and remove these unnecessary files, freeing up storage.

The following diagram illustrates the architecture.

architecture

In the following sections, we demonstrate how to enable managed retention and orphan file deletion on the AWS Glue managed Iceberg table.

Prerequisite

Have an AWS account. If you don’t have an account, you can create one.

Set up resources with AWS CloudFormation

This post includes a CloudFormation template for a quick setup. You can review and customize it to suit your needs. The template generates the following resources:

An S3 bucket to store the dataset, Glue job scripts, and so on
Data Catalog database
An AWS Glue job that creates and modifies sample customer data in your S3 bucket with a Trigger every 10 mins
AWS Identity and Access Management (AWS IAM) roles and policies – glueroleoutput

To launch the CloudFormation stack, complete the following steps:

Sign in to the AWS CloudFormation console.
Choose Launch Stack.
Choose Next.
Leave the parameters as default or make appropriate changes based on your requirements, then choose Next.
Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
Choose Create.

This stack can take around 5-10 minutes to complete, after which you can view the deployed stack on the AWS CloudFormation console.

CFN

Note down the role glueroleouput value that will be used when enabling optimization setup.

From the Amazon S3 console, note the Amazon S3 bucket and you can monitor how the data will be continuously updated every 10 mins with the AWS Glue Job.

S3 buckets

Enable snapshot retention

We want to remove metadata and data files of snapshots older than 1 day and the number of snapshots to retain a maximum of 1. To enable snapshot expiry, you enable snapshot retention on the customer table by setting the retention configuration as shown in the following steps, and AWS Glue will run background operations to perform these table maintenance operations, enforcing these settings one time per day.

Sign in to the AWS Glue console as an administrator.
Under Data Catalog in the navigation pane, choose Tables.
Search for and select the customer table.
On the Actions menu, choose Enable under Optimization.
Specify your optimization settings by selecting Snapshot retention.
Under Optimization configuration, select Customize settings and provide the following:
1. For IAM role, choose role created as CloudFormation resource.
2. Set Snapshot retention period as 1 day.
3. Set Minimum snapshots to retain as 1.
4. Choose Yes for Delete expire files.
Select the acknowledgement check box and choose Enable.

optimization enable

Alternatively, you can install or update the latest AWS Command Line Interface (AWS CLI) version to run the AWS CLI to enable snapshot retention. For instructions, refer to Installing or updating the latest version of the AWS CLI. Use the following code to enable snapshot retention:

aws glue create-table-optimizer
--catalog-id 112233445566
--database-name iceberg_blog_db
--table-name customer
--table-optimizer-configuration
'{
"roleArn": "arn:aws:iam::112233445566:role/<glueroleoutput>",
"enabled": true,
"retentionConfiguration": {
"icebergConfiguration": {
"snapshotRetentionPeriodInDays": 1,
"numberOfSnapshotsToRetain": 1,
"cleanExpiredFiles": true
}
}
}'
--type retention
--region us-east-1

Enable orphan file deletion

We want to remove metadata and data files that aren’t referenced of snapshots older than 1 day and the number of snapshots to retain a maximum of 1. Complete the steps to enable orphan file deletion on the customer table, and AWS Glue will run background operations to perform these table maintenance operations enforcing these settings one time per day.

Under Optimization configuration, select Customize settings and provide the following:
1. For IAM role, choose role created as CloudFormation resource.
2. Set Delete orphan file period as 1 day.
Select the acknowledgement check box and choose Enable.

Alternatively, you can use the AWS CLI to enable orphan file deletion:

aws glue create-table-optimizer
--catalog-id 112233445566
--database-name iceberg_blog_db
--table-name customer
--table-optimizer-configuration
'{
"roleArn": "arn:aws:iam::112233445566:role/<glueroleoutput>",
"enabled": true,
"orphanFileDeletionConfiguration": {
"icebergConfiguration": {
"orphanFileRetentionPeriodInDays": 1
}
}
}'
--type orphan_file_deletion
--region us-east-1

Based on the optimizer configuration, you will start seeing the optimization history in the AWS Glue Data Catalog

runs

Validate the solution

To validate the snapshot retention and orphan file deletion configuration, complete the following steps:

Sign in to the AWS Glue console as an administrator.
Under Data Catalog in the navigation pane, choose Tables.
Search for and choose the customer table.
Choose the Table optimization tab to view the optimization job run history.

runs

Alternatively, you can use the AWS CLI to verify snapshot retention:

aws glue get-table-optimizer --catalog-id 112233445566 --database-name iceberg_blog_db --table-name customer --type retention

You can also use the AWS CLI to verify orphan file deletion:

aws glue get-table-optimizer --catalog-id 112233445566 --database-name iceberg_blog_db --table-name customer --type orphan_file_deletion

Monitor CloudWatch metrics for Amazon S3

The following metrics show a steep increase in the bucket size as streaming of customer data happens along with CDC, leading to an increase in the metadata and data objects as snapshots are created. When snapshot retention (“snapshotRetentionPeriodInDays“: 1, “numberOfSnapshotsToRetain“: 50) and orphan file deletion (“orphanFileRetentionPeriodInDays“: 1) enabled, there is drop in the total bucket size for the customer prefix and the total number of objects as the maintenance takes place, eventually leading to optimized storage.

metrics

Clean up

To avoid incurring future charges, delete the resources you created in the Glue, Data Catalog, and S3 bucket used for storage.

Conclusion

Two of the key features of Iceberg are time travel and rollbacks, allowing you to query data at previous points in time and roll back unwanted changes to your tables. This is facilitated through the concept of Iceberg snapshots, which are a complete set of data files in the table at a point in time. With these new releases, the Data Catalog now provides storage optimizations that can help you reduce metadata overhead, control storage costs, and improve query performance.

To learn more about using the AWS Glue Data Catalog, refer to Optimizing Iceberg Tables.

A special thanks to everyone who contributed to the launch: Sangeet Lohariwala, Arvin Mohanty, Juan Santillan, Sandya Krishnanand, Mert Hocanin, Yanting Zhang and Shyam Rathi.

About the Authors

Sandeep Adwankar is a Senior Product Manager at AWS. Based in the California Bay Area, he works with customers around the globe to translate business and technical requirements into products that enable customers to improve how they manage, secure, and access data.

Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building data mesh solutions and sharing them with the community.

Paul Villena is a Senior Analytics Solutions Architect in AWS with expertise in building modern data and analytics solutions to drive business value. He works with customers to help them harness the power of the cloud. His areas of interests are infrastructure as code, serverless technologies, and coding in Python.

1983: The World’s Most Dangerous Year?

2024-09-12 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=6NmikdVRXcc

Pete Buttigieg and Others on the Future of Mobility, Work, and AI | The Atlantic Festival 2024

2024-09-12 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=DAJVYQXb4qA

Mammotion #YUKA Robot Mower

2024-09-12 digiblur DIY

Post Syndicated from digiblur DIY original https://www.youtube.com/watch?v=oqYcB8G3Bzg

2025 ZettaFLOPS Scale Compute as Oracle Looks to Operate Hundreds of Thousands of NVIDIA GPUs

2024-09-12 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/2025-zettaflops-scale-compute-as-oracle-looks-to-operate-hundreds-of-thousands-of-nvidia-gpus/

Oracle is looking to deploy hundreds of thousands of NVIDIA GPUs as it scales its AI infrastructure to a ZettaFLOPS scale in 2025

The post 2025 ZettaFLOPS Scale Compute as Oracle Looks to Operate Hundreds of Thousands of NVIDIA GPUs appeared first on ServeTheHome.

Три за превод: Документални комикси

2024-09-12 Нева Мичева

Post Syndicated from Нева Мичева original https://www.toest.bg/tri-za-prevod-dokumentalni-komiksi/

Струва ми се, че комиксите са, от една страна, много директни. Възприемането им е силно интуитивно. От друга страна, те са и твърде абстрактни. Трябва да положиш доста повече труд за дешифрирането на някоя лента (strip), отколкото за разбирането на филм или дори за прочитането на книга,

Три за превод: Документални комикси

казва Арт Спигълман, един от класиците в рисувано-словесните истории.

В България навикът за създаване и потребление на „деветото изкуство“ е трудно забележим, нищо че не му липсват база и контекст – достатъчно е да споменем някои родствени изкуства, в които имаме силни традиции: карикатурата, илюстрованите книги, анимацията, театъра (да, комиксът е не по-малко изкуство на точната реплика и яркия конфликт от драмата). В „Кратка история на българския комикс“ Антон Стайков дори прехвърля мост назад във времето чак до житийните сцени от някои икони и до пещерните рисунки в Магурата…

Българският комиксов пазар обаче продължава да е очебийно малък не само на фона на белгийския, френския, американския и японския (гигантите в тази сфера), но и на тези в балканските страни. От старите фестивали на комикса в България се дочуват само далечни отзвуци – нищо че на миналогодишното издание в Лука имаше над 700 изложители, а благодарение на срещите и събитията в Ангулем комиксите се превърнаха във френските книги с най-много продадени права за издаване в чужбина. Оригиналите ни се броят на пръсти, а преводът на „разкази в картинки“ (както са склонни да ги наричат някои от малкото им съвременни български автори) се оживи напоследък, но преобладаващо в популярната зона на американските супергерои и техните „вселени“. При това положение е очевидно защо

у нас специфичната ниша на комиксовата документалистика пустее.

Вече представихме три подборки от по три непревеждани, но препоръчителни за „внос“ заглавия от различни литератури (нидерландска, норвежка и украинска). Сега ни се ще да привлечем вниманието не върху даден език или ареал, а върху възхитителния жанр на комикса, който се стреми да опознае, осъзнае и обсъди сложни части от света ни, като за целта успешно си служи с градивата на публицистиката и на историята, без обаче да се лишава от характерни черти като хумора, хиперболата, олицетворението, заявените пристрастия и речта в „балончетата“.

Макар името и биографията на комикса да са неразривно свързани с комедията, темите на произведенията по-долу са също така трагични, политически, социални. Тревожно актуални, при все че авторите им са от различни страни и поколения и са твърде несходни като техника и темперамент. Общото е, че и тримата участват в творбите си едновременно като повествователи и персонажи и работят издълбоко, с обилен и съвестно проучен контекст, тласкани от състрадание към онеправданите и от неудобство поради изходната си неосведоменост.

Maus, Art Spiegelman

Събраните в общ том две части на „Маус“ – единствения комикс, награден с „Пулицър“ – възлизат на 300 изключително концентрирани страници, за чието създаване са отишли около 13 години. В тях авторът разговаря с баща си Владек Шпигелман, полски евреин, за събитията преди, по време на и след хвърлянето му в лагера на смъртта Аушвиц. Книгата е колкото за спомените на възрастния мъж от една кошмарна епоха, толкова и за сложните отношения баща–син. Два крайно трудни прехода през почти непреодолими препятствия, с които Арт се справя виртуозно.

Три за превод: Документални комикси — © „Маус", корицата на норвежкото издание; © Арт Спигълман в интервю за Studio Q (2015)

Спигълман се ражда в Стокхолм през 1948 г. – по бежанския път на родителите си към САЩ, където живее от тригодишен. Амбицията на баща му Владек и майка му Аня е той да стане зъболекар, но Арт знае от дете, че ще рисува. Семейната му история е показателна не толкова за ХХ век, колкото за определен вид изродена социална атмосфера, която не знае времеви и географски ограничения… През 1943 г., от страх пред неизбежната депортация, леля му отравя себе си и трите дечица, за които се грижи – едно от тях е шестгодишният Ришо, по-големият брат на Арт, който завинаги ще витае в живота му като отсъствие – идеалното мъртво дете. В Холокоста загиват 70 от близките на Владек и Аня. Когато Арт е на 20 години, майка му, също оцеляла от Аушвиц, се самоубива.

Maus от заглавието е немската дума за „мишка“ и всички евреи в комикса са изобразени с тела на хора и глави на мишки – метафора, пасваща според автора на представянето на евреите като „вредители“ в нацистката реторика, но и даваща възможност да се дистанцираме от ада на онези времена, да се приближим по-безстрашно до мисли и гледки, трудни за преглъщане. Немците, разбира се, са с муцуни на котки, поляците са прасета, американците – кучета, а Маусшвиц е черната дупка, която тегли материята на целия разказ към себе си.

Първата поява на „Маус“ е в една рисувана антология от 1972 г. – зачатък от три страници. През 1980 г. Спигълман започва да издава със съпругата си Франсоаз Мули годишното списание за комикси Raw и от втория от общо 11-те му реализирани броя се връща към идеята за „Маус“, който публикува на глави.

От странното си начало до блестящата си покъртителна последна реплика „Маус“ е безумно рисковано и красиво начинание.

„Ако беше блус, сега сигурно щеше да звучи в рекламите за коли – казва Спигълман за огромния периметър, в който се прочува през годините „Маус“. – Навлезе в културата по начини, които не можех и да предположа…“

Комиксът е преведен на всички „големи“ езици, но и на литовски, естонски, украински, грузински, унгарски, фарси – каквото ви хрумне. Съседите ни имат „Маус“ на своите езици от 20 години (сърбите и турците), от 15 (гърците), от 10 (румънците), от 5 (македонците)… А у нас го няма дори на английски в каталога на Националната библиотека. Това е не просто културен пропуск, а неизвинено отсъствие на България от часа по история.

(Малка скоба в същия ред на мисли: срамота е и липсата на български на „Персеполис“ на Марджане Сатрапи, иранката, която преди почти две десетилетия превърна този свой страхотно интересен рисуван мемоар в анимационен филм и спечели с него наградата на журито в Кан, а през 2019 г. режисира „Радиоактивност“ по превъзходния научно-любовен комикс на Лорън Реднис за Мари и Пиер Кюри…)

Palestine, Joe Sacco

Както Арт Спигълман е отъждествяван изключително с „Маус“, така и Джо Сако е първо „Палестина“ и чак после всичко останало. С тази разлика, че и много от всичкото му останало заслужава да се преведе час по-скоро: босненските истории „Горажде, безопасна зона“ (2000) и „Помагачът“ (2003), „Бележки под линия в Газа“ (2009), сборникът с репортажи „Журналистика“ (2011).

Сако (1960) се ражда в Малта, отраства в Австралия, завършва журналистика в Университета в Орегон през 1981-ва и същата година получава първия си отказ от списание, на което е пратил малка комиксова история. Това е именно споменатото по-горе Raw… Десетилетия по-късно един от най-влиятелните вестници в света започна така статията си за извънредния нов тираж на най-известната му книга:

Когато Джо Сако създаде „Палестина“, никой нямаше представа що е комиксовата журналистика, а сега неговата основополагаща творба се сдоби с нетърпеливи нови читатели.

Сако сочи като свои кумири в писането и рисуването Джордж Оруел и Робърт Кръмб. Рисунъкът му е елегантен, а точността, с която заковава важните въпроси – впечатляваща. В типа журналистика, която си е избрал, цени особено възможността миналото да се появява редом с настоящето, а неизбежното стилизиране на образите да направи по-поносимо осмислянето им („…рисунките на Сако често оказват въздействие на снимки, които нито един фотограф не би дръзнал да направи“, с право се отбелязва в една рецензия).

Методът му е следният: отива на места, които го вълнуват; остава там колкото е нужно, за да се ориентира; разговаря с голям брой хора; води си записки, щрихира, снима; прибира се вкъщи и месеци наред оформя събраното.

Най-големият ми страх е, че ще предам недостоверно нечия история,

казва Сако. И още:

Научих, че хората обичат да говорят за себе си. Стига да няма нещо, което много се опитват да скрият, фактът, че някой им задава въпроси, им е приятен.

През зимата на 1991/1992 г. Сако изкарва два месеца в Израел и Окупираните територии (Западния бряг, Източен Йерусалим и ивицата Газа) и през 1993 г. започва да публикува поредица от девет брошури по 32 страници, в които нахвърля срещите и впечатленията си от региона. През годините те се превръщат в деветте глави първо на двутомник, сетне на книга с предговор от Едуард Саид („Повечето възрастни са склонни да свързват комиксите с нещо фриволно или ефимерно и съществува нагласата, че с порастването човек ги изоставя в търсене на по-сериозни занимания […] но комиксите явно казват неща, които иначе не биха могли да се кажат, и не се подчиняват на обичайните мисловни процеси, надзиравани, канализирани и прекроявани чрез всякакъв педагогически и идеологически натиск“).

През 2007-а се появява специално издание, вече на 320 страници, с включени вътре спомени, чернови, фотографии, а най-новата „Палестина“ излезе преди дни и съдържа и послеслов от журналистката Амира Хас.

В „Палестина“ Сако казва:

Аз съм скептик. Като журналист си длъжен да бъдеш Тома Неверни… Трябва да сложиш пръст в раната, а най-добре – цялата си глава.

Всяка негова страница е вихър от лица, жестове, реплики, ненадейни ракурси към характерни интериори и екстериори, внезапни хрумвания, абсурдни сценки, включвания от новините или книгите. Разказът обаче тече съвсем разбираемо и някак се родее с „романа от гласове“ на нобелистката Светлана Алексиевич (както тя нарече творчеството си пред „Тоест“) –

поразителна амалгама от лични разговори с прегазените от историята.

Серия от коментари на Сако за случващото се понастоящем в Газа върви от началото на годината в The Comics Journal – освен за да си съставим мнение за стила и мирогледа му, те могат да ни помогнат и да си набавим още малко нюанси и детайли по една безкрайно наболяла тема.

Kobane Calling, Zerocalcare

Не е изключено да познавате Дзерокалкаре от „Откъсни по пунктира“ (2021) и „Този свят няма да ме направи гадняр“ (2023) – двата му къси анимационни сериала за „Нетфликс“, които е задължително да се гледат със субтитри, за да се чуе как авторът говори като картечница на римски диалект (той озвучава и себе си като главен герой, и част от другите). А ако го познавате от тях, значи сте вече на „ти“ с основните действащи лица в живота и в комиксите му – майката (французойка с външността на квачката Лейди Клъв от Дисниевия „Робин Худ“), приятеля Секо (малословен, луд по сладоледа и покера онлайн), огромния угрижен Броненосец (въплъщение на вътрешния му глас). И сте забелязали колко сърдечен и ангажиран е този човек под смешните си рисунки.

Зад Zerocalcare (букв. „нула варовик“ – интернетски псевдоним, заимстван от натрапчива реклама на препарат против котлен камък) се крие Микеле Рек (1983), който, откак се помни, живее в Ребибия и я превръща в редовен фон на разказите си. Ребибия е краен квартал на Рим с горе-долу три забележителности: последната станция на една от линиите на метрото; едноименния затвор с над 2000 обитатели (апропо там се развива „Цезар трябва да умре“ на братя Тавиани); останките от праисторически мамут.

Стеснителен, тих, с меланхолични сини очи, вечно облечен в черно и способен да дава автографи по 13 часа без почивка, Дзеро е чудна смесица от юношеска възбудимост и дзен безвремие. И той като Спигълман и Сако започва да рисува от ученическите си години, но за разлика от тях се прочува още с блога си и с дебютната си самиздатска „Поличбата на Броненосеца“ (2011), а докъм 35-тата си година вече е продал над милион бройки от тази и следващите си творби. (Стартовият тираж на една от най-новите му книги – посветена на малък язидски град в Иракски Кюрдистан – беше 234 000 екземпляра…)

Предвестникът на „Кобани зове“ се появява под формата на рисуван репортаж в сп. „Интернационале“ на 16 януари 2015 г. и предизвиква такъв интерес, че на следващата седмица от същия брой е пусната допечатка. Не след дълго невротичният младеж от Ребибия, който изпада в екзистенциална криза дори само ако промени обичайния си списък за пазаруване, се емва отново до Рожава – преобладаващо кюрдски регион в Северна Сирия, в който с цената на страшни жертви неотдавна е заработил един почти утопичен обществен договор. В резултат от срещите му с местните хора и от усилието да проумее какво им се случва под всеобщия натиск („варварството на ИДИЛ, репресиите на Турция, агресията на Ирак“) през 2016 г. излиза „Кобани зове“ и моментално поражда публичен отзвук, преводи на дузина езици, две театрални постановки. В предговора към допълненото издание от 2020 г. авторът формулира задачата, която си е поставил, така:

… да разкажа на широка публика без специални познания относно близкоизточните конфликти… за едно късче земя, в което цяла мозайка от народности се организира и бори за общество, стъпило върху равенството на мъже и жени, мирното съжителство, зачитането на културните и религиозните различия, социалната справедливост и екологията.

И действително, тук се разказва от първа ръка и от човешки ръст за едни изключително самотни в битките си наши съвременници, за които рядко се говори в новините и още по-рядко – като за съвкупност от личности.

Повествованието е по-рошаво и по-егоцентрично от тези на Спигълман и Сако, а езикът е от различно поколение и спонтанно прелива в жаргон и позовавания на музикални парчета, касови филми и други попкултурни дадености

(в един момент разказвачът от неудобство се превръща в жабока Кермит, в друг събеседниците му се явяват в облика на парче козе сирене и подлютена маслина и пр.). Същевременно Дзеро е безпощадно саркастичен към собствените си недостатъци и особено към специфичната западняшка инфантилност – естествена смесица от невежество и изнеженост, – с която е склонен да наднича оттатък границите на уредения си живот. И все пак „Кобани зове“ е истински богато преживяване.

Колкото повече начини да се обсъжда светът – толкова повече събеседници. И толкова повече бъдеще.

New whitepaper available: Building security from the ground up with Secure by Design

2024-09-12 Bertram Dorn

Post Syndicated from Bertram Dorn original https://aws.amazon.com/blogs/security/new-whitepaper-available-building-security-from-the-ground-up-with-secure-by-design/

Developing secure products and services is imperative for organizations that are looking to strengthen operational resilience and build customer trust. However, system design often prioritizes performance, functionality, and user experience over security. This approach can lead to vulnerabilities across the supply chain.

As security threats continue to evolve, the concept of Secure by Design (SbD) is gaining importance in the effort to mitigate vulnerabilities early, minimize risks, and recognize security as a core business requirement. We’re excited to share a whitepaper we recently authored with SANS Institute called Building Security from the Ground up with Secure by Design, which addresses SbD strategy and explores the effects of SbD implementations.

The whitepaper contains context and analysis that can help you take a proactive approach to product development that facilitates foundational security. Key considerations include the following:

Integrating SbD into the software development lifecycle (SDLC)
Supporting SbD with automation
Reinforcing defense-in-depth
Applying SbD to artificial intelligence (AI)
Identifying threats in the design phase with threat modeling
Using SbD to simplify compliance with requirements and standards
Planning for the short and long term
Establishing a culture of security

While the journey to a Secure by Design approach is an iterative process that is different for every organization, the whitepaper details five key action items that can help set you on the right path. We encourage you to download the whitepaper and gain insight into how you can build secure products with a multi-layered strategy that meaningfully improves your technical and business outcomes. We look forward to your feedback and to continuing the journey together.

Download Building Security from the Ground up with Secure by Design.

If you have feedback about this post, submit comments in the Comments section below.

Differentiate generative AI applications with your data using AWS analytics and managed databases

2024-09-12 Diego Colombatto

Post Syndicated from Diego Colombatto original https://aws.amazon.com/blogs/big-data/differentiate-generative-ai-applications-with-your-data-using-aws-analytics-and-managed-databases/

While the potential of generative artificial intelligence (AI) is increasingly under evaluation, organizations are at different stages in defining their generative AI vision. In many organizations, the focus is on large language models (LLMs), and foundation models (FMs) more broadly. This is just the tip of the iceberg, because what enables you to obtain differential value from generative AI is your data.

Generative AI applications are still applications, so you need the following:

Operational databases to support the user experience for interaction steps outside of invoking generative AI models
Data lakes to store your domain-specific data, and analytics to explore them and understand how to use them in generative AI
Data integrations and pipelines to manage (sourcing, transforming, enriching, and validating, among others) and render data usable with generative AI
Governance to manage aspects such as data quality, privacy and compliance to applicable privacy laws, and security and access controls

LLMs and other FMs are trained on a generally available collective body of knowledge. If you use them as is, they’re going to provide generic answers with no differential value for your company. However, if you use generative AI with your domain-specific data, it can provide a valuable perspective for your business and enable you to build differentiated generative AI applications and products that will stand out from others. In essence, you have to enrich the generative AI models with your differentiated data.

On the importance of company data for generative AI, McKinsey stated that “If your data isn’t ready for generative AI, your business isn’t ready for generative AI.”

In this post, we present a framework to implement generative AI applications enriched and differentiated with your data. We also share a reusable, modular, and extendible asset to quickly get started with adopting the framework and implementing your generative AI application. This asset is designed to augment catalog search engine capabilities with generative AI, improving the end-user experience.

You can extend the solution in directions such as the business intelligence (BI) domain with customer 360 use cases, and the risk and compliance domain with transaction monitoring and fraud detection use cases.

Solution overview

There are three key data elements (or context elements) you can use to differentiate the generative AI responses:

Behavioral context – How do you want the LLM to behave? Which persona should the FM impersonate? We call this behavioral context. You can provide these instructions to the model through prompt templates.
Situational context – Is the user request part of an ongoing conversation? Do you have any conversation history and states? We call this situational context. Also, who is the user? What do you know about user and their request? This data is derived from your purpose-built data stores and previous interactions.
Semantic context – Is there any meaningfully relevant data that would help the FMs generate the response? We call this semantic context. This is typically obtained from vector stores and searches. For example, if you’re using a search engine to find products in a product catalog, you could store product details, encoded into vectors, into a vector store. This will enable you to run different kinds of searches.

Using these three context elements together is more likely to provide a coherent, accurate answer than relying purely on a generally available FM.

There are different approaches to design this type of solution; one method is to use generative AI with up-to-date, context-specific data by supplementing the in-context learning pattern using Retrieval Augmented Generation (RAG) derived data, as shown in the following figure. A second approach is to use your fine-tuned or custom-built generative AI model with up-to-date, context-specific data.

The framework used in this post enables you to build a solution with or without fine-tuned FMs and using all three context elements, or a subset of these context elements, using the first approach. The following figure illustrates the functional architecture.

Technical architecture

When implementing an architecture like that illustrated in the previous section, there are some key aspects to consider. The primary aspect is that, when the application receives the user input, it should process it and provide a response to the user as quickly as possible, with minimal response latency. This part of the application should also use data stores that can handle the throughput in terms of concurrent end-users and their activity. This means predominantly using transactional and operational databases.

Depending on the goals of your use case, you might store prompt templates separately in Amazon Simple Storage Service (Amazon S3) or in a database, if you want to apply different prompts for different usage conditions. Alternatively, you might treat them as code and use source code control to manage their evolution over time.

NoSQL databases like Amazon DynamoDB, Amazon DocumentDB (with MongoDB compatibility), and Amazon MemoryDB can provide low read latencies and are well suited to handle your conversation state and history (situational context). The document and key value data models allow you the flexibility to adjust the schema of the conversation state over time.

User profiles or other user information (situational context) can come from a variety of database sources. You can store that data in relational databases like Amazon Aurora, NoSQL databases, or graph databases like Amazon Neptune.

The semantic context originates from vector data stores or machine learning (ML) search services. Amazon Aurora PostgreSQL-Compatible Edition with pgvector and Amazon OpenSearch Service are great options if you want to interact with vectors directly. Amazon Kendra, our ML-based search engine, is a great fit if you want the benefits of semantic search without explicitly maintaining vectors yourself or tuning the similarity algorithms to be used.

Amazon Bedrock is a fully managed service that makes high-performing FMs from leading AI startups and Amazon available through a unified API. You can choose from a wide range of FMs to find the model that is best suited for your use case. Amazon Bedrock also offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock provides integrations with both Aurora and OpenSearch Service, so you don’t have to explicitly query the vector data store yourself.

The following figure summarizes the AWS services available to support the solution framework described so far.

Catalog search use case

We present a use case showing how to augment the search capabilities of an existing search engine for product catalogs, such as ecommerce portals, using generative AI and customer data.

Each customer will have their own requirements, so we adopt the framework presented in the previous sections and show an implementation of the framework for the catalog search use case. You can use this framework for both catalog search use cases and as a foundation to be extended based on your requirements.

One additional benefit about this catalog search implementation is that it’s pluggable to existing ecommerce portals, search engines, and recommender systems, so you don’t have to redesign or rebuild your processes and tools; this solution will augment what you currently have with limited changes required.

The solution architecture and workflow is shown in the following figure.

The workflow consists of the following steps:

The end-user browses the product catalog and submits a search, in natual language, using the web interface of the frontend catalog application (not shown). The catalog frontend application sends the user search to the generative AI application. Application logic is currently implemented as a container, but it can be deployed with AWS Lambda as required.
The generative AI application connects to Amazon Bedrock to convert the user search into embeddings.
The application connects with OpenSearch Service to search and retrieve relevant search results (using an OpenSearch index containing products). The application also connects to another OpenSearch index to get user reviews for products listed in the search results. In terms of searches, different options are possible, such as k-NN, hybrid search, or sparse neural search. For this post, we use k-NN search. At this stage, before creating the final prompt for the LLM, the application can perform an additional step to retrieve situational context from operational databases, such as customer profiles, user preferences, and other personalization information.
The application gets prompt templates from an S3 data lake and creates the engineered prompt.
The application sends the prompt to Amazon Bedrock and retrieves the LLM output.
The user interaction is stored in a data lake for downstream usage and BI analysis.
The Amazon Bedrock output retrieved in Step 5 is sent to the catalog application frontend, which shows results on the web UI to the end-user.
DynamoDB stores the product list used to display products in the ecommerce product catalog. DynamoDB zero-ETL integration with OpenSearch Service is used to replicate product keys into OpenSearch.

Security considerations

Security and compliance are key concerns for any business. When adopting the solution described in this post, you should always factor in the Security Pillar best practices from the AWS Well-Architecture Framework.

There are different security categories to consider and different AWS Security services you can use in each security category. The following are some examples relevant for the architecture shown in this post:

Data protection – You can use AWS Key Management Service (AWS KMS) to manage keys and encrypt data based on the data classification policies defined. You can also use AWS Secrets Manager to manage, retrieve, and rotate database credentials, API keys, and other secrets throughout their lifecycles.
Identity and access management – You can use AWS Identity and Access Management (IAM) to specify who or what can access services and resources in AWS, centrally manage fine-grained permissions, and analyze access to refine permissions across AWS.
Detection and response – You can use AWS CloudTrail to track and provide detailed audit trails of user and system actions to support audits and demonstrate compliance. Additionally, you can use Amazon CloudWatch to observe and monitor resources and applications.
Network security – You can use AWS Firewall Manager to centrally configure and manage firewall rules across your accounts and AWS network security services, such as AWS WAF, AWS Network Firewall, and others.

Conclusion

In this post, we discussed the importance of using customer data to differentiate generative AI usage in applications. We presented a reference framework (including a functional architecture and a technical architecture) to implement a generative AI application using customer data and an in-context learning pattern with RAG-provided data. We then presented an example of how to apply this framework to design a generative AI application using customer data to augment search capabilities and personalize the search results of an ecommerce product catalog.

Contact AWS to get more information on how to implement this framework for your use case. We’re also happy to share the technical asset presented in this post to help you get started building generative AI applications with your data for your specific use case.

About the Authors

Diego Colombatto is a Senior Partner Solutions Architect at AWS. He brings more than 15 years of experience in designing and delivering Digital Transformation projects for enterprises. At AWS, Diego works with partners and customers advising how to leverage AWS technologies to translate business needs into solutions.

Angel Conde Manjon is a Sr. EMEA Data & AI PSA, based in Madrid. He has previously worked on research related to Data Analytics and Artificial Intelligence in diverse European research projects. In his current role, Angel helps partners develop businesses centered on Data and AI.

Tiziano Curci is a Manager, EMEA Data & AI PDS at AWS. He leads a team that works with AWS Partners (G/SI and ISV), to leverage the most comprehensive set of capabilities spanning databases, analytics and machine learning, to help customers unlock the through power of data through an end-to-end data strategy.

How ZS built a clinical knowledge repository for semantic search using Amazon OpenSearch Service and Amazon Neptune

2024-09-12 Abhishek Pan

Post Syndicated from Abhishek Pan original https://aws.amazon.com/blogs/big-data/how-zs-built-a-clinical-knowledge-repository-for-semantic-search-using-amazon-opensearch-service-and-amazon-neptune/

In this blog post, we will highlight how ZS Associates used multiple AWS services to build a highly scalable, highly performant, clinical document search platform. This platform is an advanced information retrieval system engineered to assist healthcare professionals and researchers in navigating vast repositories of medical documents, medical literature, research articles, clinical guidelines, protocol documents, activity logs, and more. The goal of this search platform is to locate specific information efficiently and accurately to support clinical decision-making, research, and other healthcare-related activities by combining queries across all the different types of clinical documentation.

ZS is a management consulting and technology firm focused on transforming global healthcare. We use leading-edge analytics, data, and science to help clients make intelligent decisions. We serve clients in a wide range of industries, including pharmaceuticals, healthcare, technology, financial services, and consumer goods. We developed and host several applications for our customers on Amazon Web Services (AWS). ZS is also an AWS Advanced Consulting Partner as well as an Amazon Redshift Service Delivery Partner. As it relates to the use case in the post, ZS is a global leader in integrated evidence and strategy planning (IESP), a set of services that help pharmaceutical companies to deliver a complete and differentiated evidence package for new medicines.

ZS uses several AWS service offerings across the variety of their products, client solutions, and services. AWS services such as Amazon Neptune and Amazon OpenSearch Service form part of their data and analytics pipelines, and AWS Batch is used for long-running data and machine learning (ML) processing tasks.

Clinical data is highly connected in nature, so ZS used Neptune, a fully managed, high performance graph database service built for the cloud, as the database to capture the ontologies and taxonomies associated with the data that formed the supporting a knowledge graph. For our search requirements, We have used OpenSearch Service, an open source, distributed search and analytics suite.

About the clinical document search platform

Clinical documents comprise of a wide variety of digital records including:

Study protocols
Evidence gaps
Clinical activities
Publications

Within global biopharmaceutical companies, there are several key personas who are responsible to generate evidence for new medicines. This evidence supports decisions by payers, health technology assessments (HTAs), physicians, and patients when making treatment decisions. Evidence generation is rife with knowledge management challenges. Over the life of a pharmaceutical asset, hundreds of studies and analyses are completed, and it becomes challenging to maintain a good record of all the evidence to address incoming questions from external healthcare stakeholders such as payers, providers, physicians, and patients. Furthermore, almost none of the information associated with evidence generation activities (such as health economics and outcomes research (HEOR), real-world evidence (RWE), collaboration studies, and investigator sponsored research (ISR)) exists as structured data; instead, the richness of the evidence activities exists in protocol documents (study design) and study reports (outcomes). Therein lies the irony—teams who are in the business of knowledge generation struggle with knowledge management.

ZS unlocked new value from unstructured data for evidence generation leads by applying large language models (LLMs) and generative artificial intelligence (AI) to power advanced semantic search on evidence protocols. Now, evidence generation leads (medical affairs, HEOR, and RWE) can have a natural-language, conversational exchange and return a list of evidence activities with high relevance considering both structured data and the details of the studies from unstructured sources.

Overview of solution

The solution was designed in layers. The document processing layer supports document ingestion and orchestration. The semantic search platform (application) layer supports backend search and the user interface. Multiple different types of data sources, including media, documents, and external taxonomies, were identified as relevant for capture and processing within the semantic search platform.

Document processing solution framework layer

All components and sub-layers are orchestrated using Amazon Managed Workflows for Apache Airflow. The pipeline in Airflow is scaled automatically based on the workload using Batch. We can broadly divide layers here as shown in the following figure:

Document Processing Solution Framework Layers

Data crawling:

In the data crawling layer, documents are retrieved from a specified source SharePoint location and deposited into a designated Amazon Simple Storage Service (Amazon S3) bucket. These documents could be in variety of formats, such as PDF, Microsoft Word, and Excel, and are processed using format-specific adapters.

Data ingestion:

The data ingestion layer is the first step of the proposed framework. At this later, data from a variety of sources smoothly enters the system’s advanced processing setup. In the pipeline, the data ingestion process takes shape through a thoughtfully structured sequence of steps.
These steps include creating a unique run ID each time a pipeline is run, managing natural language processing (NLP) model versions in the versioning table, identifying document formats, and ensuring the health of NLP model services with a service health check.
The process then proceeds with the transfer of data from the input layer to the landing layer, creation of dynamic batches, and continuous tracking of document processing status throughout the run. In case of any issues, a failsafe mechanism halts the process, enabling a smooth transition to the NLP phase of the framework.

Database ingestion:

The reporting layer processes the JSON data from the feature extraction layer and converts it into CSV files. Each CSV file contains specific information extracted from dedicated sections of documents. Subsequently, the pipeline generates a triple file using the data from these CSV files, where each set of entities signifies relationships in a subject-predicate-object format. This triple file is intended for ingestion into Neptune and OpenSearch Service. In the full document embedding module, the document content is segmented into chunks, which are then transformed into embeddings using LLMs such as llama-2 and BGE. These embeddings, along with metadata such as the document ID and page number, are stored in OpenSearch Service. We use various chunking strategies to enhance text comprehension. Semantic chunking divides text into sentences, grouping them into sets, and merges similar ones based on embeddings.

Agentic chunking uses LLMs to determine context-driven chunk sizes, focusing on proposition-based division and simplifying complex sentences. Additionally, context and document aware chunking adapts chunking logic to the nature of the content for more effective processing.

NLP:

The NLP layer serves as a crucial component in extracting specific sections or entities from documents. The feature extraction stage proceeds with localization, where sections are identified within the document to narrow down the search space for further tasks like entity extraction. LLMs are used to summarize the text extracted from document sections, enhancing the efficiency of this process. Following localization, the feature extraction step involves extracting features from the identified sections using various procedures. These procedures, prioritized based on their relevance, use models like Llama-2-7b, mistral-7b, Flan-t5-xl, and Flan-T5-xxl to extract important features and entities from the document text.

The auto-mapping phase ensures consistency by mapping extracted features to standard terms present in the ontology. This is achieved through matching the embeddings of extracted features with those stored in the OpenSearch Service index. Finally, in the Document Layout Cohesion step, the output from the auto-mapping phase is adjusted to aggregate entities at the document level, providing a cohesive representation of the document’s content.

Semantic search platform application layer

This layer, shown in the following figure, uses Neptune as the graph database and OpenSearch Service as the vector engine.

Semantic search platform application layer

Amazon OpenSearch Service:

OpenSearch Service served the dual purpose of facilitating full-text search and embedding-based semantic search. The OpenSearch Service vector engine capability helped to drive Retrieval-Augmented Generation (RAG) workflows using LLMs. This helped to provide a summarized output for search after the retrieval of a relevant document for the input query. The method used for indexing embeddings was FAISS.

OpenSearch Service domain details:

Version of OpenSearch Service: 2.9
Number of nodes: 1
Instance type: r6g.2xlarge.search
Volume size: Gp3: 500gb
Number of Availability Zones: 1
Dedicated master node: Enabled
Number of Availability Zones: 3
No of master Nodes: 3
Instance type(Master Node) : r6g.large.search

To determine the nearest neighbor, we employ the Hierarchical Navigable Small World (HNSW) algorithm. We used the FAISS approximate k-NN library for indexing and searching and the Euclidean distance (L2 norm) for distance calculation between two vectors.

Amazon Neptune:

Neptune enables full-text search (FTS) through the integration with OpenSearch Service. A native streaming service for enabling FTS provided by AWS was established to replicate data from Neptune to OpenSearch Service. Based on the business use case for search, a graph model was defined. Considering the graph model, subject matter experts from the ZS domain team curated custom taxonomy capturing hierarchical flow of classes and sub-classes pertaining to clinical data. Open source taxonomies and ontologies were also identified, which would be part of the knowledge graph. Sections and entities were identified to be extracted from clinical documents. An unstructured document processing pipeline developed by ZS processed the documents in parallel and populated triples in RDF format from documents for Neptune ingestion.

The triples are created in such a way that semantically similar concepts are linked—hence creating a semantic layer for search. After the triples files are created, they’re stored in an S3 bucket. Using the Neptune Bulk Loader, we were able to load millions of triples to the graph.

Neptune ingests both structured and unstructured data, simplifying the process to retrieve content across different sources and formats. At this point, we were able to discover previously unknown relationships between the structured and unstructured data, which was then made available to the search platform. We used SPARQL query federation to return results from the enriched knowledge graph in the Neptune graph database and integrated with OpenSearch Service.

Neptune was able to automatically scale storage and compute resources to accommodate growing datasets and concurrent API calls. Presently, the application sustains approximately 3,000 daily active users. Concurrently, there is an observation of approximately 30–50 users initiating queries simultaneously within the application environment. The Neptune graph accommodates a substantial repository of approximately 4.87 million triples. The triples count is increasing because of our daily and weekly ingestion pipeline routines.

Neptune configuration:

Instance Class: db.r5d.4xlarge
Engine version: 1.2.0.1

LLMs:

Large language models (LLMs) like Llama-2, Mistral and Zephyr are used for extraction of sections and entities. Models like Flan-t5 were also used for extraction of other similar entities used in the procedures. These selected segments and entities are crucial for domain-specific searches and therefore receive higher priority in the learning-to-rank algorithm used for search.

Additionally, LLMs are used to generate a comprehensive summary of the top search results.

The LLMs are hosted on Amazon Elastic Kubernetes Service (Amazon EKS) with GPU-enabled node groups to ensure rapid inference processing. We’re using different models for different use cases. For example, to generate embeddings we deployed a BGE base model, while Mistral, Llama2, Zephyr, and others are used to extract specific medical entities, perform part extraction, and summarize search results. By using different LLMs for distinct tasks, we aim to enhance accuracy within narrow domains, thereby improving the overall relevance of the system.

Fine tuning :

Already fine-tuned models on pharma-specific documents were used. The models used were:

PharMolix/BioMedGPT-LM-7B (finetuned LLAMA-2 on medical)
emilyalsentzer/Bio_ClinicalBERT
stanford-crfm/BioMedLM
microsoft/biogpt

Re ranker, sorter, and filter stage:

Remove any stop words and special characters from the user input query to ensure a clean query. Upon pre-processing the query, create combinations of search terms by forming combinations of terms with varying n-grams. This step enriches the search scope and improves the chances of finding relevant results. For instance, if the input query is “machine learning algorithms,” generating n-grams could result in terms like “machine learning,” “learning algorithms,” and “machine learning algorithms”. Run the search terms simultaneously using the search API to access both Neptune graph and OpenSearch Service indexes. This hybrid approach broadens the search coverage, tapping into the strengths of both data sources. Specific weight is assigned to each result obtained from the data sources based on the domain’s specifications. This weight reflects the relevance and significance of the result within the context of the search query and the underlying domain. For example, a result from Neptune graph might be weighted higher if the query pertains to graph-related concepts, i.e. the search term is related directly to the subject or object of a triple, whereas a result from OpenSearch Service might be given more weightage if it aligns closely with text-based information. Documents that appear in both Neptune graph and OpenSearch Service receive the highest priority, because they likely offer comprehensive insights. Next in priority are documents exclusively sourced from the Neptune graph, followed by those solely from OpenSearch Service. This hierarchical arrangement ensures that the most relevant and comprehensive results are presented first. After factoring in these considerations, a final score is calculated for each result. Sorting the results based on their final scores ensures that the most relevant information is presented in the top n results.

Final UI

An evidence catalogue is aggregated from disparate systems. It provides a comprehensive repository of completed, ongoing and planned evidence generation activities. As evidence leads make forward-looking plans, the existing internal base of evidence is made readily available to inform decision-making.

The following video is a demonstration of an evidence catalog:

Customer impact

When completed, the solution provided the following customer benefits:

The search on multiple data source (structured and unstructured documents) enables visibility of complex hidden relationships and insights.
Clinical documents often contain a mix of structured and unstructured data. Neptune can store structured information in a graph format, while the vector database can handle unstructured data using embeddings. This integration provides a comprehensive approach to querying and analyzing diverse clinical information.
By building a knowledge graph using Neptune, you can enrich the clinical data with additional contextual information. This can include relationships between diseases, treatments, medications, and patient records, providing a more holistic view of healthcare data.
The search application helped in staying informed about the latest research, clinical developments, and competitive landscape.
This has enabled customers to make timely decisions, identify market trends, and help positioning of products based on a comprehensive understanding of the industry.
The application helped in monitoring adverse events, tracking safety signals, and ensuring that drug-related information is easily accessible and understandable, thereby supporting pharmacovigilance efforts.
The search application is currently running in production with 3000 active users.

Customer success criteria

The following success criteria were use to evaluate the solution:

Quick, high accuracy search results: The top three search results were 99% accurate with an overall latency of less than 3 seconds for users.
Identified, extracted portions of the protocol: The sections identified has a precision of 0.98 and recall of 0.87.
Accurate and relevant search results based on simple human language that answer the user’s question.
Clear UI and transparency on which portions of the aligned documents (protocol, clinical study reports, and publications) matched the text extraction.
Knowing what evidence is completed or in-process reduces redundancy in newly proposed evidence activities.

Challenges faced and learnings

We faced two main challenges in developing and deploying this solution.

Large data volume

The unstructured documents were required to be embedded completely and OpenSearch Service helped us achieve this with the right configuration. This involved deploying OpenSearch Service with master nodes and allocating sufficient storage capacity for embedding and storing unstructured document embeddings entirely. We stored up to 100 GB of embeddings in OpenSearch Service.

Inference time reduction

In the search application, it was vital that the search results were retrieved with lowest possible latency. With the hybrid graph and embedding search, this was challenging.

We addressed high latency issues by using an interconnected framework of graphs and embeddings. Each search method complemented the other, leading to optimal results. Our streamlined search approach ensures efficient queries of both the graph and the embeddings, eliminating any inefficiencies. The graph model was designed to minimize the number of hops required to navigate from one entity to another, and we improved its performance by avoiding the storage of bulky metadata. Any metadata too large for the graph was stored in OpenSearch, which served as our metadata store for graph and vector store for embeddings. Embeddings were generated using context-aware chunking of content to reduce the total embedding count and retrieval time, resulting in efficient querying with minimal inference time.

The Horizontal Pod Autoscaler (HPA) provided by Amazon EKS, intelligently adjusts pod resources based on user-demand or query loads, optimizing resource utilization and maintaining application performance during peak usage periods.

Conclusion

In this post, we described how to build an advanced information retrieval system designed to assist healthcare professionals and researchers in navigating through a diverse range of medical documents, including study protocols, evidence gaps, clinical activities, and publications. By using Amazon OpenSearch Service as a distributed search and vector database and Amazon Neptune as a knowledge graph, ZS was able to remove the undifferentiated heavy lifting associated with building and maintaining such a complex platform.

If you’re facing similar challenges in managing and searching through vast repositories of medical data, consider exploring the powerful capabilities of OpenSearch Service and Neptune. These services can help you unlock new insights and enhance your organization’s knowledge management capabilities.

About the authors

Abhishek Pan is a Sr. Specialist SA-Data working with AWS India Public sector customers. He engages with customers to define data-driven strategy, provide deep dive sessions on analytics use cases, and design scalable and performant analytical applications. He has 12 years of experience and is passionate about databases, analytics, and AI/ML. He is an avid traveler and tries to capture the world through his lens.

Gourang Harhare is a Senior Solutions Architect at AWS based in Pune, India. With a robust background in large-scale design and implementation of enterprise systems, application modernization, and cloud native architectures, he specializes in AI/ML, serverless, and container technologies. He enjoys solving complex problems and helping customer be successful on AWS. In his free time, he likes to play table tennis, enjoy trekking, or read books

Kevin Phillips is a Neptune Specialist Solutions Architect working in the UK. He has 20 years of development and solutions architectural experience, which he uses to help support and guide customers. He has been enthusiastic about evangelizing graph databases since joining the Amazon Neptune team, and is happy to talk graph with anyone who will listen.

Sandeep Varma is a principal in ZS’s Pune, India, office with over 25 years of technology consulting experience, which includes architecting and delivering innovative solutions for complex business problems leveraging AI and technology. Sandeep has been critical in driving various large-scale programs at ZS Associates. He was the founding member the Big Data Analytics Centre of Excellence in ZS and currently leads the Enterprise Service Center of Excellence. Sandeep is a thought leader and has served as chief architect of multiple large-scale enterprise big data platforms. He specializes in rapidly building high-performance teams focused on cutting-edge technologies and high-quality delivery.

Alex Turok has over 16 years of consulting experience focused on global and US biopharmaceutical companies. Alex’s expertise is in solving ambiguous, unstructured problems for commercial and medical leadership. For his clients, he seeks to drive lasting organizational change by defining the problem, identifying the strategic options, informing a decision, and outlining the transformation journey. He has worked extensively in portfolio and brand strategy, pipeline and launch strategy, integrated evidence strategy and planning, organizational design, and customer capabilities. Since joining ZS, Alex has worked across marketing, sales, medical, access, and patient services and has touched over twenty therapeutic categories, with depth in oncology, hematology, immunology and specialty therapeutics.

Страшните мигранти или добрата стара дезинформация?

Добре познати теми

Здравеопазването: идея за план

Външна политика

Снимката

Корабът

Последствията

Кога, ако не в осиротяването на света, имаме нужда от общност, от действителния смисъл на „аз сме“?

Solution overview

Cross-Region data sharing with Redshift Serverless

Cross-Region data sharing with a Redshift provisioned cluster

Considerations when using data sharing in Amazon Redshift

Prerequisites

Configure cross-Region data sharing

Cross-Region data sharing with Redshift Serverless

Cross-Region data sharing with Redshift provisioned cluster

Conclusion

About the Authors

Solution overview

Prerequisite

Set up resources with AWS CloudFormation

Enable snapshot retention

Enable orphan file deletion

Validate the solution

Monitor CloudWatch metrics for Amazon S3

Clean up

Conclusion

About the Authors

у нас специфичната ниша на комиксовата документалистика пустее.

Maus, Art Spiegelman

От странното си начало до блестящата си покъртителна последна реплика „Маус“ е безумно рисковано и красиво начинание.

Palestine, Joe Sacco

поразителна амалгама от лични разговори с прегазените от историята.

Kobane Calling, Zerocalcare

Solution overview

Technical architecture

Catalog search use case

Security considerations

Conclusion

About the Authors

About the clinical document search platform

Overview of solution

Document processing solution framework layer

Semantic search platform application layer

Customer impact

Customer success criteria

Challenges faced and learnings

Large data volume

Inference time reduction

Conclusion

About the authors

The collective thoughts of the interwebz