Genomics workflows, Part 7: analyze public RNA sequencing data using AWS HealthOmics

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/genomics-workflows-part-7-analyze-public-rna-sequencing-data-using-aws-healthomics/

Genomics workflows process petabyte-scale datasets on large pools of compute resources. In this blog post, we discuss how life science organizations can use Amazon Web Services (AWS) to run transcriptomic sequencing data analysis using public datasets. This allows users to quickly test research hypotheses against larger datasets in support of clinical diagnostics. We use AWS HealthOmics and AWS Step Functions to orchestrate the entire lifecycle of preparing and analyzing sequence data and remove the associated heavy lifting.

Use case

In genomics, transcription relates to the process of making a ribonucleic acid (RNA) copy from a gene’s deoxyribonucleic acid (DNA). Usually, RNA is single-stranded, although some RNA viruses are double-stranded. With RNA sequencing (RNA-Seq), scientists isolate the RNA, prepare an RNA library, and use next-generation sequencing technology to decode it. Organizations around the world use RNA-Seq to support clinical diagnostics.

In our use case, life science research teams use workflows written in Nextflow to process RNA-Seq datasets in FASTQ file format. Following their initial RNA-Seq studies on internal datasets, scientists can extend their insights by using public datasets. For example, the Gene Expression Omnibus (GEO) functional genomics data repository is hosted by the National Center for Biotechnology Information (NCBI) and offers multiple download options and formats. Scientists can download datasets in FASTQ format from GEO File Transfer Protocol (FTP) and compress them into the .gz format before further analysis.

Scaling and automating the data ingestion can be challenging. For example, scientists might need to do the following:

  • Manually download FASTQ files and invoke their analysis pipelines
  • Monitor the workflow runs, which can span hours, days, or weeks
  • Manage the infrastructure for performance and scale

This blog post presents a solution that removes this undifferentiated heavy lifting.

Prerequisites

To build this solution, you should already analyze transcriptomic sequencing data with the Nextflow workflow system and use GEO FASTQ datasets. In addition, you must do the following:

  1. Create three Amazon Simple Storage Service (Amazon S3) buckets with the following purposes:
    • Uploaded GEO Accession IDs (GEO IDs)
    • Ingested FASTQ datasets
    • RNA-Seq output files
  2. Create one Amazon DynamoDB table to track the status of data ingestion. This helps with checkpointing and avoids repetitive ingestion jobs so that you can keep data ingestion cost to a minimum.
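The checkpointing idea behind the DynamoDB table can be sketched in plain Python. This is a minimal illustration, not the post's implementation: the status value `INGESTED` and the shape of `status_by_srr` (a mapping from SRR ID to the status read from the tracking table) are assumptions made for the example.

```python
from __future__ import annotations

# Hypothetical status value; the post does not name the statuses it stores.
INGESTED = "INGESTED"


def pending_srr_ids(requested: list[str], status_by_srr: dict[str, str]) -> list[str]:
    """Return the SRR IDs that still need ingestion.

    Keeps the request order, drops duplicates, and skips every ID that the
    tracking table already marks as ingested.
    """
    seen: set[str] = set()
    pending: list[str] = []
    for srr_id in requested:
        if srr_id in seen:
            continue
        seen.add(srr_id)
        if status_by_srr.get(srr_id) != INGESTED:
            pending.append(srr_id)
    return pending
```

Filtering before any job is submitted is what keeps repeated uploads of the same GEO ID from triggering repeated downloads.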

Solution overview

Using AWS, you can automate the entire RNA-Seq Nextflow pipeline. Users only need to provide the GEO IDs, then the pipeline ingests the corresponding FASTQ sample files and performs the subsequent data analysis.

Our solution, shown in Figure 1, uses a combination of AWS HealthOmics and AWS Step Functions. HealthOmics manages the compute, scalability, scheduling, and orchestration required for processing large RNA-Seq datasets. This helps scientists focus on writing their pipelines in Nextflow while AWS takes care of the underlying infrastructure. Step Functions adds reliability to the workflow from dataset ingestion to output archival. Automating the entire workflow also helps with tracing specific invocations and troubleshooting errors.

This figure visualizes the AWS services involved in each processing step, starting with users uploading CSV files with GEO metadata to Amazon S3, and concluding with AWS HealthOmics performing the RNA-Seq analysis and putting the output data on Amazon S3.

Figure 1. RNA sequencing using HealthOmics

Our solution includes the following:

  1. The scientist creates and uploads a CSV file to the GEO metadata S3 bucket. The CSV file includes a reference to the specific GEO ID that is ingested. An Amazon S3 Event Notification configured on s3:ObjectCreated events (in this case, the CSV file upload) invokes an AWS Lambda function.
  2. The Lambda function first extracts the corresponding Sequence Read Run (SRR) IDs of the GEO ID. Next, it starts a Step Functions state machine with the following input parameters: the SRR IDs, species of the samples, and GEO ID. The state machine uses an AWS Batch job queue for parallel ingestion.
  3. The Lambda function writes the following metadata to a DynamoDB table for future reference:
    • Ingested GEO ID and corresponding list of SRR IDs
    • Amazon S3 output paths to the ingested FASTQ files
    • Overall workflow status
    • Ingested species
  4. Upon ingestion completion, the state machine puts the RNA-Seq sample sheet into the FASTQ S3 bucket. This invokes a Lambda function, which launches the RNA-Seq analysis workflow with the following input parameters:
    • Sample sheet
    • GEO ID
    • Other relevant metadata
  5. Our RNA-Seq data analysis is run with HealthOmics and the associated sequence store. We use Step Functions to launch this workflow and ingest the relevant files to the sequence store.
  6. Upon workflow completion, HealthOmics writes the output data (BAM files) to the output S3 bucket.
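Steps 1 and 2 above can be sketched as two small helpers that a Lambda function might call. This is a hedged sketch: the CSV column names `geo_id` and `species` and the `srr_lookup` callable (which stands in for however the SRR IDs are resolved from a GEO ID) are assumptions, not details from the post. The event layout follows the standard Amazon S3 event notification structure.

```python
from __future__ import annotations

import csv
import io
import json
from typing import Callable, Iterable
from urllib.parse import unquote_plus


def parse_s3_event(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every ObjectCreated record in the event."""
    pairs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def build_state_machine_input(
    csv_text: str, srr_lookup: Callable[[str], Iterable[str]]
) -> str:
    """Build the JSON input for the state machine from the uploaded CSV.

    Assumes one data row with hypothetical columns `geo_id` and `species`.
    """
    row = next(csv.DictReader(io.StringIO(csv_text)))
    geo_id = row["geo_id"].strip()
    payload = {
        "geo_id": geo_id,
        "species": row["species"].strip(),
        "srr_ids": sorted(srr_lookup(geo_id)),
    }
    return json.dumps(payload)
```

The resulting JSON string is the kind of value that would be passed as the execution input when starting the state machine.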

Implementation considerations

Dataset preparation

The Step Functions state machine orchestrates the ingestion of FASTQ files through the following steps:

  1. The state machine invokes the Map state in Step Functions that uses dynamic parallelism for increased scale, with the SRR IDs array as input. You can now launch multiple AWS Batch jobs in parallel to ingest the FASTQ files that correspond to the SRR ID input.
  2. The state machine checks our ingestion DynamoDB table to see whether the corresponding SRR ID has already been processed and its FASTQ files ingested. If the files for that SRR ID are already ingested, the state machine writes the sample sheet to the FASTQ S3 bucket and terminates successfully.
  3. The state machine uses the NCBI-provided sra-tools Docker container and the fasterq-dump command to ingest the FASTQ files. The state machine generates the set of ingestion commands and starts the AWS Batch job. The ingestion commands are a set of shell commands that interact with NCBI to download the FASTQ files, compress them with pigz, and then upload them to an S3 bucket.
  4. The state machine updates the DynamoDB table with the ingestion status.
    1. If the ingestion is successful, then the state machine continues to step 5.
    2. If the ingestion isn’t successful, the state machine writes a message to Amazon Simple Notification Service (Amazon SNS) to notify scientists of the failure.
  5. A Lambda function generates the RNA-Seq sample sheet with the combined samples to analyze. This sample sheet is a CSV file containing:
    1. The paths to the ingested FASTQ files.
    2. The names of each corresponding SRR ID as input to the RNA-Seq workflow.
  6. The state machine notifies that the ingestion job is complete by publishing a message to an Amazon SNS topic before terminating itself.
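The ingestion commands from step 3 can be sketched as a function that renders the shell commands for one SRR ID. This is a minimal sketch under stated assumptions: the working directory, thread count, and S3 prefix layout are illustrative, and the post does not show its actual command set. The `fasterq-dump`, `pigz`, and `aws s3 cp` options used here are standard options of those tools.

```python
from __future__ import annotations

import re
import shlex


def build_ingestion_commands(
    srr_id: str, s3_prefix: str, threads: int = 4, workdir: str = "/tmp/fastq"
) -> list[str]:
    """Render shell commands that download, compress, and upload one run.

    `s3_prefix`, `threads`, and `workdir` are illustrative parameters.
    """
    # Run accessions look like SRR/ERR/DRR followed by digits.
    if not re.fullmatch(r"[SED]RR\d+", srr_id):
        raise ValueError(f"not a run accession: {srr_id!r}")
    out_dir = f"{workdir}/{srr_id}"
    dest = f"{s3_prefix.rstrip('/')}/{srr_id}/"
    q = shlex.quote
    return [
        f"mkdir -p {q(out_dir)}",
        # Download the run and write one FASTQ file per read.
        f"fasterq-dump {q(srr_id)} --split-files --threads {threads} --outdir {q(out_dir)}",
        # Compress in parallel; pigz replaces each file with a .gz file.
        f"pigz -p {threads} {q(out_dir)}/*.fastq",
        # Upload only the compressed FASTQ files.
        f"aws s3 cp {q(out_dir)} {q(dest)} --recursive --exclude '*' --include '*.fastq.gz'",
    ]
```

Joining the list with `" && "` gives a single command string that stops at the first failing step, which is one way to pass it to a container job.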

Figure 2 provides a detailed overview of the state machine.

This Map state definition in AWS Step Functions visualizes the aforementioned steps for FASTQ file ingestion including orchestration of the associated AWS Batch job.

Figure 2. RNA sequencing data ingestion
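Step 5 of the ingestion flow, generating the sample sheet, can be sketched as follows. The column names `sample`, `fastq_1`, and `fastq_2` are assumptions chosen for illustration; the post only states that the sheet holds the FASTQ paths and the corresponding SRR ID names.

```python
from __future__ import annotations

import csv
import io


def write_sample_sheet(fastq_paths_by_srr: dict[str, list[str]]) -> str:
    """Return a CSV sample sheet with one row per SRR ID.

    Each SRR ID maps to one FASTQ path (single-end) or two (paired-end).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    for srr_id in sorted(fastq_paths_by_srr):
        paths = sorted(fastq_paths_by_srr[srr_id])
        if not 1 <= len(paths) <= 2:
            raise ValueError(f"{srr_id}: expected 1 or 2 FASTQ files, got {len(paths)}")
        second = paths[1] if len(paths) == 2 else ""
        writer.writerow([srr_id, paths[0], second])
    return buf.getvalue()
```

Leaving `fastq_2` empty for single-end runs keeps one fixed column layout for both kinds of sample.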

Dataset analysis

A Lambda function divides the RNA-Seq sample sheet in compliance with the Step Functions service quota. This enables parallel processing using a Map state.
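Dividing the sample sheet can be sketched as a simple chunking helper. The post does not say which Step Functions quota drives the split, so `chunk_size` is left as a caller-supplied assumption rather than a documented limit.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk_rows(rows: Sequence[T], chunk_size: int) -> list[list[T]]:
    """Split sample-sheet rows into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [list(rows[i:i + chunk_size]) for i in range(0, len(rows), chunk_size)]
```

Each chunk can then be handed to one iteration of the Map state.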

Our transcriptomic analysis workflow does the following:

  1. Checks if samples are single-end (one FASTQ file per sample) or paired-end (two sets of FASTQ files per sample).
  2. Ingests the appropriate set of FASTQ files into the HealthOmics sequence store.
  3. Monitors the status until all files are imported.
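The three steps above can be sketched without any AWS calls. In this hedged sketch, `get_status` is a hypothetical callable that returns the status of one import job, the status strings are assumptions, and the `fastq_2` column name is carried over from an assumed sample-sheet layout.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable

# Assumed status names for illustration.
DONE = "COMPLETED"
FAILED = {"FAILED", "CANCELLED", "COMPLETED_WITH_FAILURES"}


def read_layout(row: dict) -> str:
    """Classify a sample-sheet row as single-end or paired-end."""
    return "paired-end" if (row.get("fastq_2") or "").strip() else "single-end"


def wait_for_imports(
    job_ids: Iterable[str],
    get_status: Callable[[str], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until every import job is done; raise on failure or timeout."""
    pending = set(job_ids)
    for _ in range(max_polls):
        for job_id in list(pending):
            status = get_status(job_id)
            if status == DONE:
                pending.discard(job_id)
            elif status in FAILED:
                raise RuntimeError(f"import job {job_id} ended with {status}")
        if not pending:
            return
        sleep(poll_seconds)
    raise TimeoutError(f"imports still pending: {sorted(pending)}")
```

Inside a state machine the same loop is usually expressed with Wait and Choice states instead of a sleeping function, so that no compute is billed while waiting.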

In parallel, a Lambda function initiates the HealthOmics RNA-Seq workflow.

Upon successful completion, HealthOmics stores the output data in Amazon S3. Finally, our state machine imports the output BAM files into the HealthOmics sequence store for future use.

Figure 3 provides a detailed overview of our state machine.

This AWS Step Functions workflow visualizes the aforementioned steps for data analysis including orchestration of the associated AWS HealthOmics workflow and FASTQ file ingestion into the HealthOmics Sequence Store.

Figure 3. RNA sequencing workflow

Cleanup (optional)

Delete all AWS resources that you no longer want to maintain.

Conclusion

HealthOmics removes the heavy lifting associated with gaining insights from genomics, transcriptomics, and other omics data. We used RNA-Seq analysis to showcase an example scientific workflow that can benefit from HealthOmics. When using HealthOmics in combination with Step Functions, scientists can automate the entire workflow from initial dataset preparation to archival. To learn more, we encourage you to explore our HealthOmics tutorials on GitHub.

By the Letters: Auster, Greene, Boykov

Post Syndicated from Zornitsa Hristova original https://www.toest.bg/po-bukvite-auster-greene-boykov/

“Sunset Park” by Paul Auster

translated from English by Iglika Vasileva, Sofia: Kolibri, 2024

Auster’s “Sunset Park” opens with a description of houses abandoned by their owners and slated for demolition. The opening grips at once, and how could it not, when the theme is so clearly important and central for Auster, and autobiographical, as “The Invention of Solitude” makes plain: there the writer finds exactly such an entrance into his father’s life through his suddenly deserted house, which needs to be cleared out. Describe to me what you own, and I will tell you who you are.

The abandoned house is also an image into which meanings move in freely. The first candidate is life itself: everything that piles up and accumulates while we live, biographical facts and social conventions. Exactly what Auster’s characters love to abandon, so as to live out our fantasy of what would happen if we could slip out of our own role and live somehow differently. In fact, the typical Auster hero is exactly that: educated and promising, but having chosen to wander off his own path. Chosen or forced, depending on the book, but almost inevitably living at least two lives.

Miles from “Sunset Park” is no exception. He abruptly leaves his life, his milieu, his family, disappears… and is reborn, after a string of menial jobs, as part of a crew that clears out abandoned houses. Until circumstances force him to move into one himself.

But why? According to the Auster of “Sunset Park”, there are events that make you unfit to be yourself.

For Miles it is the trauma of a death for which he is not sure whether he is to blame; for the characters of “The Best Years of Our Lives” (a film to which there is a more than extensive allusion) it is the war. One of the book’s characters, Alice, is writing a dissertation on the subject and stresses at length how this unfitness is confirmed by the real experience of her own and other families. The fact of war, the experience of wartime, makes these men unable to adapt to real life. Something in their structure has collapsed.

“Twenty-One Stories” by Graham Greene

translated from English by Iglika Vasileva, Sofia: Krag, 2024

An unexpected parallel: the first of Graham Greene’s “Twenty-One Stories” is precisely about how, right after the war, the neighborhood vagabonds destroy a house spared by a miracle and built by Christopher Wren (the creator of St Paul’s Cathedral). They destroy it literally while its owner is locked in the garden lavatory; they destroy it because it is beautiful. And that is somehow unbearable.

A shattering, monstrous story that explains nothing, speculates nowhere, and does not say whether the hooligans are infected with some wartime exaltation of destruction, or are traumatized, or whether, on the contrary, war itself is the logical culmination of the innate human instinct to destroy. Graham Greene simply tells what they do, and you wince, because you know it is exactly so.

The second variant is a subspecies of the first. The house is the body, and the one who has left is a person, the soul; in this sense the rebirth mentioned above should be understood almost literally. The body emptied of life appears quite clearly in “Sunset Park”, most of all in the story of the womanizer who set off on a yacht with three young “assistants” but unexpectedly gave up the ghost, after which they drifted at sea for days on end because they could not steer the yacht. His body drifted along with them. A terrible story, I know, but an occasion for Auster to underline once more what he means when he speaks of corporeality. Along with the enumeration of the dead encountered along the way by one of the characters.

If we accept this line as possible, then the wanderings of Auster’s characters can be described as wanderings of the soul.

Incidentally, another unexpected parallel with Graham Greene: in one of the stories a murder victim walks into a third-rate cinema and remarks to his irritated seat neighbor that the murder on the screen is quite unconvincing. Or the one in which a strangely hoarse lecturer tries to speak before a bored audience about… the relative value of matter and spirit.

“Верно на оригиналот” (True to the Original) by Nikolay Boykov

Plovdiv: Zhanet 45, 2024

Both Auster and Greene are evidently seriously preoccupied with the idea of a second life. In Nikolay Boykov’s book, meanwhile, it is language that lives two lives. The book is written simultaneously in Bulgarian and Macedonian, and in it the narrator, a double or non-identical twin of his author, stays in our neighboring country in order to translate from Macedonian, itself a double or non-identical twin of his native language. The first thing the reader sees is a comedy of errors, for what is the job of twins, if you ask Shakespeare, and of doubles, if we ask Borges, other than to fail to coincide, to differ in key (“Twelfth Night”) or not so key (“The Two Gentlemen of Verona”) points? To the amusement of the audience and the total confusion of everyone who insists that the two are one and the same person.

Here, see what can ensue if someone confuses the two Dromios, that is, Bulgarian and Macedonian: putting in onion instead of garlic, because the Macedonian word for garlic is “лук”, while our “лук” (onion) is “кромид” in Macedonian; putting in celery (“зелер”) instead of cabbage; groping someone instead of simply looking for them (“бара”)…

Do the examples strike you as too everyday? They are. Nikolay Boykov’s narrator-hero is not interested in romantic sublimities; his choice is the ordinary, artless existence of a language that has grown tired of being an ideological weapon, a language that slips out of its calling to be the “sacred language of my forefathers” and takes up life in the markets, among the shoeshiners, in the backyards, language as an Auster hero who sometimes betrays his ambitious past (the enchanting, sweet… singing of the muezzin), a language that simply turns its back on lines spoiling for a duel (“You Bulgarians are Macedonians, but you won’t admit it”) and gets on with its daily tasks. To cook soup. To translate a poem. To find out who it is. While slicing words like onion.

Nikolay Boykov is a wonderful translator from Hungarian, perhaps the European language most distant from Bulgarian. His taking up Macedonian would be surprising if Boykov were not also a poet, one who has sensed precisely the poetic potential in the shifted meanings between the two languages.

Placed next to the Bulgarian ones, the Macedonian words strike two tones at once: that of their dictionary meaning, explained on the spot, and that of their echo in Bulgarian, present-day and bygone. And the reader both hears a polyphony of meaning and listens in on his own former language. The one, of course, from before the trauma, from before the wars, when closeness was more possible.

In Nikolay Boykov’s book this pre-war linguistic wealth comes back, looks around in confusion like a veteran unfit for peacetime life, and wonders whether there is a place for it. It seems significant to me that on the platform it is met by none other than Iglika Vasileva, editor of this book and translator of the other two in today’s review, but also a master of our other possible languages. And this is a more convincing happy ending than both the slapdash resolutions of the female characters’ predicaments in “Sunset Park” and the return of the prodigal daughter in “A Drive in the Country”. The Bulgarian language, this ending tells us, can come to know itself in all its complexity and uncertainty, in its porous and painful borders, and so grow ever more vital and strong.

“Sunset Park” is not among Auster’s canonical novels, and it is plain to see why: its secondary characters are not especially convincing, least of all the women. The narrator supposedly tries to look through their eyes, but sees only his own male gaze and their presumed striving toward it. The denouement, for its part, is at once logical and puzzling: logical on the first, literal plane of squatting in an abandoned house, puzzling with respect to the expectations of a second, symbolic plane.

This, however, does not prevent the reader from squatting in some of the lines of this book, which offer home and shelter for very specific homeless emotions. Graham Greene’s stories are well rounded, finished in their narrative wholeness. Some simply roll away like that; others, however, stay: there is something broken in them, a pain that juts out above the symmetry.

How does Nikolay Boykov close his diary odyssey? With a return to Sofia, of course, with the mirror-image need to keep silent when someone tells you, “Ah, Macedonia, she’s ours.” And with a final passage that swims beneath the surface of the Danube, exhales underwater, sometimes keeps its eyes closed above the surface too, because symbolic meanings are symbolic meanings, and whatever has to flow does so quite literally. And yet the book leaves you both below and above the surface of language:

I will feel like saying “лесна работа” (easy work); I will say “лека работа” (light work).

And I, for my part, would like us not to abandon the house of language just yet, even though inhabiting it is neither easy nor light.


Active donors of Toest receive a standing discount of 20% off the cover price of all titles from the catalogs of Kolibri, Krag, Zhanet 45, and several other Bulgarian publishers as part of the partner program Toest Readers’ Club. For more information, see toest.bg/club.

In his emblematic column, begun back in 2008 in the newspaper Kultura, Marin Bodakov introduced us to new literary titles and asked how exactly these books change us. We believe it is important for this column to continue. From person to person, with a new book in hand.

On the Practices of Giving Up and the Art of Living. A Conversation with Chris Cleave

Post Syndicated from original https://www.toest.bg/za-praktikite-na-otkaza-i-izkustvoto-na-zhivota/

Chris Cleave (b. 1973) is an English writer, the author of short stories and of four novels, all of which have been published in Bulgaria by ICU: “Incendiary”, “The Other Hand”, “Everyone Brave Is Forgiven”, and now “Gold”. He is a graduate of the prestigious Balliol College, Oxford, where he studied psychology, which he also practices. From 2007 to 2010 he wrote a column in The Guardian devoted to children and parenting. He lives in London with his wife and their three children.

In our first interview a few years ago we talked about the then-forthcoming translation of “Gold” (your latest novel, translated by Nevena Dishlieva and Velin Krastev). You said then that it was your most positive book, the one least filled with corpses. Now, in the introduction to it, you add that it is also the only one in which someone truly wins in life. Does that mean you are not a fan of the happy ending? How do you view endings: are they different in life and in literature?

Great question! I think literature begins just as stories begin in real life: two people meet on a train, or someone escapes from prison. These are natural, normal beginnings. Endings, however, are so different! Novels have to end at some point, whereas stories in real life never conclude. They branch and branch, like the limbs of a tree. It is actually more than the absence of an ending. Stories also change their meaning, as when you yourself become someone’s ancestor and your life story turns into the origin story of those who come after you. So yes, beginning a novel is very easy and natural, but finishing one is a real art. I find it very hard to write endings. I don’t like books in which everything is finished, wrapped up, explained. I love books that leave me with an unresolved feeling, something like a pebble in the shoe, something to think about over the next few days. So I try to write that way too: not too cheerful and not too sad. Just to leave something interesting for the reader’s mind to be occupied with for a while.

You wrote “Gold” in 2010, two years before the Summer Olympic Games in London. “Incendiary” was written in 2005, when terrorism was still the leading news story (we recall that the novel’s publication coincided with the London bombings). “Everyone Brave Is Forgiven”, in turn, is about the Second World War. Do you deliberately choose specific but large-scale events for your plots, and is it easier to unfold ordinary human dramas against their backdrop?

Yes, I like to show ordinary lives within large-scale events. I find novels interesting because they allow us to see the imperfections and exceptions in the clean narratives of big history. People are such interesting creatures: they do truly strange things and resolve the dilemma and drama of existence in completely unexpected ways. I believe that the life of an ordinary person is significant and fascinating. As a writer I use the big, powerful narratives we all know (terrorism, forced migration, sport, war) as a frame that holds the smaller, more delicate human stories. In the same way, we might use stakes to grow peas in the garden.

Speaking of London in the introduction to “Gold”, you mention that all your books are in one way or another connected with the city, which you contrast with other, far more user-friendly places. And yet would you agree that such a huge, cosmopolitan space, abundant in diversity, cultures, and destinies, gives you a greater advantage in prose than if, say, you as an author were working with a far more modest, regional, unfamiliar place?

I have to be honest and say that I don’t know. I don’t know what it would be like to write a book set in a place where I do not live. It’s strange, isn’t it? I can imagine writing from the point of view of all sorts of people, but I cannot imagine a story that is not connected to my home city. I don’t know what that says about me as a writer. Perhaps I feel that all stories are really about places and times, and about the way the human heart shows itself in them. As a writer I need to feel the place literally in my bones in order to feel confident that I can know how one human heart or another would feel there.

Coach Tom, one of your characters in “Gold”, says: “At my age, the remarkable event is not the one that frightens you.” What frightens you personally today, at your current age (perhaps compared with, say, the time when you wrote the book)?

Again, a great question! When I wrote the book I was not afraid of much. I was around forty, an age at which everything seemed possible to me. Now I am fifty and I am very well aware that I no longer have that fearlessness. What frightens me now is that we will give up on one another, that we will fall out of love with our own humanity. We live in an age of intense emotional violence and division. I think we are exhausted by one another. I think people find it hard to appreciate and delight in the big, original, inventive hearts we have been given, because those hearts have so often disappointed us. In fact, that is why I trained as a psychotherapist and now practice that profession while continuing to write. I am interested in projects that help us love the human being and the human again, both in ourselves and in others.

The book deals with two kinds of success, rendered through the characters and fates of your two heroines. If everything in this life passes for some form of achievement or triumph (such as recovering from an illness, for example), then what sets sport apart from our other victories in life? Is success in it too literal?

Sport is interesting because it is precisely the absolute demand for absolute success that makes the other successes possible. A good example is last year’s Tour de France; I mean the rivalry between Jonas Vingegaard and Tadej Pogačar. They treated each other with endless courtesy and respect. That was a success of humanity, one I truly enjoyed more than the success in the actual race. But the demand for sporting victory was needed, because that is what made the way the two behaved toward each other remarkable. It is easy to treat someone in the bus queue with love and respect; it is much harder to do the same for your fiercest rival. That is how sport gives us exceptional lessons in what we can be as human beings. Another wonderful example is the Barkley Marathons, a race that is almost impossible to finish. This year the incredible Jasmin Paris became the first woman in history to do so. Again, I stress that I am not so interested in the marathons in themselves, but sporting success has to be absolutely indisputable precisely in order to make human success so meaningful.

Do you think there is a moment when a person should step back and perhaps even give up, bearing in mind that one of today’s mantras is Never Give Up, Never Give In, never to surrender, not even for a moment?

Absolutely! Giving up is what we ought to practice most of all. Most of our ideas, plans, and projects, it turns out, are really not worth much, and there is no point in sacrificing our lives on the altar of stubbornness. The art of living, of being human, lies in discovering the few things we will never give up on.

And how can success drag us down?

It really is important to win victories and score successes from time to time. We need evidence of our own competence and skill. Otherwise even the bravest among us lose courage and heart. But success has two problems: one is that it does not teach us much, and the other is that we begin to fear losing it. Too many successes make us afraid to risk what we already have; too many failures make us lose our nerve. Between these two kinds of fear we have to weigh our intentions and projects. It is good to succeed and win about half of the time, or at least that is what I think.

One of your heroines often comes second because of her empathy and kindness. Is it possible that love makes us unable to compete, that it makes us “losers”?

I don’t think so. Love can also make us fierce and ferocious. It is true that athletes have to take victory away from their rivals. But they also give something. To the sport, to the spectators, to their fellow competitors. Competing with your whole heart is a gift, and that gift can be given with love. It is true, though, that there are many forms of love. I think the most delicate moments in sport are those in which love of competition is balanced against other kinds of love.

In the foreword you tell the somewhat amusing story of a couple who set up folding chairs in the street and sit on them while waiting for unruly and dangerous street protesters to approach. What is the recommended approach when awaiting violence, the uncontrollable, the inevitable?

Love for one another, in every minute that remains to us.

You recall the Second World War slogan Keep Calm and Carry On, which you also mention in “Everyone Brave Is Forgiven”, don’t you? Is it possible that forgetting is part of our instinct to carry on? Or is memory the opposite: a necessary road sign in the process of carrying on?

I think memory is a deep form of love. That is why totalitarian regimes and gaslighters always try to erase or control it. Memory is not a record of some incontrovertible facts, like a video recording. It is a constantly changing story that we tell ourselves about the things that once mattered to us. In order to carry on, we need to be in a loving relationship with precisely those things, the ones that matter to us. I think our memory and our spirituality are extremely closely connected.

You will be meeting readers in several Bulgarian cities. Do you think your audiences around the world differ significantly from one another and read you differently, or… do your novels offer something like Esperanto, a universal language?

I think my stories are fairly universal and broadly human, but they can also divide. I have found that people either love them or hate them, so even two people in the same city can read them very differently! I am extremely grateful for my upcoming visit to Bulgaria and I am really looking forward to meeting readers. Thank you once again for reading my books, and for the incredible questions! It means a great deal to me.


You can see Chris Cleave live in Bulgaria and get an autograph on the following dates and at the following venues:

11 June, 18:30
At the invitation of Knizharnitsa v Kufar (“Bookshop in a Suitcase”)
Stara Zagora, Zahariy Knyazheski Regional Library
Chris Cleave together with Ivo Ivanov and Kamen Alipiev (Kedara)

12 June, 19:30
At the invitation of Plovdiv Chete (“Plovdiv Reads”)
Plovdiv, Stage Park, Mladezhki Halm
Chris Cleave together with Ivo Ivanov and Kamen Alipiev (Kedara)

14 June, 20:00
At the invitation of Literaturni Sreshti (“Literary Meetings”)
Sofia, Borisova Gradina, One More Park Bar
Chris Cleave in conversation with Lora Nenkovska

The state of SourceHut

Post Syndicated from jzb original https://lwn.net/Articles/977174/

Drew DeVault has published an update about the state of the SourceHut
software development platform and its plans for the coming months. This
is the first update since the January post-mortem following a
distributed denial-of-service (DDoS) attack that resulted in a
prolonged outage:

As you can imagine, it has been a stressful time for us. However, I
wish to stress that everything we’ve been dealing with is planned for
in our models, both technical and financial. There is no existential
threat to SourceHut. Nevertheless, we are grateful for your patience
and support.

[…] We have been focusing on two things this year: provisioning
and managing our infrastructure and getting as much rest as
possible. Our situation has calmed down, and while we still have a lot
of loose ends to attend to I’m happy to say that we’re resuming a
sense of normalcy here and preparing to resume our work on the
features you need.

[$] Comparing BPF performance between implementations

Post Syndicated from daroc original https://lwn.net/Articles/976317/

Alan Jowett returned for a second remote presentation at the 2024
Linux Storage, Filesystem, Memory Management, and BPF Summit to compare
the performance of different BPF runtimes. He showed the results of the
MIT-licensed BPF microbenchmark suite he has been working on. The
benchmark suite does not yet provide a good direct comparison between
all platforms, so the results should be taken with a grain of salt. They
do seem to indicate that there is some significant variation between
implementations, especially for different types of BPF maps.

Security updates for Wednesday

Post Syndicated from jzb original https://lwn.net/Articles/977233/

Security updates have been issued by Fedora (deepin-qt5integration, deepin-qt5platform-plugins, dotnet8.0, dwayland, fcitx-qt5, fcitx5-qt, gammaray, kddockwidgets, keepassxc, kf5-akonadi-server, kf5-frameworkintegration, kf5-kwayland, plasma-integration, python-qt5, qadwaitadecorations, qgnomeplatform, qt5, qt5-qt3d, qt5-qtbase, qt5-qtcharts, qt5-qtconnectivity, qt5-qtdatavis3d, qt5-qtdeclarative, qt5-qtdoc, qt5-qtgamepad, qt5-qtgraphicaleffects, qt5-qtimageformats, qt5-qtlocation, qt5-qtmultimedia, qt5-qtnetworkauth, qt5-qtquickcontrols, qt5-qtquickcontrols2, qt5-qtremoteobjects, qt5-qtscript, qt5-qtscxml, qt5-qtsensors, qt5-qtserialbus, qt5-qtserialport, and qt5-qtspeech), Oracle (389-ds-base and ruby:3.1), Red Hat (389-ds-base, glibc, and kernel), SUSE (python-PyMySQL), and Ubuntu (libarchive).

European Union elections 2024: securing democratic processes in light of new threats

Post Syndicated from Petra Arts original https://blog.cloudflare.com/eu-elections-2024


Between June 6-9 2024, hundreds of millions of European Union (EU) citizens will be voting to elect their members of the European Parliament (MEPs). The European elections, held every five years, are one of the biggest democratic exercises in the world. Voters in each of the 27 EU countries will elect a different number of MEPs according to population size and based on a proportional system, and the 720 newly elected MEPs will take their seats in July. All EU member states have different election processes, institutions, and methods, and the security risks are significant, both in terms of cyber attacks but also with regard to influencing voters through disinformation. This makes the task of securing the European elections a particularly complex one, which requires collaboration between many different institutions and stakeholders, including the private sector. Cloudflare is well positioned to support governments and political campaigns in managing large-scale cyber attacks. We have also helped election entities around the world by providing tools and expertise to protect them from attack. Moreover, through the Athenian Project, Cloudflare works with state and local governments in the United States, as well as governments around the world through international nonprofit partners, to provide Cloudflare’s highest level of protection for free to ensure that constituents have access to reliable election information.

Election security in 2024: dealing with new and upcoming threats

Ensuring a free, fair, and open electoral process and securing candidate campaigns is understandably a top priority for the EU institutions, as well as for national governments and cybersecurity agencies across the EU. European authorities have already taken a number of measures to ensure the elections are well-protected. Efforts to coordinate election security measures amongst the EU countries are led by the NIS Cooperation Group, with the support of the EU Agency for Cybersecurity (ENISA), the European Commission, and the European External Action Service (the EU’s foreign service).

The NIS Cooperation Group recently issued an updated Compendium on safeguarding the elections amidst cybersecurity challenges, noting that “since the last EU elections in 2019, the elections threat landscape has evolved significantly”. Governments note in particular the impact of Artificial Intelligence (AI), including deep fakes, but also the increased sophistication of threat actors and the trend of “hacktivists-for-hire” as new risks that need to be taken into account. European institutions also highlight today’s geopolitical context, with conflicts in Ukraine and the Middle East impacting cyber threats and foreign influence campaigns in Europe. The European External Action Service analyzed cases of FIMI (Foreign Information Manipulation and Interference) during recent national elections in Spain and Poland, and put together suggested plans for governments on how to respond to the various stages of those FIMI campaigns originating from foreign (i.e., non-EU) actors. EU High Representative for Foreign Affairs Josep Borrell said in a recent blog post that protecting the election process and more broadly European public debate from malign foreign actors “is a security challenge, which we need to tackle seriously”.

Some national governments have also warned against the risks of so-called hybrid threats, whereby foreign governments deploy various methods to exert influence on other states, including disinformation campaigns, cyberattacks and espionage. Germany’s Federal Ministry of the Interior notes that “elections are often a catalyst for increased levels of illegitimate activity by foreign governments, because stoking fear and spreading hate can contribute to the polarization of society, influencing voting habits. (…) We must make a determined effort to counter these threats.”

EU readiness for election season

As part of national and EU-level coordination amongst governments and agencies to prepare to mitigate threats and risks to the European elections, ENISA supports national governments’ measures to ensure the elections will be secure, including by organizing a cybersecurity exercise to test the various crisis plans and responses to potential attacks by national and EU level agencies and governments. ENISA has also put together a checklist for authorities in order to raise awareness on specific risks and threats to the election process.

The European Union has also prepared for other phenomena endangering the security and integrity of the election process, including the spread of disinformation via online platforms. For example, the European Commission recently issued strict guidelines for “Very Large Online Platforms” (VLOPs) and “Very Large Search Engines” (VLOSEs) under the EU Digital Services Act on measures to mitigate systemic risks online that may impact the integrity of elections. These large companies will be required to have dedicated staff to monitor for disinformation threats in the 23 official EU languages across the 27 member states, collaborating closely with European cybersecurity authorities. In addition, in line with upcoming EU legislation on transparency of political advertising, political ads on large social media platforms should be clearly labeled as such.

In its 11th EU Threat Landscape report, published in 2023, ENISA also warned about the risks associated with the rise of AI-enabled information manipulation, including the disruptive impacts of AI chatbots. The European Commission, in its efforts to fight the proliferation of deep fakes and sophisticated voter manipulation tactics through advanced generative AI systems, recently launched inquiries into major AI developers and promoted industry pledges in the context of the EU AI Pact.

The view from Cloudflare: increases in cyber attacks around elections

It is likely that the EU is going to see a trend similar to many other jurisdictions where there have been increases in cyber threats targeting election entities. In the period between November 2022 and August 2023, Cloudflare mitigated 213.78 million threats to government election websites in the United States. That amounts to 703,223 threats mitigated per day on average. There is indeed already evidence that European institutions are subject to increasing attacks.

In November 2023, the European Parliament website was subject to a large cyber attack. And in March 2024, French government websites faced attacks of “unprecedented intensity,” according to a spokesperson. A few days before the attacks, on February 25, 2024, Cloudflare blocked a significant DDoS attack on a French government website. It reached as much as 420 million requests per hour and lasted for over three hours.

The UK government warned last year that there were “sustained” cyberattacks against civil society organizations, journalists and public sector groups, as well as phishing attempts directed at British politicians. Most recently, the IT infrastructure of German political party CDU was hit by a “serious cyberattack” according to the German Interior Ministry.

We have also seen that the magnitude of cyber attacks overall is growing every year. As outlined in Cloudflare’s latest DDoS threat report, published in Q1 2024, Cloudflare’s defense systems automatically mitigated 4.5 million DDoS attacks during that first quarter, representing a 50% year-over-year (YoY) increase. EU governments noted in their 2024 Compendium on safeguarding the elections that DDoS attacks “can still be very effective in undermining the public’s trust in the electoral process, especially if affecting its most critical and visible phases – that is the transmission, aggregation and display of voting results”.

However, it is not only an increase in the size of attacks on websites that is keeping election officials up at night. There are often multiple attack vectors that need to be taken into account, and ensuring election processes and public institutions remain secure is a very complicated task. For example, in the three months leading up to the 2022 U.S. midterm elections, Cloudflare prevented around 150,000 phishing emails targeting campaign officials. ENISA’s latest EU Threat Landscape report, when discussing phishing campaigns, pointed to the risks of AI applied to social engineering (e.g. used for crafting more convincing phishing messages), which can make phishing less costly, easier to scale-up, and more effective. These developments all show how securing voter registration systems, ensuring the integrity of election-related information, and planning effective incident response are necessary as online threats grow more and more sophisticated.

Securing the democratic process in the digital age requires partnerships between governments, civil society, and the private sector. Cloudflare has helped election entities around the world by providing tools and expertise to protect themselves from cyberattack. For example, in 2020, we partnered with the International Foundation for Electoral Systems to provide Enterprise-level services to six election management bodies, including the Central Election Commission of Kosovo, State Election Commission of North Macedonia, and many local election bodies in Canada.

Impact on Internet traffic

Cloudflare’s global network, which spans more than 120 countries and protects around 20% of all websites, allows us a unique view of the trends and patterns seen in Internet traffic. Some of those trends, including traffic, connection quality, and Internet outages, can be seen in our Internet insights platform, Cloudflare Radar.

Several of these trends are especially important to watch during election season. Upon deeper analysis, we observed spikes in traffic to websites related to elections, and to news websites, during this time. From data obtained in 2023 through an analysis of US state and local government websites protected under the Athenian Project, as well as US nonprofit organizations that work in voting rights and promoting democracy under Project Galileo, and political campaigns and parties under Cloudflare for Campaigns, Cloudflare observed an increase in traffic to US election and non-profit websites during the run-up to elections, and then a significant spike on election day as seen in the graphs below.

Cloudflare observed similar patterns for election information websites and news media during the first day of the 2022 French Presidential elections and during the Presidential elections in Brazil that same year.

DNS traffic to election domains observed through Cloudflare’s 1.1.1.1 resolver in April 2022, during the first round of the French Presidential elections

Coordinated efforts are key

The protection of election entities and related organizations and institutions is a huge and complex task. As noted, this requires partnerships and collaboration between different actors, both public and private, with specific expertise. The work done by EU governments and agencies to prepare, be ready and collaborate on election security precautions as outlined above is both welcome and necessary in order to ensure free, fair and above all secure elections. This can only ever be a coordinated effort, with both governments and industry working together to ensure a robust response to any threats to the democratic process. For its part, Cloudflare is protecting a number of governmental and political campaign websites across the EU.

We want to ensure that all groups working to promote democracy around the world have the tools they need to stay secure online. If you work in the election space and need our help, please get in touch. If you are an organization looking for protection under Project Galileo, please visit our website at cloudflare.com/galileo.

More information about the European Union elections can be found here. And if you are based in the EU, do not forget to vote!

Securing AI Development in the Cloud: Navigating the Risks and Opportunities

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/06/05/securing-ai-development-in-the-cloud-navigating-the-risks-and-opportunities/

AI-TRiSM – Trust, Risk and Security Management in the Age of AI


Co-authored by Lara Sunday and Pojan Shahrivar

As artificial intelligence (AI) and machine learning (ML) technologies continue to advance and proliferate, organizations across industries are investing heavily in these transformative capabilities. According to Gartner, by 2027, spending on AI software will grow to $297.9 billion at a compound annual growth rate of 19.1%. Generative AI (GenAI) software spend will rise from 8% of AI software in 2023 to 35% by 2027.

With the promise of enhanced efficiency, personalization, and innovation, organizations are increasingly turning to cloud environments to develop and deploy these powerful AI and ML technologies. However, this rapid innovation also introduces new security risks and challenges that must be addressed proactively to protect valuable data, intellectual property, and maintain the trust of customers and stakeholders.

Benefits of Cloud Environments for AI Development

Cloud platforms offer unparalleled scalability, allowing organizations to easily scale their computing resources up or down to meet the demanding requirements of training and deploying complex AI models.

“The ability to spin up and down resources on-demand has been a game-changer for our AI development efforts,” says Stuart Millar, Principal AI Engineer at Rapid7. “We can quickly provision the necessary compute power during peak training periods, then scale back down to optimize costs when those resources are no longer needed.”

Cloud environments also provide a cost-effective way to develop AI models, with usage-based pricing models that avoid large upfront investments in hardware and infrastructure. Additionally, major cloud providers offer access to cutting-edge AI hardware and pre-built tools and services, such as Amazon SageMaker, Azure Machine Learning, and Google Cloud AI Platform, which can accelerate development and deployment cycles.

Challenges and Risks of Cloud-Based AI Development

While the cloud offers numerous advantages for AI development, it also introduces unique challenges that organizations must navigate. Limited visibility into complex data flows and model updates can create blind spots for security teams, leaving them unable to effectively monitor for potential threats or anomalies.

In their AI Threat Landscape Report, HiddenLayer highlighted that 98% of all the companies surveyed identified that elements of their AI models were crucial to their business success, and 77% identified breaches to their AI in the past year. Additionally, multi-cloud and hybrid deployments bring monitoring, governance, and reporting challenges, making it difficult to assess AI/ML risk in context across different cloud environments.

New Attack Vectors and Risk Types

Developing AI in the cloud also exposes organizations to new attack vectors and risk types that traditional security tools may not be equipped to detect or mitigate. Some examples include:

Prompt Injection (LLM01): Imagine a large language model used for generating marketing copy. An attacker could craft a special prompt that tricks the model into generating harmful or offensive content, damaging the company’s brand and reputation.

Training Data Poisoning (LLM03, ML02): Adversaries can tamper with training data to compromise the integrity and reliability of cloud-based AI models. In the case of an AI model used for image recognition in a security surveillance system, poisoned training data containing mislabeled images could cause the model to generate incorrect classifications, potentially missing critical threats.

Model Theft (LLM10, ML05): Unauthorized access to proprietary AI models deployed in the cloud poses risks to intellectual property and competitive advantage. If a competitor were to steal a model trained on a company’s sensitive data, they could potentially replicate its functionality and gain valuable insights.

Supply Chain Vulnerabilities (LLM05, ML06): Compromised libraries, datasets, or services used in cloud AI development pipelines can lead to widespread security breaches. A malicious actor might introduce a vulnerability into a widely used open-source library for AI, which could then be exploited to gain access to AI models deployed by multiple organizations.

Developing Best Practices for Securing AI Development

To address these challenges and risks, organizations need to develop and implement best practices and standards tailored to their specific business needs, striking the right balance between enabling innovation and introducing risk.

While guidelines like NCSC Secure AI System Development and The Open Standard for Responsible AI provide a valuable starting point, organizations must also develop their own customized best practices that align with their unique business requirements, risk appetite, and AI/ML use cases. For instance, a financial institution developing AI models for fraud detection might prioritize best practices around data governance and model explainability to ensure compliance with regulations and maintain transparency in decision-making processes.

Key considerations when developing these best practices include:

  • Ensuring secure data handling and governance throughout the AI lifecycle
  • Implementing robust access controls and identity management for AI/ML resources
  • Validating and monitoring AI models for potential biases, vulnerabilities, or anomalies
  • Establishing incident response and remediation processes for AI-specific threats
  • Maintaining transparency and explainability to understand and audit AI model behavior

Rapid7’s Approach to Securing AI Development

“At Rapid7, our InsightCloudSec solution offers real-time visibility into AI/ML resources running across major cloud providers, allowing security teams to continuously monitor for potential risks or misconfigurations,” says Aniket Menon, VP, Product Management. “Visibility is the foundation for effective security in any environment, and that’s especially true in the complex world of AI development. Without a clear view into your AI/ML assets and activities, you’re essentially operating blind, leaving your organization vulnerable to a range of threats.”

Here at Rapid7 our AI TRiSM (Trust, Risk, and Security Management) framework empowers our teams. The framework provides us with confidence not only in our operations but also in driving innovation. In their recent blog outlining the company’s AI principles, Laura Ellis and Sabeen Malik shared how Rapid7 tackles and addresses AI challenges. Centering on transparency, fairness, safety, security, privacy, and accountability, these principles are not just guidelines; they are integral to how Rapid7 builds, deploys, and manages AI systems.

Security and compliance are two key InsightCloudSec capabilities. Compliance Packs are out-of-the-box collections of related Insights focused on industry requirements and standards for all of your resources. Compliance packs may focus on security, costs, governance, or combinations of these across a variety of frameworks, e.g., HIPAA, PCI DSS, GDPR, etc.

Last year, Rapid7 launched the Rapid7 AI/ML Security Best Practices compliance pack. The pack allows for real-time and continuous visibility into AI/ML resources running across your clouds, with support for GenAI services across AWS, Azure and GCP. To empower you to assess this data in the context of your organizational requirements and priorities, you can then automatically prioritize AI/ML-related risk with Layered Context based on exploitability and potential business impact.

You can also leverage Identity Analysis in InsightCloudSec to collect and present the actions executed by a given user or role within a certain time period. These logged actions are collected and analyzed, providing you with a view across your organization of who can access AI/ML resources and automatically rightsize in accordance with the least privilege access (LPA) concept. This enables you to strategically inform your policies moving forward. Native automation allows you to then act on your assessments to alert on compliance drift, remediate AI/ML risk, and enact prevention mechanisms.

Rapid7’s Continued Dedication to AI Innovation

As an inaugural signer of the CISA Secure by Design Pledge, and through our partnership with Queen’s University Belfast Centre for Secure Information Technologies (CSIT), Rapid7 remains dedicated to collaborating with industry leaders and academic institutions to stay ahead of emerging threats and develop cutting-edge solutions for securing AI development.

As the adoption of AI and ML capabilities continues to accelerate, it’s imperative that organizations have the knowledge and tools to make informed decisions and build with confidence. By implementing robust best practices and leveraging advanced security tools like InsightCloudSec, organizations can harness the power of AI while mitigating the associated risks and ensuring their valuable data and intellectual property remain protected.

To learn more about how Rapid7 can help your organization develop and implement best practices for securing AI development, visit our website to request a demo.


Gartner, Forecast Analysis: Artificial Intelligence Software, 2023-2027, Worldwide, Alys Woodward, et al, 07 November 2023

Online Privacy and Overfishing

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/06/online-privacy-and-overfishing.html

Microsoft recently caught state-backed hackers using its generative AI tools to help with their attacks. In the security community, the immediate questions weren’t about how hackers were using the tools (that was utterly predictable), but about how Microsoft figured it out. The natural conclusion was that Microsoft was spying on its AI users, looking for harmful hackers at work.

Some pushed back at characterizing Microsoft’s actions as “spying.” Of course cloud service providers monitor what users are doing. And because we expect Microsoft to be doing something like this, it’s not fair to call it spying.

We see this argument as an example of our shifting collective expectations of privacy. To understand what’s happening, we can learn from an unlikely source: fish.

In the mid-20th century, scientists began noticing that the number of fish in the ocean—so vast as to underlie the phrase “There are plenty of fish in the sea”—had started declining rapidly due to overfishing. They had already seen a similar decline in whale populations, when the post-WWII whaling industry nearly drove many species extinct. In whaling and later in commercial fishing, new technology made it easier to find and catch marine creatures in ever greater numbers. Ecologists, specifically those working in fisheries management, began studying how and when certain fish populations had gone into serious decline.

One scientist, Daniel Pauly, realized that researchers studying fish populations were making a major error when trying to determine acceptable catch size. It wasn’t that scientists didn’t recognize the declining fish populations. It was just that they didn’t realize how significant the decline was. Pauly noted that each generation of scientists had a different baseline to which they compared the current statistics, and that each generation’s baseline was lower than that of the previous one.


Pauly called this “shifting baseline syndrome” in a 1995 paper. The baseline most scientists used was the one that was normal when they began their research careers. By that measure, each subsequent decline wasn’t significant, but the cumulative decline was devastating. Each generation of researchers came of age in a new ecological and technological environment, inadvertently masking an exponential decline.

Pauly’s insights came too late to help those managing some fisheries. The ocean suffered catastrophes such as the complete collapse of the Northwest Atlantic cod population in the 1990s.

Internet surveillance, and the resultant loss of privacy, is following the same trajectory. Just as certain fish populations in the world’s oceans have fallen 80 percent, from previously having fallen 80 percent, from previously having fallen 80 percent (ad infinitum), our expectations of privacy have similarly fallen precipitously. The pervasive nature of modern technology makes surveillance easier than ever before, while each successive generation of the public is accustomed to the privacy status quo of their youth. What seems normal to us in the security community is whatever was commonplace at the beginning of our careers.

Historically, people controlled their computers, and software was standalone. The always-connected cloud-deployment model of software and services flipped the script. Most apps and services are designed to be always-online, feeding usage information back to the company. A consequence of this modern deployment model is that everyone—cynical tech folks and even ordinary users—expects that what you do with modern tech isn’t private. But that’s because the baseline has shifted.

AI chatbots are the latest incarnation of this phenomenon: They produce output in response to your input, but behind the scenes there’s a complex cloud-based system keeping track of that input—both to improve the service and to sell you ads.

Shifting baselines are at the heart of our collective loss of privacy. The U.S. Supreme Court has long held that our right to privacy depends on whether we have a reasonable expectation of privacy. But expectation is a slippery thing: It’s subject to shifting baselines.

The question remains: What now? Fisheries scientists, armed with knowledge of shifting-baseline syndrome, now look at the big picture. They no longer consider relative measures, such as comparing this decade with the last decade. Instead, they take a holistic, ecosystem-wide perspective to see what a healthy marine ecosystem and thus sustainable catch should look like. They then turn these scientifically derived sustainable-catch figures into limits to be codified by regulators.

In privacy and security, we need to do the same. Instead of comparing to a shifting baseline, we need to step back and look at what a healthy technological ecosystem would look like: one that respects people’s privacy rights while also allowing companies to recoup costs for services they provide. Ultimately, as with fisheries, we need to take a big-picture perspective and be aware of shifting baselines. A scientifically informed and democratic regulatory process is required to preserve a heritage—whether it be the ocean or the Internet—for the next generation.

This essay was written with Barath Raghavan, and previously appeared in IEEE Spectrum.

“Alef” awards the winners of the eleventh edition of the International Literary Student Competition “Whoever Saves One Human Life Saves an Entire Universe”

Post Syndicated from Биволъ original https://bivol.bg/%D0%B0%D0%BB%D0%B5%D1%84-%D0%BD%D0%B0%D0%B3%D1%80%D0%B0%D0%B4%D0%B8-%D0%BF%D0%BE%D0%B1%D0%B5%D0%B4%D0%B8%D1%82%D0%B5%D0%BB%D0%B8%D1%82%D0%B5-%D0%B2-%D0%B5%D0%B4%D0%B8%D0%BD%D0%B0.html

Wednesday, 5 June 2024


Young people from eight countries said “NO!” to war and violence. From the stage of the NHK Cultural House in Burgas, where the award ceremony for the winners of the International Literary Student…

Lessons in creativity and memory at the Third International Literary Festival “Friendship – Meaning and Salvation”

Post Syndicated from Екип на Биволъ original https://bivol.bg/%D1%83%D1%80%D0%BE%D1%86%D0%B8-%D0%BF%D0%BE-%D1%82%D0%B2%D0%BE%D1%80%D1%87%D0%B5%D1%81%D1%82%D0%B2%D0%BE-%D0%B8-%D0%BF%D0%B0%D0%BC%D0%B5%D1%82-%D0%B2-%D1%82%D1%80%D0%B5%D1%82%D0%B8%D1%8F-%D0%BC%D0%B5.html

Wednesday, 5 June 2024


For the third year in a row, the international literary youth festival “Friendship – Meaning and Salvation” gave 30 young authors from 8 countries the opportunity to meet remarkable figures and to…

Profile-guided optimisation (PGO) on Grab services

Post Syndicated from Grab Tech original https://engineering.grab.com/profile-guided-optimisation

Profile-guided optimisation (PGO) is a technique where CPU profile data for an application is collected and fed back into the next compiler build of that Go application. The compiler then uses this CPU profile data to optimise the build, currently improving performance by around 2-14% (future releases could likely improve this figure further).

High level view of how PGO works

PGO is a widely used technique that can be implemented with many programming languages. In Go, PGO was introduced as a preview in Go 1.20, which was released in February 2023, and became generally available in Go 1.21.

Enabling PGO on a service

Profile the service to get pprof file

First, make sure that your service is built using Golang version v1.20 or higher, as only these versions support PGO.

Next, enable pprof in your service.

If it’s already enabled, you can use the following command to capture a 6-minute profile and save it to /tmp/pprof.

curl 'http://localhost:6060/debug/pprof/profile?seconds=360' -o /tmp/pprof

Enable PGO on the service

TalariaDB is a distributed, highly available, and low latency time-series database for Presto, open sourced by Grab.

It is a service that runs on an EKS cluster and is entirely managed by our team, so we will use it as an example here.

Since the cluster deployment relies on a Docker image, we only need to update the Docker image’s go build command to include -pgo=./talaria.pgo. The talaria.pgo file is a pprof profile collected from production services over a span of 360 seconds.

If you’re using a Go plugin, as we do in TalariaDB, it’s crucial to ensure that PGO is also applied to the plugin.

Here’s our Dockerfile, with the additions to support PGO.

FROM arm64v8/golang:1.21 AS builder

ARG GO111MODULE="on"
ARG GOOS="linux"
ARG GOARCH="arm64"
ENV GO111MODULE=${GO111MODULE}
ENV GOOS=${GOOS}
ENV GOARCH=${GOARCH}

RUN mkdir -p /go/src/talaria
COPY . src/talaria
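# Previous build without PGO, kept for reference:
#RUN cd src/talaria && go mod download  && go build && test -x talaria
# Build with PGO, using the talaria.pgo profile from the build context.
RUN cd src/talaria && go mod download && go build -pgo=./talaria.pgo && test -x talaria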

RUN mkdir -p /go/src/talaria-plugin
COPY ./talaria-plugin  src/talaria-plugin
RUN cd src/talaria-plugin && make plugin && test -f talaria-plugin.so

FROM arm64v8/debian:latest AS base

RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*

WORKDIR /root/ 
ARG GO_BINARY=talaria
COPY  --from=builder /go/src/talaria/${GO_BINARY} .
COPY  --from=builder /go/src/talaria-plugin/talaria-plugin.so .

ADD entrypoint.sh . 
RUN mkdir /etc/talaria/ && chmod +x /root/${GO_BINARY} /root/entrypoint.sh
ENV TALARIA_RC=/etc/talaria/talaria.rc 
EXPOSE 8027
ENTRYPOINT ["/root/entrypoint.sh"]

Results of enabling PGO on one GrabX service

It’s important to mention that the pprof utilised for PGO was not captured during peak hours and was limited to a duration of 360 seconds.

TalariaDB has three clusters. We enabled PGO on them at the following times:

  • We enabled PGO on cluster 0, and deployed on 4 Sep at 11:16 AM.
  • We enabled PGO on cluster 1, and deployed on 5 Sep at 3:00 PM.
  • We enabled PGO on cluster 2, and deployed on 6 Sep at 4:00 PM.

The size of the instances, their quantity, and all other dependencies remained unchanged.

CPU metrics on cluster

Cluster CPU usage before enabling PGO
Cluster CPU usage after enabling PGO

It’s evident that enabling PGO resulted in at least a 10% reduction in CPU usage.

Memory metrics on cluster

Memory usage of the cluster before enabling PGO
Percentage of free memory after enabling PGO

It’s clear that enabling PGO led to a reduction of at least 10 GB (30%) in memory usage.

Volume metrics on cluster

Persistent volume usage on cluster before enabling PGO
Volume usage after enabling PGO

Enabling PGO resulted in a reduction of at least 7 GB (38%) in volume usage. This volume is utilised for storing events that are queued for ingestion.

Ingested event count/CPU metrics on cluster

To gauge the improvement, we used the metric of ingested event count per CPU unit (event count / CPU). The influx of events varies over time, which makes raw CPU usage hard to compare directly, so normalising by CPU gives a fairer view of the performance gain.

Count of ingested events on cluster after enabling PGO

Upon activating PGO, there was a noticeable increase in the ingested event count per CPU, rising from 1.1 million to 1.7 million, as depicted by the blue line in the cluster screenshot.
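
To make the arithmetic behind this metric explicit, here is a small sketch. The 1.1 million and 1.7 million figures are the ones reported above; the function names and the inputs to eventsPerCPU in main are purely illustrative.

```go
package main

import "fmt"

// eventsPerCPU normalises the ingested event count by the CPU units consumed.
func eventsPerCPU(events, cpuUnits float64) float64 {
	return events / cpuUnits
}

// relativeChange returns the fractional change from before to after.
func relativeChange(before, after float64) float64 {
	return (after - before) / before
}

func main() {
	// Hypothetical inputs, only to show how the metric is formed.
	fmt.Println("events per CPU:", eventsPerCPU(10, 4))

	// Events per CPU unit before and after PGO, as reported above.
	before, after := 1.1e6, 1.7e6
	fmt.Printf("relative gain: %.1f%%\n", relativeChange(before, after)*100)
}
```

With the reported figures, the relative gain works out to roughly 54.5% more events ingested per CPU unit.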

How we enabled PGO on a Catwalk service

We also experimented with enabling PGO on certain orchestrators in a Catwalk service. This section covers our findings.

Enabling PGO on the test-golang-orch-tfs orchestrator

Attempt 1: Take pprof for 59 seconds

  • Just 1 pod running with a constant throughput of 420 QPS.
  • Load test started with a non-PGO image at 5:39 PM SGT.
  • Take pprof for 59 seconds.
  • Image with PGO enabled deployed at 5:49 PM SGT.

Observation: CPU usage increased after enabling PGO with pprof for 59 seconds.

We suspected that taking pprof for just 59 seconds may not be sufficient to collect accurate metrics. Hence, we extended the duration to 6 minutes in our second attempt.

Attempt 2: Take pprof for 6 minutes

  • Just 1 pod running with a constant throughput of 420 QPS.
  • Deployed non-PGO image with custom pprof server at 6:13 PM SGT.
  • pprof taken at 6:19 PM SGT for 6 minutes.
  • Image with PGO enabled deployed at 6:29 PM SGT.

Observation: CPU usage decreased after enabling PGO with pprof for 6 minutes.
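
The custom pprof server used in the second attempt is not described further. One plausible shape, sketched here as an assumption rather than Catwalk's actual code, is a dedicated mux that serves the standard net/http/pprof handlers. The PPROF_ADDR variable and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
	"os"
)

// newPprofMux registers the standard pprof handlers on a dedicated mux,
// keeping them off the service's main router.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile; accepts ?seconds=N
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusFor serves one in-memory GET request and returns the status code.
func statusFor(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newPprofMux()
	fmt.Println("index status:", statusFor(mux, "/debug/pprof/"))

	// PPROF_ADDR is a hypothetical variable, e.g. "localhost:6060".
	if addr := os.Getenv("PPROF_ADDR"); addr != "" {
		log.Fatal(http.ListenAndServe(addr, mux))
	}
}
```

With a server like this running, the two attempts differ only in the sampling window requested from the profile endpoint: seconds=59 in the first and seconds=360 in the second.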

CPU usage after enabling PGO on Catwalk
Container memory utilisation after enabling PGO on Catwalk

Based on this experiment, we found that the impact of PGO is around 5%, but the effort required to enable it outweighs the benefit. To enable PGO on Catwalk, we would need to create Docker images for each application through CI pipelines.

Additionally, the Catwalk team would require a workaround to pass the pprof dump, which is not a straightforward task. Hence, we decided to put off the PGO application for Catwalk services.

Looking into PGO for monorepo services

From the information provided above, enabling PGO for a service requires the following support mechanisms:

  • A pprof service, which is currently facilitated through Jenkins.
  • A build process that supports PGO arguments and can attach or retrieve the pprof file.
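
As a rough illustration of the second requirement, a build step could choose the -pgo flag depending on whether a profile was retrieved. This is a sketch under stated assumptions, not our build tooling: the profile path and function names are hypothetical, and -pgo=off is the standard go build value for disabling PGO.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// pgoFlag returns the -pgo flag for go build: the profile path when one was
// retrieved, or "off" to build without PGO.
func pgoFlag(profilePath string, available bool) string {
	if available {
		return "-pgo=" + profilePath
	}
	return "-pgo=off"
}

// buildArgs assembles the arguments for building the package in the
// current directory.
func buildArgs(profilePath string, available bool) []string {
	return []string{"build", pgoFlag(profilePath, available), "."}
}

func main() {
	profile := "./talaria.pgo" // hypothetical path where CI places the profile
	_, err := os.Stat(profile)
	fmt.Println("go", strings.Join(buildArgs(profile, err == nil), " "))
}
```

Falling back to -pgo=off keeps the build working when no profile is available, at the cost of silently losing the optimisation, so a real pipeline would probably also log which branch was taken.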

For services that are hosted outside the monorepo and are self-managed, the effort required to experiment is minimal. However, for those within the monorepo, we will require support from the build process, which is currently unable to support this.

Conclusion/Learnings

Enabling PGO has proven to be highly beneficial for some of our services, particularly TalariaDB. By using PGO, we’ve observed a clear reduction in both CPU usage and memory usage to the tune of approximately 10% and 30% respectively. Furthermore, the volume used for storing queued ingestion events has been reduced by a significant 38%. These improvements definitely underline the benefits and potential of utilising PGO on services.

Interestingly, applying PGO resulted in an increased rate of ingested event count per CPU unit on TalariaDB, which demonstrates an improvement in the service’s efficiency.

Experiments with the Catwalk service have, however, shown that the effort involved in enabling PGO might not always justify the improvements gained. In our case, a mere 5% improvement did not appear to be worth the work required to generate Docker images for each application via CI pipelines and create a solution to pass the pprof dump.

On the whole, it is evident that the applicability and benefits of enabling PGO can vary across different services. Factors such as application characteristics, current architecture, and available support mechanisms can influence when and where PGO optimisation is feasible and beneficial.

Moving forward, further improvements to go-build and the introduction of PGO support for monorepo services may drive greater adoption of PGO. In turn, this has the potential to deliver powerful system-wide gains that translate to faster response times, lower resource consumption, and improved user experiences. As always, the relevance and impact of adopting new technologies or techniques should be considered on a case-by-case basis against operational realities and strategic objectives.
