A locally exploitable glibc vulnerability

Post Syndicated from corbet original https://lwn.net/Articles/960289/

Qualys has disclosed
a vulnerability in the GNU C Library that can be exploited by a local
attacker for root access. It was introduced in the 2.37 release, and also
backported to 2.36.

For example, we confirmed that Debian 12 and 13, Ubuntu 23.04 and
23.10, and Fedora 37 to 39 are vulnerable to this buffer
overflow. Furthermore, we successfully exploited an up-to-date,
default installation of Fedora 38 (on amd64): a Local Privilege
Escalation, from any unprivileged user to full root. Other
distributions are probably also exploitable.

Vulnerable systems with untrusted users should probably be updated in a
timely manner.

Възходът на крайната десница в Германия – как и защо?

Post Syndicated from Марина Лякова original https://www.toest.bg/vuzhodut-na-kraynata-desnitsa-v-germaniya-kak-i-zashto/

Възходът на крайната десница в Германия – как и защо?

През последната седмица в Германия се проведоха десетки демонстрации, в които се включиха над един милион души от цялата страна. Те бяха подкрепени от политици от целия партиен спектър, неправителствени организации, представители на църквите и на профсъюзите. Причината – на тайна сбирка на дяснорадикални лидери в Потсдам с участието на представители на „Алтернатива за Германия“ (АзГ) и на „Съюза на ценностите“ (фракция на Християндемократическия съюз) е обсъждана идеята за насилствена масова „ремиграция“. Зад тази сложно звучаща дума всъщност се прикрива идеята за повсеместно насилствено прогонване на чужденци и дори на германски граждани с чуждестранен произход от Германия. Срещата е разкрита и документирана от неправителствената организация „Коректив“

Днес в Германия живеят 23 млн. души с миграционен произход – мигранти или техни деца. Преобладаващата част от децата имат германско гражданство и са преминали през германската образователна система. Сред тези 23 милиона са и около 450 000 български граждани, както и българи с германско или с двойно гражданство. Въпреки че заплахата да се реализират такива планове граничи с абсурд, събитието предизвика мощен отпор. Като опасност се отчита дори самото говорене за насилствено екстрадиране. 

Добре известно е, че думите трасират пътя към действията

Точно тези думи, но и споменът за радикалните целенасочени убийства на девет чужденци в периода от 2000 до 2007 г. от организацията „Националсоциалистически ъндърграунд“ повдигна и множество въпроси. Защо в Германия, една държава, в която споменът за националсоциализма е травматичен и жив и която от края на Втората световна война е инвестирала милиони в борбата с различни екстремистки течения, радикалните нагласи продължават да процъфтяват? Те, разбира се, винаги са съществували. Но избуяването им днес e необичайно силно.

Според актуалните социологически данни АзГ би спечелила 22% от подадените гласове, ако изборите за германски парламент се състояха тази неделя. С това тя би станала втора политическа сила в страната, измествайки социалдемократи, либерали и зелени. В Източна Германия се очаква партията да бъде дори първа политическа сила, изпреварвайки Християндемократическия съюз.

Предвид предстоящите избори за Европейски парламент и за регионални парламенти в три източногермански провинции, тези данни нямат единствено прогнозна стойност, а са индикатор за сериозна промяна в обществените настроения, но и във властовия баланс. Това се потвърждава и от публичния годишен доклад на Службата за защита на Конституцията. Става дума за една по-широка тенденция, обхванала много страни от Западна Европа.

Но какви са причините?

Не е тайна, че от няколко години най-мощната европейска икономика – тази на Федералната република – се задъхва. COVID-19, Зелената сделка и войната в Украйна са основните фактори, които оскъпиха производството и съответно намалиха износа на германски стоки в чужбина. Трудностите с глобалните доставки на стоки и суровини – преди три години заради пандемията, а днес поради кризата в Червено море – допълнително усложняват ситуацията. 

И ако по време на COVID-19 германската държава все още можеше да си позволи чрез различни програми да облекчава товара върху производители и потребители, то последвалата руска инвазия в Украйна и самоограниченията, наложени от Партията на зелените (затваряне на атомните и на въглищните централи, скъпоструващите програми за повишаване на енергийната ефективност), както и повишената цена на газа вследствие вноса на LPG сякаш ограничиха възможностите за проактивни компенсиращи мерки.

На този фон много по-ясно изпъква един сериозен проблем, съществуващ от поне три десетилетия – нарастването на социалните неравенства в германското общество и намаляването на покупателната способност на средните и по-ниските социални прослойки. В множество свои изследвания и доклади известният социолог проф. Михаел Хартман показва как през последните 30 години вследствие на данъчни промени, които облагодетелстват предимно заможните и имотни граждани, социалното разслоение в Германия непрекъснато се увеличава. 

За да задържат повече фирми в Германия, правителствата след епохата на Хелмут Кол, независимо от идеологическата си обагреност, намаляват корпоративния данък или го поддържат на ниско ниво. С цел предотвратяване на емиграцията и съответно преместването на мястото на данъчната задълженост на заможните прослойки, се намалява и данък общ доход. С това драстично спадат приходите в хазната и съответно – възможността за инвестиции, но и за поддържане на по-слабо социално разслоение. А инфлацията, особено висока през последните четири години, на практика изяжда увеличенията на заплатите и фактически намалява покупателната способност.

Разбира се, средното ниво на доходите в Германия продължава да бъде едно от най-високите в света. Но също така с просто око се вижда, че инфраструктурата на страната е далеч от блясъка, с който се отличаваше в началото на 90-те. Държавните железници хронично закъсняват – едва 66% от всички влакове през 2023 г. са пристигнали по разписание. Много общини са потънали в дългове и нямат възможност да инвестират в нова инфраструктура. Места в детските градини в големите градове няма, а опашките пред социалните кухни стават по-дълги. 

Социалното разслоение е видимо и сред пенсионерите – средната брутна пенсия е около 1543 евро на месец. В България тези доходи все още изглеждат твърде високи, но съотнесени към разходите, необходими за живот в Германия, те са проблематично ниски. Настаняване в дом за възрастни хора струва между 3500 и 4700 евро на месец, като само част от тях се покриват от застраховката, а двустайно жилище в голям град само с много усилия може да бъде наето на цена под 1300 евро. На практика един пенсионер трудно може да си позволи тези разходи – на помощ идват децата или социалните служби. 

Парадоксът обаче е, че подкрепилите тайната сбирка на АзГ принадлежат към съвършено различна социална прослойка. Сред тях са материално добре обезпечени лица, например предприемачи и политици, които са далеч от социалната нищета. Очевидно не само икономическото развитие е причина за нарастване на социалното напрежение в германското общество. 

Демографията и миграцията 

Поради ниската раждаемост населението на Германия от десетилетия застарява. От години броят на жителите е устойчив – около 84 млн., но тази стабилност се дължи предимно на имиграцията, а не на естествения прираст. Вследствие на демографското развитие редица длъжности, които не са от предпочитаните, изпитват хроничен недостиг на кадри. Въпреки сравнително доброто заплащане в Германските държавни железници има около 20 000 незаети работни места. Отчаяно се търсят водопроводчици, медицински работници, лекари, продавачи в магазини, шофьори на автобуси и разбира се, ИТ специалисти. 

Може ли обаче миграцията да бъде решение? Вероятно само отчасти. Зависи от мигриращите, но и от възможността на едно общество да интегрира бързо и успешно новодошли хора. Германските политици години наред упорито отказваха да признаят, че Германия е имиграционна държава и че процесът, започнал с набирането на гастарбайтери в края на 50-те години, е необратим. Пред желаещите да имигрират и работят през 90-те години източноевропейци се поставяха непреодолими бюрократични и визови бариери. Един неизползван потенциал, намерил нов дом отвъд океана.

Вследствие на войните в Сирия, Афганистан и Ирак и на едно замислено като хуманитарно и временно решение през 2015 г. Германия отвори границите си за милиони, търсещи убежище. По силата на Дъблинското споразумение, но и на националното законодателство, германската държава нямаше законово задължение да приеме тези хора, но го направи в знак на солидарност с претоварените държави на външните граници на ЕС. А вероятно и с убеждението, че това отваряне би могло да бъде част от решението на демографския проблем. 

И макар днес, близо 10 години след събитията от лятото на 2015 г., 56% от имигриралите през 2015–2016 г. да имат постоянно работно място, интегрирането на тези хора на пазара на труда все още е свързано с препятствия. Твърде големи са психическите травми на бягащите от войни, насилие и глад, невинаги достатъчна е квалификацията им. В много случаи отнема години тези хора да бъдат обучени.

Пропагандата и въпросите за идентичността

Дяснорадикалните екстремисти се фокусират именно върху проблемите. Те трудно биха признали случаите на успешна интеграция. Факт е обаче, че днес Германия е дом на хиляди успешно интегрирани лекари, учители, икономисти, юристи, а дори и министри с миграционен произход. Да си припомним, че една от най-популярните ваксини срещу COVID-19 – „Пфайзер/Бионтех“ – беше създадена именно от деца на турски мигранти в германския град Майнц. 

Но АзГ са впили поглед в тъмната страна, а тя съществува – Германия има сериозен проблем с екстрадирането на неполучилите бежански статут. Хиляди, подали молби за убежище, подлежат на екстрадиция след отклоняване на молбите им. Въпреки че нямат право да живеят в страната, те не могат да бъдат изведени извън Германия – документите им за самоличност липсват и родните им държави отказват да ги припознаят като свои граждани. Единствената възможност е те да продължават да получават социални помощи и да чакат деня, в който екстрадицията би могла да се реализира. Левите пледират за легализацията и интеграцията им, а преобладаващата част от десните са за ограничаване на престоя им и настояват, че Германия по принцип има нужда от миграция, но не от такава миграция. 

Сериозно препятствие се оказват и културните различия. Чрез масирана интернет пропаганда, на която и германското общество е изложено, се акцентира върху сексуалните насилия, извършвани от мюсюлмани, и върху пренебрежителното им отношение към жените – сякаш в обществата, в които няма мигранти от ислямски страни, такива проблеми не съществуват. 

АзГ обаче не спира да повтаря, че Германия губи идентичността си с приемането на толкова много мигранти. И още изказвания в този дух, които могат да бъдат перифразирани най-общо така: „Ние трябва да бъдем себе си, да изгоним чуждите. Политиците ви предадоха, изберете нас – ние ще направим всичко по-различно и по-добре. Няма да плащаме за чужди войни и за скъп газ. Ако екстрадираме милиони мигранти, цените на наемите ще спаднат, защото търсенето на жилища ще намалее.“ 

В това популистко говорене се включва и крайнолевият „Съюз на Сара Вагенкнехт“, който се отцепи от левицата именно поради несъгласие с миграционната политика. И зад двете партии се прокрадва сянката на Москва. Въпреки това думите им звучат примамливо, особено на хора, които чувстват, че стандартът им на живот е спаднал, без да могат да си обяснят причините. 

На този фон е лесно за предизвикателствата да бъдат обвинявани традиционните партии и мигрантите. Ако обаче някой някога реализира абсурдната и престъпна идея да прогони милиони хора с миграционен произход от страната, в Германия няма да има кой да ти продаде дори хляб. 

А ние, българите?

Ако се съди по изборните резултати, мнозина гласуващи в Германия българи сякаш са склонни да се подведат по популистките тези на АзГ и на българските ѝ аналози. Те обаче забравят – ако някой ден бъдат въведени ограничения за мигрантите, ако „излезем“ от Европейския съюз, за да бъдем „суверенни“ и „независими“, както пледират АзГ, а и някои български партии, например „Възраждане“, то нашите сънародници, работещи в Германия, ще бъдат най-силно засегнати. На теория може би е примамливо да сочиш чужденците с пръст, докато ти самият не се окажеш част от тях.

Security updates for Wednesday

Post Syndicated from corbet original https://lwn.net/Articles/960248/

Security updates have been issued by Debian (bind9 and glibc), Fedora (ncurses), Gentoo (containerd, libaom, and xorg-server, xwayland), Mageia (python-pillow and zlib), Oracle (grub2 and tomcat), Red Hat (avahi, c-ares, container-tools:3.0, curl, firefox, frr, kernel, kernel-rt, kpatch-patch, libfastjson, libmicrohttpd, linux-firmware, oniguruma, openssh, perl-HTTP-Tiny, python-pip, python-urllib3, python3, rpm, samba, sqlite, tcpdump, thunderbird, tigervnc, and virt:rhel and virt-devel:rhel modules), SUSE (python-Pillow, slurm, slurm_20_02, slurm_20_11, slurm_22_05, slurm_23_02, and xen), and Ubuntu (libde265, linux-nvidia, mysql-8.0, openldap, pillow, postfix, and xorg-server, xwayland).

LangChain Support for Workers AI, Vectorize and D1

Post Syndicated from Ricky Robinett http://blog.cloudflare.com/author/ricky/ original https://blog.cloudflare.com/langchain-support-for-workers-ai-vectorize-and-d1


During Developer Week, we announced LangChain support for Cloudflare Workers. Langchain is an open-source framework that allows developers to create powerful AI workflows by combining different models, providers, and plugins using a declarative API — and it dovetails perfectly with Workers for creating full stack, AI-powered applications.

Since then, we’ve been working with the LangChain team on deeper integration of many tools across Cloudflare’s developer platform and are excited to share what we’ve been up to.

Today, we’re announcing five new key integrations with LangChain:

  1. Workers AI Chat Models: This allows you to use Workers AI text generation to power your chat model within your LangChain.js application.
  2. Workers AI Instruct Models: This allows you to use Workers AI models fine-tuned for instruct use-cases, such as Mistral and CodeLlama, inside your Langchain.js application.
  3. Text Embeddings Models: If you’re working with text embeddings, you can now use Workers AI text embeddings with LangChain.js.
  4. Vectorize Vector Store: When working with a Vector database and LangChain.js, you now have the option of using Vectorize, Cloudflare’s powerful vector database.
  5. Cloudflare D1-Backed Chat Memory: For longer-term persistence across chat sessions, you can swap out LangChain’s default in-memory chatHistory that backs chat memory classes like BufferMemory for a Cloudflare D1 instance.

With the addition of these five Cloudflare AI tools into LangChain, developers have powerful new primitives to integrate into new and existing AI applications. With LangChain’s expressive tooling for mixing and matching AI tools and models, you can use Vectorize, Cloudflare AI’s text embedding and generation models, and Cloudflare D1 to build a fully-featured AI application in just a few lines of code.

This is a full persistent chat app powered by an LLM in 10 lines of code–deployed to @Cloudflare Workers, powered by @LangChainAI and @Cloudflare D1.

You can even pass in a unique sessionId and have completely user/session-specific conversations 🤯 https://t.co/le9vbMZ7Mc pic.twitter.com/jngG3Z7NQ6

— Kristian Freeman (@kristianf_) September 20, 2023

Getting started with a Cloudflare + LangChain + Nuxt Multi-source Chatbot template

You can get started by using LangChain’s Cloudflare Chatbot template: https://github.com/langchain-ai/langchain-cloudflare-nuxt-template

This application shows how various pieces of Cloudflare Workers AI fit together and expands on the concept of retrieval augmented generation (RAG) to build a conversational retrieval system that can route between multiple data sources, choosing the one more relevant based on the incoming question. This method helps cut down on distraction from off-topic documents getting pulled in by a vector store’s similarity search, which could occur if only a single database were used.

The base version runs entirely on the Cloudflare Workers AI stack with the Llama 2-7B model. It uses:

  • A chat variant of Llama 2-7B run on Cloudflare Workers AI
  • A Cloudflare Workers AI embeddings model
  • Two different Cloudflare Vectorize DBs (though you could add more!)
  • Cloudflare Pages for hosting
  • LangChain.js for orchestration
  • Nuxt + Vue for the frontend

The two default data sources are a PDF detailing some of Cloudflare’s features and a blog post by Lilian Weng at OpenAI that talks about autonomous agents.

The bot will classify incoming questions as being about Cloudflare, AI, or neither, and draw on the corresponding data source for more targeted results. Everything is fully customizable – you can change the content of the ingested data, the models used, and all prompts!

And if you have access to the LangSmith beta, the app also has tracing set up so that you can easily see and debug each step in the application.

We can’t wait to see what you build

We can’t wait to see what you all build with LangChain and Cloudflare. Come tell us about it in discord or on our community forums.

CFPB’s Proposed Data Rules

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/01/cfpbs-proposed-data-rules.html

In October, the Consumer Financial Protection Bureau (CFPB) proposed a set of rules that if implemented would transform how financial institutions handle personal data about their customers. The rules put control of that data back in the hands of ordinary Americans, while at the same time undermining the data broker economy and increasing customer choice and competition. Beyond these economic effects, the rules have important data security benefits.

The CFPB’s rules align with a key security idea: the decoupling principle. By separating which companies see what parts of our data, and in what contexts, we can gain control over data about ourselves (improving privacy) and harden cloud infrastructure against hacks (improving security). Officials at the CFPB have described the new rules as an attempt to accelerate a shift toward “open banking,” and after an initial comment period on the new rules closed late last year, Rohit Chopra, the CFPB’s director, has said he would like to see the rule finalized by this fall.

Right now, uncountably many data brokers keep tabs on your buying habits. When you purchase something with a credit card, that transaction is shared with unknown third parties. When you get a car loan or a house mortgage, that information, along with your Social Security number and other sensitive data, is also shared with unknown third parties. You have no choice in the matter. The companies will freely tell you this in their disclaimers about personal information sharing: that you cannot opt-out of data sharing with “affiliate” companies. Since most of us can’t reasonably avoid getting a loan or using a credit card, we’re forced to share our data. Worse still, you don’t have a right to even see your data or vet it for accuracy, let alone limit its spread.

The CFPB’s simple and practical rules would fix this. The rules would ensure people can obtain their own financial data at no cost, control who it’s shared with and choose who they do business with in the financial industry. This would change the economics of consumer finance and the illicit data economy that exists today.

The best way for financial services firms to meet the CFPB’s rules would be to apply the decoupling principle broadly. Data is a toxic asset, and in the long run they’ll find that it’s better to not be sitting on a mountain of poorly secured financial data. Deleting the data is better for their users and reduces the chance they’ll incur expenses from a ransomware attack or breach settlement. As it stands, the collection and sale of consumer data is too lucrative for companies to say no to participating in the data broker economy, and the CFPB’s rules may help eliminate the incentive for companies to buy and sell these toxic assets. Moreover, in a free market for financial services, users will have the option to choose more responsible companies that also may be less expensive, thanks to savings from improved security.

Credit agencies and data brokers currently make money both from lenders requesting reports and from consumers requesting their data and seeking services that protect against data misuse. The CFPB’s new rules—and the technical changes necessary to comply with them—would eliminate many of those income streams. These companies have many roles, some of which we want and some we don’t, but as consumers we don’t have any choice in whether we participate in the buying and selling of our data. Giving people rights to their financial information would reduce the job of credit agencies to their core function: assessing risk of borrowers.

A free and properly regulated market for financial services also means choice and competition, something the industry is sorely in need of. Equifax, Transunion and Experian make up a longstanding oligopoly for credit reporting. Despite being responsible for one of the biggest data breaches of all time in 2017, the credit bureau Equifax is still around—illustrating that the oligopolistic nature of this market means that companies face few consequences for misbehavior.

On the banking side, the steady consolidation of the banking sector has resulted in a small number of very large banks holding most deposits and thus most financial data. Behind the scenes, a variety of financial data clearinghouses—companies most of us have never heard of—get breached all the time, losing our personal data to scammers, identity thieves and foreign governments.

The CFPB’s new rules would require institutions that deal with financial data to provide simple but essential functions to consumers that stand to deliver security benefits. This would include the use of application programming interfaces (APIs) for software, eliminating the barrier to interoperability presented by today’s baroque, non-standard and non-programmatic interfaces to access data. Each such interface would allow for interoperability and potential competition. The CFPB notes that some companies have tried to claim that their current systems provide security by being difficult to use. As security experts, we disagree: Such aging financial systems are notoriously insecure and simply rely upon security through obscurity.

Furthermore, greater standardization and openness in financial data with mechanisms for consumer privacy and control means fewer gatekeepers. The CFPB notes that a small number of data aggregators have emerged by virtue of the complexity and opaqueness of today’s systems. These aggregators provide little economic value to the country as a whole; they extract value from us all while hindering competition and dynamism. The few new entrants in this space have realized how valuable it is for them to present standard APIs for these systems while managing the ugly plumbing behind the scenes.

In addition, by eliminating the opacity of the current financial data ecosystem, the CFPB is able to add a new requirement of data traceability and certification: Companies can only use consumers’ data when absolutely necessary for providing a service the consumer wants. This would be another big win for consumer financial data privacy.

It might seem surprising that a set of rules designed to improve competition also improves security and privacy, but it shouldn’t. When companies can make business decisions without worrying about losing customers, security and privacy always suffer. Centralization of data also means centralization of control and economic power and a decline of competition.

If this rule is implemented it will represent an important, overdue step to improve competition, privacy and security. But there’s more that can and needs to be done. In time, we hope to see more regulatory frameworks that give consumers greater control of their data and increased adoption of the technology and architecture of decoupling to secure all of our personal data, wherever it may be.

This essay was written with Barath Raghavan, and was originally published in Cyberscoop.

ASRock Rack ALTRAD8UD-1L2T Review This is the Ampere Arm Motherboard You Want

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/asrock-rack-altrad8ud-1l2t-review-this-is-the-ampere-arm-motherboard-you-want/

In our ASRock Rack ALTRAD8UD-1L2T review, we see why this is the Ampere Arm motherboard that so many have been waiting for

The post ASRock Rack ALTRAD8UD-1L2T Review This is the Ampere Arm Motherboard You Want appeared first on ServeTheHome.

Rethinking Stream Processing: Data Exploration

Post Syndicated from Grab Tech original https://engineering.grab.com/rethinking-streaming-processing-data-exploration

Introduction

In this digital age, companies collect multitudes of data that enable the tracking of business metrics and performance. Over the years, data analytics tools for data storage and processing have evolved from the days of Excel sheets and macros to more advanced Map Reduce model tools like Spark, Hadoop, and Hive. This evolution has allowed companies, including Grab, to perform modern analytics on the data ingested into the Data Lake, empowering them to make better data-driven business decisions. This form of data will be referenced within this document as “Offline Data”.

With innovations in stream processing technology like Spark and Flink, there is now more interest in unlocking value from streaming data. This form of continuously-generated data in high volume will be referenced within this document as “Online Data”. In the context of Grab, the streaming data is usually materialised as Kafka topics (“Kafka Stream”) as the result of stream processing in its framework. This data is largely unexplored until they are eventually sunk into the Data Lake as Offline Data, part of the data journey (see Figure 1 below). This induces some data latency before the data can be used by data analysts to inform decisions.

Figure 1. Simplified data journey for Offline Data vs. Online Data, from data generation to data analysis.

As seen in Figure 1 above, the Time to Value (“TTV”) of Online Data is shorter as compared to that of Offline Data in a simplified data journey from data generation to data analysis where complexities of data cleaning and transformation have been removed. This is because the role of the data analyst or data scientist (“Data End User”) has been enabled forward to the Kafka stage for Online Data instead of the Data Lake stage for Offline Data. We recognise that allowing earlier data exploration on Online Data allows Data End Users to build context around the data inputs they are using in an earlier stage. This can help them process Offline Data more meaningfully in subsequent stages. We are interested in opening up the possibility for Data End Users to at least explore the Online Data before they architect a full solution to clean and/or process the data directly or more efficiently post-ingestion into the Data Lake. After their data exploration, the users would have more information to decide whether to spin up a stream processing pipeline for Online Data, or to continue processing Offline Data with their current solution, but with a more refined understanding and logic strategy against their source data inputs. However, of course, in this blog, we acknowledge that not all analysis on Online Data could be done in this manner.

Problem statement

Online Data is underutilised within Grab mainly because of, among other reasons, difficulty in performing data exploration on data that is not yet properly stored in the Data Lake.

For the purpose of this blog post, we will focus only on the problem of exploration of Online Data because this problem is the precursor to allowing us to fully democratise such data.

The problem of data exploration manifests itself when Data End Users need to find the proper data inputs to base and develop their data models. These users would then often need to parse through a multitude of documentation and connect with multiple upstream data producers, to know the range of data signals that are currently available and understand what each data signal is trying to measure.

Given the ephemeral nature of Online Data, this implies that the lack of correct tool adoption to seamlessly perform quick tests with application logic on Online Data disincentivises the Data End Users to work on these Online Data. Testing such logic on Offline Data is generally much easier since iteration testing on the exact same dataset is possible.

This difficulty in performing data exploration including ad hoc queries on Online Data has therefore made development of stream processing applications hard for Data End Users, creating headwinds in Grab’s aim to evolve from making data-driven business decisions to also making data-driven operation decisions. Doing both would allow Grab to react much quicker to abrupt changes in its business landscape.

Adoption of Zeppelin notebook environment

To address the difficulty in performing data exploration on Online Data, we have adopted Apache Zeppelin, a web-based notebook that enables data-drive, interactive data analytics with the support of multiple interpreters to work with various data processing backends e.g. Spark, Flink. The full solution of the adopted Zeppelin notebook environment is enabled seamlessly within our internal data-streaming platform, through its control plane. If you are interested, you may check out our previous blog post titled An elegant platform for more details on the abovementioned streaming platform and its control plane.

Figure 2. Zeppelin login page via web-based notebook environment.

As seen from Figure 2 above, after successful creation of the Zeppelin cluster, users can log in with their generated credentials delivered to them via the integrated instant messenger, and start using the notebook environment.

Figure 3. Zeppelin programme flow in the notebook environment.

Figure 3 above explains the Zeppelin notebook programme flow as follows:

  • The users enter their queries into the notebook session and run querying statements interactively with the established web-based notebook session.
  • The queries are passed to the Flink interpreter within the cluster to generate the Flink job as a Jar file, to be then submitted to a Flink session cluster.
  • When the Flink session cluster job manager receives the job, it would spin up the corresponding Flink task managers (workers) to run the application and retrieve the results.
  • The query results would then be piped back to the notebook session, to be displayed back to the user on the notebook session.

Data query and visualisation

Figure 4. Example of simple select query of data on Kafka.
Note: All variable names, schema, values, and other details used in this article are only created for use as examples.

Flink has a planned roadmap to create a unified streaming language for both stream processing and data analytics. In line with the roadmap, we have based our Zeppelin solution on supporting Structured Query Language (“SQL”) as the query language of choice as seen in Figure 4 above. Data End Users can now write queries in SQL, which is a language that they are comfortable with, and perform adequate data exploration.

As discussed in this section, data exploration on streaming data at the Kafka stage by adopting the right tool enables Data End Users to seamlessly have visibility to quickly understand the current schema of a Kafka topic (explained more in the next section. This kind of data exploration also enables Data End Users to understand the type of data the Kafka topic represents, such as the ability to determine if a country code data field is in alpha-2 or alpha-3 format while the data is still part of streaming data. This might seem inconsequential and immediately identifiable even in Offline Data, but by enabling data exploration at an earlier stage in the data journey for Online Data, Data End Users have the opportunity to react much more quickly. For example, a change of expected country code format from the data producer would usually lead to errors in the downstream joins or other stream processing pipelines due to incompatible parsing or filtering of the modified country codes. Instead of waiting for the data to be ingested to Offline Data, users can investigate the issue with Online Data retrieved from Kafka.

Figure 5. Simple visualisation of queried data on Zeppelin’s notebook environment.
Note: All variable names, schema, values, and other details used in this article are only created for use as examples.

Besides query features, Zeppelin notebook provides simple visualisation and analytics of the off-the-shelf data as presented above in Figure 5. Furthermore, users are now able to perform interactive ad hoc queries on Online Data. These queries will eventually become much more advanced and/or effective SQL queries to be deployed as a streaming pipeline later on in the data journey. This reduces the inertia in setting up a separate development environment or learning other programming languages like Java or Scala during the development of streaming pipelines. With Zeppelin’s notebook environment, our Data End Users are more empowered to quickly derive value from Online Data.

Need for a more dynamic table schema derivation process

For the Data End Users performing data exploration on Online Data, we see a need for these users to derive the Data Definition Language (“DDL”) associated with a Kafka stream at an earlier stage of the data journey. Within Grab, even though Kafka streams are transmitted in Protobuf format and are thus structured, both the schema and the corresponding DDL changes are added over time as new fields. Typically, the data producer (service owners) and the data engineers responsible for the data ingestion pipeline coordinate to perform such updates. Since the Data End Users are not involved in such schema update processes nor do they directly interact with the data producers, many of them find the discovery of changes in the current Kafka stream schema an issue. Granted that this is an issue our metadata platform is actively solving using Datahub, we hope to also solve the challenge by being able to derive the DDL more dynamically within the tooling, for data exploration on Online Data to reduce friction.

Figure 6. Common functions to derive DDL of a Kafka Stream in SQL.
Note: All variable names, schema, values, and other details used in this article are only created for use as examples.

As seen from Figure 6 above, we have an integrated tooling for Data End Users to derive the DDL associated with a Kafka stream using SQL language. A Kafka stream in Grab’s context is a logical concept describing a Kafka topic, associating it with its metadata like Kafka bootstrap servers and associated Java class created by Protoc. This tool maps the Protobuf schema definition of a Kafka stream to a DDL, allowing it to be expressed and used in SQL language. This reduces the manual effort involved in creating these table definitions from scratch based on the associated Protobuf schema. Users can now derive the DDL associated with a Kafka stream more easily.

Mitigating risks arising from data exploration on Online Data – data access authorisation/audit

While we rethink stream processing and are open to options that enable data exploration on Online Data as mentioned above, we realised that new security requirements related to data access authorisation and maintaining proper audit trail have emerged. Even with Personally Identifiable Information (PII) obfuscation enforcement by our streaming pipeline, it means we need to implement stricter guardrails in place along with audit trails to ensure users only have access to what they are allowed to, and this access can be removed in a break-glass scenario. If you are interested, you may check out our previous blog post titled PII masking for privacy-grade machine learning for more details about how we enforce PII masking on machine learning data streaming pipelines.

To enable data access authorisation, we utilised Strimzi, the operator of running Kafka on Kubernetes. We integrated Strimzi’s Open Policy Agent (OPA) with Kafka to define policies that authorise specific read-only user access to specific Kafka Topics. The identification of users is done via mutualTLS (mTLS) connection with our Kafka clusters, where their user details are part of the SSL certificate details used for authentication.

With these tools in place, each user’s request to explore Online Data would be properly logged, and each data access can be controlled by an OPA policy managed by a central team.

If you are interested, you may check out our previous post Zero trust with Kafka where we discussed our efforts to continue strengthening the security of our data-streaming platform.

Impact

With the proliferation of our data-streaming platform, we expect to see improvements in the way our data becomes gradually democratised. We have already been receiving use cases from the Data End Users who are interested in validating a chain of events on Online Data, i.e. retrieving information of all events associated with a particular booking, which is not currently something that can be done easily.

More importantly, the tools in place for data exploration on Online Data form the foundation required for us to embark on our next step of the stream processing journey. This foundation makes the development and validation of the stream processing logic much quicker. This occurs when ad hoc queries in a notebook environment are possible, removing the need for local developer environment setups and the need to go through the whole pipeline deployment process for eventual validation of the developed logic. We believe that this would prove to reduce our lead time in creating stream processing pipelines significantly.

What’s next?

Our next step is to rethink further how our stream processing pipelines are defined and start to provision SQL as the unified streaming language of our pipelines. This helps facilitate better discussion between upstream data producers, data engineers, and Data End Users, since SQL is the common language among these stakeholders.

We will also explore handling schema discovery in a more controlled manner by utilising a Hive catalogue to store our Kafka table definitions. This removes the need for users to retrieve and run the table DDL statement for every session, making the data exploration experience even more seamless.

References

[1] Apache Zeppelin | Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.

[2] An elegant platform | Grab engineering blog.

[3] Apache Flink | Roadmap on Unified SQL Platform.

[4] ISO | ISO 3166 Country Codes.

[5] Protobuf (Protocol Buffers)| Language-neutral, platform-neutral extensible mechanisms for serializing structured data.

[6] Datahub | Extensible metadata platform that enables data discovery, data observability and federated governance to help tame the complexity of your data ecosystem.

[7] Protoc | Protocol buffer compiler installation.

[8] PII masking for privacy-grade machine learning | Grab engineering blog.

[9] Zero trust with Kafka | Grab engineering blog.

[10] Open Policy Agent (OPA) | Policy-based control for cloud native environments.

[11] Strimzi | Using Open Policy Agent with Strimzi and Apache Kafka.

[12] Confluent Documentation | Configure mTLS authentication and RBAC for kafka brokers.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

[$] Looking ahead to Emacs 30

Post Syndicated from jake original https://lwn.net/Articles/959931/

EmacsConf 2023 was, like its
recent predecessors, an online conference with lots of talks about various
aspects of the Emacs
editor
—though, of course, it is way more than just an editor. Last year’s
edition was held in early December. One of the
talks that looked interesting was on Emacs
development
, which was given live by John Wiegley. In it, he briefly
described some
of the biggest features coming in Emacs 30, which is the next major version
coming for the tool.

New chat experience for AWS Glue using natural language – Amazon Q data integration in AWS Glue (Preview)

Post Syndicated from Irshad Buchh original https://aws.amazon.com/blogs/aws/new-chat-experience-for-aws-glue-using-natural-language-amazon-q-data-integration-in-aws-glue-preview/

Today we’re previewing a new chat experience for AWS Glue that will let you use natural language to author and troubleshoot data integration jobs.

Amazon Q data integration in AWS Glue will reduce the time and effort you need to learn, build, and run data integration jobs using AWS Glue data integration engines. You can author jobs, troubleshoot issues, and get instant answers to questions about AWS Glue and anything related to data integration. The chat experience is powered by Amazon Bedrock.

You can describe your data integration workload and Amazon Q will generate a complete ETL script. You can troubleshoot your jobs by asking Amazon Q to explain errors and propose solutions. Amazon Q provides detailed guidance throughout the entire data integration workflow. Amazon Q helps you learn and build data integration jobs using AWS Glue. Amazon Q can help you connect to common AWS sources such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and Amazon DynamoDB.

Let me show you some capabilities of Amazon Q data integration in AWS Glue.

1. Conversational Q&A capability

To start using this feature, I can select the Amazon Q icon on the right-hand side of the AWS Management Console.

AmazonQ-1

For example, I can ask, “What is AWS Glue,” and Amazon Q provides concise explanations along with references I can use to follow up on my questions and validate the guidance.

AmazonQ-2

With Amazon Q, I can elaborate on my use cases in more detail to provide context. For example, I can ask Amazon Q, “How do I create an AWS Glue job?”

AmazonQ-2

Next let me ask Amazon Q, “How do I optimize memory management in my AWS Glue job?”

AmazonQ-41

2. AWS Glue job creation

To use this feature, I can tell Amazon Q, “Write a Glue ETL job that reads from Redshift, drops null fields, and writes to S3 as parquet files.”

AmazonQ-Parq

I can copy code into the script editor or notebook with a simple click on the Copy button. I can also tell Amazon Q, “Help me with a Glue job that reads my DynamoDB table, maps the fields, and writes the results to Amazon S3 in Parquet format”.

AmazonQ-DynamoDB

Get started with Amazon Q today
With Amazon Q, you have an artificial intelligence (AI) expert by your side to answer questions, write code faster, troubleshoot issues, optimize workloads, and even help you code new features. These capabilities simplify every phase of building applications on AWS. Amazon Q data integration in AWS Glue is available in every region where Amazon Q is supported. To learn more, see the Amazon Q pricing page.

Learn more

— Irshad

Build real-time applications with Amazon EventBridge and AWS AppSync

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/build-real-time-applications-with-amazon-eventbridge-and-aws-appsync/

This post is written by Josh Kahn, Tech Leader, Serverless.

Amazon EventBridge now supports publishing events to AWS AppSync GraphQL APIs as native targets. The new integration enables builders to publish events easily to a wider variety of consumers and simplifies updating clients with near real-time data. You can use EventBridge and AWS AppSync to build resilient, subscription-based event-driven architectures across consumers.

To illustrate using EventBridge with AWS AppSync, consider a simplified airport operations scenario. In this example, airlines publish flight events (for example, boarding, push back, gate changes, and delays) to a service that maintains flight status on in-airport displays. Airlines also publish events that are useful for other entities at the airport, such as baggage handlers and maintenance, but not to passengers. This depicts a conceptual view of the system:

Conceptual view of the system

Passengers want the in-airport displays to be up-to-date and accurate. There are a number of ways to design the display application so that data remains up-to-date. Broadly, these include the application polling some API or the application subscribing to data changes.

Subscriptions for this scenario are better as the data changes are small and incremental relative to the large amount of information displayed. In a delay, for example, the display updates the status and departure time but no other details of a single flight among a larger list of flight information.

Flight board

AWS AppSync can enable clients to listen for real-time data changes through the use of GraphQL subscriptions. These are implemented using a WebSocket connection between the client and the AWS AppSync service. The display application client invokes the GraphQL subscription operation to establish a secure connection. AWS AppSync will automatically push data changes (or mutations) via the GraphQL API to subscribers using that connection.

Previously, builders could use EventBridge API Destinations to wire events published and routed through EventBridge to AWS AppSync, as described in an earlier blog post, and available in Serverless Land patterns (API Key, OAuth). The approach is useful for dealing with “out-of-band” updates in which data changes outside of an AWS AppSync mutation. Out-of-band updates generally require a NONE data source in AWS AppSync to notify subscribers of changes, as described in the AWS re:Post Knowledge Center. The addition of AWS AppSync as a target for EventBridge simplifies these use cases as you can now trigger a mutation in response to an event without additional code.

Airport Operations Events

Expanding the scenario, airport operations events look like this:

{
  "flightNum": 123,
  "carrierCode": "JK",
  "date": "2024-01-25",
  "event": "FlightDelayed",
  "message": "Delayed 15 minutes, late aircraft",
  "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
}

The event field identifies the type of event and if it is relevant to passengers. The event details provide further information about the event, which varies based on the type of event. The airport publishes a variety of events but the airport displays only need a subset of those changes.

AWS AppSync GraphQL APIs start with a GraphQL schema that defines the types, fields, and operations available in that API. AWS AppSync documentation provides an overview of schema and other GraphQL essentials. The partial GraphQL schema for the airport scenario is as follows:


type DelayEventInfo implements EventInfo {
	message: String
	delayMinutes: Int
	newDepTime: AWSDateTime
}

interface EventInfo {
	message: String
}

enum StatusEvent {
	FlightArrived
	FlightBoarding
	FlightCancelled
	FlightDelayed
	FlightGateChanged
	FlightLanded
	FlightPushBack
	FlightTookOff
}

type StatusUpdate {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	info: EventInfo
}

input StatusUpdateInput {
	num: Int!
	carrier: String!
	date: AWSDate!
	event: StatusEvent!
	message: String
	extra: AWSJSON
}

type Mutation {
	updateFlightStatus(input: StatusUpdateInput!): StatusUpdate!
}

type Query {
	listStatusUpdates(by: String): [StatusUpdate]
}

type Subscription {
	onFlightStatusUpdate(date: AWSDate, carrier: String): StatusUpdate
		@aws_subscribe(mutations: ["updateFlightStatus"])
}

schema {
	query: Query
	mutation: Mutation
	subscription: Subscription
}

Connect EventBridge to AWS AppSync

EventBridge allows you to filter, transform, and route events to a number of targets. The airport display service only needs events that directly impact passengers. You can define a rule in EventBridge that routes only those events (included in the preceding GraphQL schema) to the AWS AppSync target. Other events are routed elsewhere, as defined by other rules, or dropped. Details on creating EventBridge rules and the event matching pattern format can be found in EventBridge documentation.

The previous flight delayed event would be delivered using EventBridge as follows:

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}

In this scenario, there is a specific list of events of interest, but EventBridge provides a flexible set of operations to match patterns, inspect arrays, and filter by content using prefix, numerical, or other matching. Some organizations will also allow subscribers to define their own rules on an EventBridge event bus, allowing targets to subscribe to events via self-service.

The following event pattern matches on the events needed for the airport display service:

{
  "source": [ "airport-operations" ],
  "detail": {
    "event": [ "FlightArrived", "FlightBoarding", "FlightCancelled", ... ]
  }
}

To create a new EventBridge rule, you can use the AWS Management Console or infrastructure as code. You can find the CloudFormation definition for the completed rule, with the AWS AppSync target, later in this post.

Console view

Create the AWS AppSync target

Now that EventBridge is configured to route selected events, define AWS AppSync as the target for the rule. The AWS AppSync API must support IAM authorization to be used as an EventBridge target. AWS AppSync supports multiple authorization types on a single GraphQL type, so you can also use OpenID Connect, Amazon Cognito User Pools, or other authorization methods as needed.

To configure AWS AppSync as an EventBridge target, define the target using the AWS Management Console or infrastructure as code. In the console, select the Target Type as “AWS Service” and Target as “AppSync.” Select your API. EventBridge parses the GraphQL schema and allows you to select the mutation to invoke when the rule is triggered.

When using the AWS Management Console, EventBridge will also configure the necessary AWS IAM role to invoke the selected mutation. Remember to create and associate a role with an appropriate trust policy when configuring with IaC.

EventBridge target types

EventBridge supports input transformation to customize the contents of an event before passing the information as input to the target. Configure the input transformer to extract needed values from the event using JSON path and a template in the input format expected by the AWS AppSync API. EventBridge provides a handy utility in the Console to pass and test the output of a sample event.

Target input transformer

Finally, configure the selection set to include the response from the AWS AppSync API. These are the fields that will be returned to EventBridge when the mutation is invoked. While the result returned to EventBridge is not overly useful (aside from troubleshooting), the mutation selection set will also determine the fields available to subscribers to the onFlightStatusUpdate subscription.

Configuring the selection set

Define the EventBridge to AWS AppSync rule in CloudFormation

Infrastructure as code templates, including AWS CloudFormation and AWS CDK, are useful for codifying infrastructure definitions to deploy across Regions and accounts. While you can write CloudFormation by hand, EventBridge provides a useful CloudFormation export in the AWS Management Console. You can use this feature to export the definition for a defined rule.

Export definition

This is the CloudFormation for the previous configured rule and AWS AppSync target. This snippet includes both the rule definition and the target configuration.

PassengerEventsToDisplayServiceRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Route passenger related events to the display service endpoint
      EventBusName: eb-to-appsync
      EventPattern:
        source:
          - airport-operations
        detail:
          event:
            - FlightArrived
            - FlightBoarding
            - FlightCancelled
            - FlightDelayed
            - FlightGateChanged
            - FlightLanded
            - FlightPushBack
            - FlightTookOff
      Name: passenger-events-to-display-service
      State: ENABLED
      Targets:
        - Id: 12344535353263463
          Arn: <AppSync API GraphQL API ARN>
          RoleArn: <EventBridge Role ARN (defined elsewhere)>
          InputTransformer:
            InputPathsMap:
              carrier: $.detail.carrierCode
              date: $.detail.date
              event: $.detail.event
              extra: $.detail.info
              message: $.detail.message
              num: $.detail.flightNum
            InputTemplate: |-
              {
                "input": {
                  "num": <num>,
                  "carrier": <carrier>,
                  "date": <date>,
                  "event": <event>,
                  "message": "<message>",
                  "extra": <extra>
                }
              }
          AppSyncParameters:
            GraphQLOperation: >-
              mutation
              UpdateFlightStatus($input:StatusUpdateInput!){updateFlightStatus(input:$input){
                event
                date
                carrier
                num
                info {
                  __typename
                  ... on DelayEventInfo {
                    message
                    delayMinutes
                    newDepTime
                  }
                }
              }}

The ARN of the AWS AppSync API follows the form arn:aws:appsync:<AWS_REGION>:<ACCOUNT_ID>:endpoints/graphql-api/<GRAPHQL_ENDPOINT_ID>. The ARN is available in CloudFormation (see GraphQLEndpointArn return value) or can be created using the identifier found in the AWS AppSync GraphQL endpoint. The ARN included in the EventBridge execution role policy is the AWS AppSync API ARN (a different ARN).

The AppSyncParameters field includes the GraphQL operation for EventBridge to invoke on the AWS AppSync API. This must be well formatted and match the GraphQL schema. Include any fields that must be available to subscribers in the selection set.

Testing subscriptions

AWS AppSync is now configured as a target for the EventBridge rule. The real-life display application would use a GraphQL library, such as AWS Amplify, to subscribe to real-time data changes. The AWS Management Console provides a useful utility to test. Navigate to the AWS AppSync console and select Queries in the menu for your API. Enter the following query and choose Run to subscribe for data changes:

subscription MySubscription {
  onFlightStatusUpdate {
    carrier
    date
    event
    num
    info {
      __typename
      … on DelayEventInfo {
        message
        delayMinutes
        newDepTime
      }
    }
  }
}

In a separate browser tab, navigate to the EventBridge console, and choose Send events. On the Send events page, select the required event bus and set the Event source to “airport-operations.” Then enter a detail type of your choice. Finally, paste the following as the Event detail, then choose Send.

{
  "id": "b051312994104931b0980d1ad1c5340f",
  "detail-type": "Operations: Flight delayed",
  "source": "airport-operations",
  "time": "2024-01-25T16:58:37Z",
  "detail": {
    "flightNum": 123,
    "carrierCode": "JK",
    "date": "2024-01-25",
    "event": "FlightDelayed",
    "message": "Delayed 15 minutes, late aircraft",
    "info": "{ \"newDepTime\": \"2024-01-25T13:15:00Z\", \"delayMinutes\": 15 }"
  }
}

Return to the AWS AppSync tab in your browser to see the changed data in the result pane:

Result pane

Conclusion

Directly invoking AWS AppSync GraphQL API targets from EventBridge simplifies and streamlines integration between these two services, ideal for notifying a variety of subscribers of data changes in event-driven workloads. You can also take advantage of other features available from the two services. For example, use AWS AppSync enhanced subscription filtering to update only airport displays in the terminal in which they are located.

To learn more about serverless, visit Serverless Land for a wide array of reusable patterns, tutorials, and learning materials. Newly added to the pattern library is an EventBridge to AWS AppSync pattern similar to the one described in this post. Visit EventBridge documentation for more details.

For more serverless learning resources, visit Serverless Land.

The collective thoughts of the interwebz