Tor Browser 13.5 released

Post Syndicated from corbet original https://lwn.net/Articles/979177/

Version
13.5
of the privacy-focused Tor browser has been released.

Regular readers of our release posts will know that for the past
two years we’ve been gradually increasing our capacity to not only
maintain, but bring tangible improvements to Tor Browser for
Android. In that respect, Tor Browser 13.5 feels like a milestone:
in addition to the dozens of bug fixes and minor improvements noted
in the changelog below, this release features major changes to
Android’s connection experience in preparation for the future
addition of Connection Assist, including full access to Settings
before connecting and a new, permanent home for Tor logs.

The release also features desktop user-interface improvements and enhanced
fingerprinting protection.

[$] A capability set for user namespaces

Post Syndicated from corbet original https://lwn.net/Articles/978846/

User namespaces in Linux create an
environment in which all privileges are granted, but their effect is
contained within the namespace; they have become an important tool for the
implementation of containers. They have also become a significant source
of worries for people who do not like the increased attack surface they
create for the kernel. Various attempts have been made to restrict that
attack surface over the years; the latest is user namespace
capabilities
, posted by Jonathan Calmels.

[$] Updates to pahole

Post Syndicated from daroc original https://lwn.net/Articles/978727/

Arnaldo Carvalho de Melo spoke at the 2024
Linux Storage,
Filesystem, Memory Management, and BPF Summit

about his work on

Poke-a-hole
(pahole),
a program that has expanded greatly over the years, but which was relevant to the
BPF track because it produces BPF Type Format (BTF) information from DWARF
debugging information. He covered some small changes to the program, and then
went into detail about the new support for data-type profiling. His
slides include
several examples.

Stream multi-tenant data with Amazon MSK

Post Syndicated from Emanuele Levi original https://aws.amazon.com/blogs/big-data/stream-multi-tenant-data-with-amazon-msk/

Real-time data streaming has become prominent in today’s world of instantaneous digital experiences. Modern software as a service (SaaS) applications across all industries rely more and more on continuously generated data from different data sources such as web and mobile applications, Internet of Things (IoT) devices, social media platforms, and ecommerce sites. Processing these data streams in real time is key to delivering responsive and personalized solutions, and maximizes the value of data by processing it as close to the event time as possible.

AWS helps SaaS vendors by providing the building blocks needed to implement a streaming application with Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), and real-time processing applications with Amazon Managed Service for Apache Flink.

In this post, we look at implementation patterns a SaaS vendor can adopt when using a streaming platform as a means of integration between internal components, where streaming data is not directly exposed to third parties. In particular, we focus on Amazon MSK.

Streaming multi-tenancy patterns

When building streaming applications, you should take the following dimensions into account:

  • Data partitioning – Event streaming and storage needs to be isolated at the appropriate level, physical or logical, based on tenant ownership
  • Performance fairness – The performance coupling of applications processing streaming data for different tenants must be controlled and limited
  • Tenant isolation – A solid authorization strategy needs to be put in place to make sure tenants can access only their data

Underpinning all interactions with a multi-tenant system is the concept of SaaS identity. For more information, refer to SaaS Architecture Fundamentals.

SaaS deployment models

Tenant isolation is not optional for SaaS providers, and tenant isolation approaches will differ depending on your deployment model. The model is influenced by business requirements, and the models are not mutually exclusive. Trade-offs must be weighed across individual services to achieve a proper balance of isolation, complexity, and cost. There is no universal solution, and a SaaS vendor needs to carefully weigh their business and customer needs against three isolation strategies: silo, pool and bridge (or combinations thereof).

In the following sections, we explore these deployment models across data isolation, performance fairness, and tenant isolation dimensions.

Silo model

The silo model represents the highest level of data segregation, but also the highest running cost. Having a dedicated MSK cluster per tenant increases the risk of overprovisioning and requires duplication of management and monitoring tooling.

Having a dedicated MSK cluster per tenant makes sure tenant data partitioning occurs at the disk level when using an Amazon MSK Provisioned model. Both Amazon MSK Provisioned and Serverless clusters support server-side encryption at rest. Amazon MSK Provisioned further allows you to use a customer managed AWS Key Management Service (AWS KMS) key (see Amazon MSK encryption).

In a silo model, Kafka ACL and quotas is not strictly required unless your business requirements require them. Performance fairness is guaranteed because only a single tenant will be using the resources of the entire MSK cluster and are dedicated to applications producing and consuming events of a single tenant. This means spikes of traffic on a specific tenant can’t impact other tenants, and there is no risk of cross-tenant data access. As a drawback, having a provisioned cluster per tenant requires a right-sizing exercise per tenant, with a higher risk of overprovisioning than in the pool or bridge models.

You can implement tenant isolation the MSK cluster level with AWS Identity and Access Management (IAM) policies, creating per-cluster credentials, depending on the authentication scheme in use.

Pool model

The pool model is the simplest model where tenants share resources. A single MSK cluster is used for all tenants with data split into topics based on the event type (for example, all events related to orders go to the topic orders), and all tenant’s events are sent to the same topic. The following diagram illustrates this architecture.

Image showing a single streaming topic with multiple producers and consumers

This model maximizes operational simplicity, but reduces the tenant isolation options available because the SaaS provider won’t be able to differentiate per-tenant operational parameters and all responsibilities of isolation are delegated to the applications producing and consuming data from Kafka. The pool model also doesn’t provide any mechanism of physical data partitioning, nor performance fairness. A SaaS provider with these requirements should consider either a bridge or silo model. If you don’t have requirements to account for parameters such as per-tenant encryption keys or tenant-specific data operations, a pool model offers reduced complexity and can be a viable option. Let’s dig deeper into the trade-offs.

A common strategy to implement consumer isolation is to identify the tenant within each event using a tenant ID. The options available with Kafka are passing the tenant ID either as event metadata (header) or part of the payload itself as an explicit field. With this approach, the tenant ID will be used as a standardized field across all applications within both the message payload and the event header. This approach can reduce the risk of semantic divergence when components process and forward messages because event headers are handled differently by different processing frameworks and could be stripped when forwarded. Conversely, the event body is often forwarded as a single object and no contained information is lost unless the event is explicitly transformed. Including the tenant ID in the event header as well may simplify the implementation of services allowing you to specify tenants that need to be recovered or migrated without requiring the provider to deserialize the message payload to filter by tenant.

When specifying the tenant ID using either a header or as a field in the event, consumer applications will not be able to selectively subscribe to the events of a specific tenant. With Kafka, a consumer subscribes to a topic and receives all events sent to that topic of all tenants. Only after receiving an event will the consumer will be able to inspect the tenant ID to filter the tenant of interest, making access segregation virtually impossible. This means sensitive data must be encrypted to make sure a tenant can’t read another tenant’s data when viewing these events. In Kafka, server-side encryption can only be set at the cluster level, where all tenants sharing a cluster will share the same server-side encryption key.

In Kafka, data retention can only be set on the topic. In the pool model, events belonging to all tenants are sent to the same topic, so tenant-specific operations like deleting all data for a tenant will not be possible. The immutable, append-only nature of Kafka only allows an entire topic to be deleted, not selective events belonging to a specific tenant. If specific customer data in the stream requires the right to be forgotten, such as for GDPR, a pool model will not work for that data and silo should be considered for that specific data stream.

Bridge model

In the bridge model, a single Kafka cluster is used across all tenants, but events from different tenants are segregated into different topics. With this model, there is a topic for each group of related events per tenant. You can simplify operations by adopting a topic naming convention such as including the tenant ID in the topic name. This will practically create a namespace per tenant, and also allows different administrators to manage different tenants, setting permissions with a prefix ACL, and avoiding naming clashes (for example, events related to orders for tenant 1 go to tenant1.orders and orders of tenant 2 go to tenant2.orders). The following diagram illustrates this architecture.

Image showing multiple producers and consumers each publishing to a stream-per-tenant

With the bridge model, server-side encryption using a per-tenant key is not possible. Data from different tenants is stored in the same MSK cluster, and server-side encryption keys can be specified per cluster only. For the same reason, data segregation can only be achieved at file level, because separate topics are stored in separate files. Amazon MSK stores all topics within the same Amazon Elastic Block Store (Amazon EBS) volume.

The bridge model offers per-tenant customization, such as retention policy or max message size, because Kafka allows you to set these parameters per topic. The bridge model also simplifies segregating and decoupling event processing per tenant, allowing a stronger isolation between separate applications that process data of separate tenants.

To summarize, the bridge model offers the following capabilities:

  • Tenant processing segregation – A consumer application can selectively subscribe to the topics belonging to specific tenants and only receive events for those tenants. A SaaS provider will be able to delete data for specific tenants, selectively deleting the topics belonging to that tenant.
  • Selective scaling of the processing – With Kafka, the maximum number of parallel consumers is determined by the number of partitions of a topic, and the number of partitions can be set per topic, and therefore per tenant.
  • Performance fairness – You can implement performance fairness using Kafka quotas, supported by Amazon MSK, preventing the services processing a particularly busy tenant to consume too many cluster resources, at the expense of other tenants. Refer to the following two-part series for more details on Kafka quotas in Amazon MSK, and an example implementation for IAM authentication.
  • Tenant isolation – You can implement tenant isolation using IAM access control or Apache Kafka ACLs, depending on the authentication scheme that is used with Amazon MSK. Both IAM and Kafka ACLs allow you to control access per topic. You can authorize an application to access only the topics belonging to the tenant it is supposed to process.

Trade-offs in a SaaS environment

Although each model provides different capabilities for data partitioning, performance fairness, and tenant isolation, they also come with different costs and complexities. During planning, it’s important to identify what trade-offs you are willing to make for typical customers, and provide a tier structure to your client subscriptions.

The following table summarizes the supported capabilities of the three models in a streaming application.

. Pool Bridge Silo
Per-tenant encryption at rest No No Yes
Can implement right to be forgotten for single tenant No Yes Yes
Per-tenant retention policies No Yes Yes
Per-tenant event size limit No Yes Yes
Per-tenant replayability Yes (must implement with logic in consumers) Yes Yes

Anti-patterns

In the bridge model, we discussed tenant segregation by topic. An alternative would be segregating by partition, where all messages of a given type are sent to the same topic (for example, orders), but each tenant has a dedicated partition. This approach has many disadvantages and we strongly discourage it. In Kafka, partitions are the unit of horizontal scaling and balancing of brokers and consumers. Assigning partitions per tenants can introduce unbalancing of the cluster, and operational and performance issues that will be hard to overcome.

Some level of data isolation, such as per-tenant encryption keys, could be achieved using client-side encryption, delegating any encryption or description to the producer and consumer applications. This approach would allow you to use a separate encryption key per tenant. We don’t recommend this approach because it introduces a higher level of complexity in both the consumer and producer applications. It may also prevent you from using most of the standard programming libraries, Kafka tooling, and most Kafka ecosystem services, like Kafka Connect or MSK Connect.

Conclusion

In this post, we explored three patterns that SaaS vendors can use when architecting multi-tenant streaming applications with Amazon MSK: the pool, bridge, and silo models. Each model presents different trade-offs between operational simplicity, tenant isolation level, and cost efficiency.

The silo model dedicates full MSK clusters per tenant, offering a straightforward tenant isolation approach but incurring a higher maintenance and cost per tenant. The pool model offers increased operational and cost-efficiencies by sharing all resources across tenants, but provides limited data partitioning, performance fairness, and tenant isolation capabilities. Finally, the bridge model offers a good compromise between operational and cost-efficiencies while providing a good range of options to create robust tenant isolation and performance fairness strategies.

When architecting your multi-tenant streaming solution, carefully evaluate your requirements around tenant isolation, data privacy, per-tenant customization, and performance guarantees to determine the appropriate model. Combine models if needed to find the right balance for your business. As you scale your application, reassess isolation needs and migrate across models accordingly.

As you’ve seen in this post, there is no one-size-fits-all pattern for streaming data in a multi-tenant architecture. Carefully weighing your streaming outcomes and customer needs will help determine the correct trade-offs you can make while making sure your customer data is secure and auditable. Continue your learning journey on SkillBuilder with our SaaS curriculum, get hands-on with an AWS Serverless SaaS workshop or Amazon EKS SaaS workshop, or dive deep with Amazon MSK Labs.


About the Authors

Emmanuele Levi is a Solutions Architect in the Enterprise Software and SaaS team, based in London. Emanuele helps UK customers on their journey to refactor monolithic applications into modern microservices SaaS architectures. Emanuele is mainly interested in event-driven patterns and designs, especially when applied to analytics and AI, where he has expertise in the fraud-detection industry.

Lorenzo Nicora is a Senior Streaming Solution Architect helping customers across EMEA. He has been building cloud-native, data-intensive systems for over 25 years, working across industries, in consultancies and product companies. He has leveraged open-source technologies extensively and contributed to several projects, including Apache Flink.

Nicholas Tunney is a Senior Partner Solutions Architect for Worldwide Public Sector at AWS. He works with Global SI partners to develop architectures on AWS for clients in the government, nonprofit healthcare, utility, and education sectors.  He is also a core member of the SaaS Technical Field Community where he gets to meet clients from all over the world who are building SaaS on AWS.

За природата на поезията и политиката в думите. Разговор с Линда Грегърсън и Ник Леърд

Post Syndicated from Стефан Иванов original https://www.toest.bg/linda-gregerson-nick-laird-interview/

За природата на поезията и политиката в думите. Разговор с Линда Грегърсън и Ник Леърд

Линда Грегърсън (САЩ) и Ник Леърд (Северна Ирландия) гостуваха на тазгодишния международен фестивал „СтолицаЛитература“, организиран от Фондация „Елизабет Костова“. Двамата поети имаха литературно четене на 13 юни в софийската галерия „Прегърни ме“. Малко преди събитието Стефан Иванов разговаря с тях.

Как виждате ролята на поезията в съвременната политика? Може ли поезията да повлияе на политическия контекст и на общественото мнение?

Линда Грегърсън: Циничният отговор е, че поезията няма роля, поетите нямаме влияние, това е съвсем друга сфера. Но всъщност смятам, че светът на поезията е изключително важен, защото в тези наши разбити общества виждаме как хората вече не могат да си представят другия като човешко същество. Налице са тези дълбоки провали на въображението, на умението да се слуша и да се обръща внимание. И именно поезията е мястото, където можем едновременно да експериментираме с питания за това, което мислим и чувстваме, но и където можем да се вслушаме и да чуем, макар и накъсано, звука на човешкия глас.

Струва ми се, че това трябва да бъде основата за възстановяването на разговора в обществото. В липсата на тази основа има трагедия от световен мащаб. Независимо дали е в класната стая, в семейството, в градчето, на работното място, в държавата. Езикът e изкоренен. Той е толкова съзнателно манипулиран и изпразнен от съдържание съвсем нарочно от политици като нашия Доналд Тръмп, чиято стратегическа нечленоразделност е плашещо ефективен инструмент. Има верига на недобросъвестност в езика, която наистина е отровна. Предполагам, че на това ниво чрез поезията възстановяваме добросъвестността в използването на думите.

Ник Леърд: Мисля, че не, поезията не може да влияе върху политиката, но има странни случаи, в които го прави. Особено в Северна Ирландия, където творчеството на Шеймъс Хийни постоянно се цитира. Когато Бил Клинтън дойде на посещение покрай сключването на Белфасткото споразумение, той обичаше да цитира стихове на Хийни. Така че поезията може да изрази определени чувства по един рекламен начин. 

Не съм сигурен, че съвременната поезия има някаква роля в политиката освен тази, наистина. Според мен политиците обичат да избират определени стихове и да ги използват за свои собствени цели. Но не смятам, че поетите са „непризнатите законодатели на този свят“, както казва Пърси Шели. Не мисля, че е вече вярно. Искам да кажа, че това определено е спорт за малцина. Уинстън Одън например е интересен поет и човек, защото смята, че поезията е най-важното нещо на света – но и че е напълно безсмислена. Това не само е забавно, но е и вярно.

Какво мислите за настоящите културни политики, които засягат изкуствата и литературата? Има ли конкретни политики, които според вас са оказали значително влияние върху поетичната общност?

Линда Грегърсън: Това, разбира се, е различно на различните места, особено заради финансирането и начина, по който то се разпределя. От една страна, ние сме изключително привилегировани в Съединените щати. Съществуват множество източници на финансиране. Има организации като Академията на американските поети например, която има чудесен уебсайт, както и Фондация „Поезия“ с онлайн хранилището си на творби, биографии на поети и образователни материали.

Структурата на Академията на американските поети включва Съвет на ректорите, избирани с шестгодишен мандат, и по този начин организацията винаги се променя и развива. Това, с което се гордеех най-много по време на мандата си там, бяха образователните текстови, видео- и аудиоматериали, които създадохме за деца от детската градина до 12-ти клас. Тези материали бяха достъпни в целите Съединени щати, за всеки учител и всяко училище, което искаше да ги използва. По един проект, наречен „Скъпи поете“, подканихме ученици да пишат писма на поети. На всяко писмо се отговаряше. Започваха се и се продължаваха разговори.

Това са инициативи, които ми вдъхват надежда.

Ник Леърд: Всъщност не знам. Моите издатели „Фейбър енд Фейбър“ имат икономически интереси и печелят много пари от мюзикъла по Т. С. Елиът „Котките“. Той издържа издателството от години. Но знам, че много от издателите на поезия са държавно субсидирани. Това често е начинът, по който се издава поезия. Всяка година съм в жури за награда за дебютна поезия и всеки път се изненадвам от количеството публикувани книги. Може би много от тях са публикувани твърде бързо. Субсидираното от държавата издаване има своите недостатъци. То е несъмнено нещо добро, разбира се, аз не съм срещу държавната политика за влагане на пари в изкуството. Та боже мой, културата печели колкото може, а са нужни повече средства. Но не вярвам, че всяко произведение на изкуството трябва да печели пари. Нужно е да има място и за авангардно изкуство.

Всъщност нямам отговор на този въпрос. Знам, че поетите трябва да правят и други неща, а не само да пишат поезия. Аз съм на 48 години и известно време бях адвокат, а след това пишех романи. Сега пиша сценарии и преподавам в Белфаст. Може би поезията винаги трябва да бъде хоби. Смятам нейното професионализиране за странно. Тя не работи точно по този начин. Мисля, че има добри работни места за поетите, както Тед Хюз говореше за идеята поетите да са нощни пазачи, които могат просто да седят и да четат книги по цели нощи. Това е шега, но не съвсем, защото човек наистина има нужда от работа. Например да бъде градинар или нещо подобно. Алис Осуалд е градинарка. Тя има възможност да мисли през цялото време, но и ръцете ѝ са заети и е навън сред природата. Ако работата ви е да седите по цял ден и да се опитвате да напишете стихотворение, главата ви ще се пръсне. 

Сблъсквали ли сте се някога с цензура или с ограничаване на свободата на словото? Как трябва да се ориентират поетите в този социален климат?

Линда Грегърсън: Ще ми се да знаех. Трябва да започнем с това да бъдем добри един към друг, да бъдем щедри, да се съмняваме в радикалната си правота и да вярваме, че има начини да общуваме. Много често и много лесно войните в културата стават реални войни. Трябва само да се огледаме. Понякога съм благодарна, че съм стара, защото скоро може да бъде още по-ужасяващо. 

Едно от нещата, които най-много ме притесняват през последните десетилетия, е режимът на атака в социалните мрежи, където читатели и поети се цензурират един друг, намират се за виновни за някакви ужасни и понякога недействителни неща. Преподавам поезия на млади хора, които се страхуват да пишат. Това наистина ме тревожи, защото има много неща, които трябва да се изследват смело. Но за младите поети това е като минно поле.

Ник Леърд: Няма да навлизам в подробности, но да, сблъсквал съм се с цензура. Когато излезе първият ми роман в Северна Ирландия, разстроих някои хора, които звъняха на родителите ми и ги тормозеха. Бях говорил за протестантски паравоенни формирования в моя град и подразних някои хора. Но никога не се е стигало до насилие. Казвал съм това, което мисля. 

Живеем обаче в такова време – не бива да пишеш нищо, което може лесно да бъде изтълкувано погрешно или да бъде манипулирано. Сложно време. И глупаво, много глупаво. Живях в Ню Йорк 12 години и накрая бях доста щастлив да го напусна, защото градът се поляризира и стана толкова нелеп. На всичко трябваше да се слага етикет. А аз произхождам от работническата класа. Родителите ми не са завършили университет. Учил съм в държавно училище, а преподавах на много деца, завършили частни училища, и разговорите бяха твърде глупави. В Белфаст е различно. Той е много по-беден в сравнение с Ню Йорк. Хората не плащат по 100 000 долара на година, за да бъдат обучавани. И разговорите са по-скоро за поезия и по-малко за привилегии, което е облекчение. В момента Америка преживява свой собствен спазъм и трябва да видим какво ще се случи, но се радвам, че вече не съм в центъра ѝ. 

Що се отнася до свободата на словото – дадох интервю за Irish Times, в което говорихме за Брекзит и нарекох Борис Джонсън лайно. И той е лайно, разбира се. Помислих си, че ще имам неприятности за това, но на никого не му пукаше, защото всички също смятаха, че той е лайно. Можеш да кажеш много неща и може би ще предизвикаш малък скандал за един ден, но думите вече рядко водят до последствия и реакции. Това не е добро време. Не е добро време за нищо.

Във връзка с това смятате ли, че поетите наистина имат отговорността да се ангажират с политически и социални въпроси в творчеството си?

Линда Грегърсън: Зависи как се тълкува това. Със сигурност не смятам, че поезията може да предложи политически прозрения. Поезията не е добра в аргументираните спорове, защото се превръща в дидактика и агитация, в нещо друго, което просто не е в специалните ѝ правомощия. Поезията според мен е много добра в задаването на въпроси. А въпросите може да се задават и добронамерено. Не реторично и не като атака срещу другите, а като спор със самия себе си, като разпит, при който човекът и съвестта му са в риск, защото това може да бъде и опасен диалог. Мисля, че поезията е много добра в откриването на симптомите – в идентифицирането им, в тревожното им осъзнаване и в събуждането на желание за промяна.

Ник Леърд: Поетите не са длъжни да се занимават с политически и социални теми. Не мисля, че поетите имат по-голяма или по-малка отговорност от всички останали граждани. Те трябва да гласуват и трябва да казват какво мислят. Но не бих искал да казвам на поет, че трябва да пише за определени каузи, отчасти защото поезията не работи по този начин. Трябва просто да седиш и да чакаш нещо да се появи, а след това да го следваш. Стихотворения, в които няма никаква изненада, не представляват интерес за мен. Трябва да е изненада за автора, преди да е изненада за читателя. И мисля, че проблемът с писането на стихотворения за някаква кауза е, че то просто става дидактично и скучно. Поезията, както знаете, е много странно животно и обича свободата.

Как неотдавнашните глобални събития, пандемията, климатичните промени, движенията за социална справедливост повлияха на поезията ви и на нещата, които изследвате по някакъв начин?

Линда Грегърсън: Изключително много. Човек не може да бъде жив и осъзнат, без да му тежат пагубните щети, които сме нанесли на планетата, на климата и един на друг. Мисля, че това трябва да се признае, а не просто да е повод за хленч. Искам да кажа, че има начини поезията да се вълнува от това. Ако предпазваме поезията от всичко, това е просто лошо за нея. Не може да се каже: тази тема е подходяща, а тази – не.

Ник Леърд: Докато растях в Северна Ирландия, течеше кулминацията на т.нар. политика на идентичността, защото всеки гласуваше според това в какво семейство е роден и в какво е приобщен. Като резултат аз се интересувам по-малко от политиката на идентичността. Прекарах дълго време в очакване Северна Ирландия да се превърне в останалата част от света, а вместо това останалата част от света се превърна в Северна Ирландия. И това беше объркващо и болезнено. 

Не мисля, че това ми е повлияло по отношение на климатичните промени, освен че се върнах в Северна Ирландия и наскоро направих документално радиопредаване за унищожаването на най-голямото езеро на Британските острови. То просто е опустошено от замърсяване, от липса на законодателство и надзор. Освен това все още е собственост на потомък на човека, който го е откраднал преди повече от четири века. И това е експлоататорският капитализъм в най-лошия му вид. Той просто взема целия пясък от езерото и го продава. Вече векове в Северна Ирландия се живее с едни и същи проблеми. 

Предполагам, че поезията е начин да се опитаме да мислим за тези неща и да ги изразим. Но тя не е много добър метод за предаване на информация. 

Линда Грегърсън (р. 1950) е авторка на шест стихосбирки, последната от които е Canopy, публикувана през 2022 г. от нюйоркското издателство Ecco. Финалистка и носителка е на множество награди, сред които „Кингзли Тъфтс“, „Ленор Маршал“, Националната литературна награда на САЩ и др. Книгата с нейни избрани стихотворения „Машини за дишане“ в превод на Надежда Радулова е публикувана в България през 2018 г. от Издателство за поезия „ДА“. Грегърсън е член на Американската академия на изкуствата и науките, почетен ректор на Академията на американските поети и професор в Мичиганския университет, където ръководи и писателската програма „Хелън Зел“.

Ник Леърд (р. 1975) е поет, романист, сценарист, критик, автор на детски книги и бивш адвокат. Носител е на множество награди, сред които „Бети Траск“, „Руни“, „Джефри Фабер“, „Форуард“, „Съмърсет Моъм“. Преподавал е в Колумбийския, Принстънския и Нюйоркския университет, а в момента е професор по поезия в Университета „Куинс“ в Белфаст. Последната му стихосбирка Up Late е публикувана през 2023-та, а тази година предстои издаването на новата му детска книга със заглавие Weirdo Goes Wild. Съпруг е на писателката Зейди Смит.

Anthropic’s Claude 3.5 Sonnet model now available in Amazon Bedrock: Even more intelligence than Claude 3 Opus at one-fifth the cost

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/anthropics-claude-3-5-sonnet-model-now-available-in-amazon-bedrock-the-most-intelligent-claude-model-yet/

It’s been just 3 months since Anthropic launched Claude 3, a family of state-of-the-art artificial intelligence (AI) models that allows you to choose the right combination of intelligence, speed, and cost that suits your needs.

Today, Anthropic introduced Claude 3.5 Sonnet, its first release in the forthcoming Claude 3.5 model family. We are happy to announce that Claude 3.5 Sonnet is now available in Amazon Bedrock.

Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming other generative AI models on a wide range of evaluations, including Anthropic’s previously most intelligent model, Claude 3 Opus. Claude 3.5 Sonnet is available with the speed and cost of the original Claude 3 Sonnet model. In fact, you can now get intelligence and speed better than Claude 3 Opus at one-fifth of the price because Claude 3.5 Sonnet is 80 percent cheaper than Opus.

Anthropic Claude 3.5 Sonnet Family

The frontier intelligence displayed by Claude 3.5 Sonnet combined with cost-effective pricing, makes the model ideal for complex tasks such as context-sensitive customer support, orchestrating multi-step workflows, and streamlining code translations.

Claude 3.5 Sonnet sets new industry benchmarks for undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), code (HumanEval), and more. As you can see in the following table, according to Anthropic, Claude 3.5 Sonnet outperforms OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro in nearly every benchmark.

Anthropic Claude 3.5 Sonnet Benchmarks

Claude 3.5 Sonnet is also Anthropic’s strongest vision model yet, performing an average of 10 percent better than Claude 3 Opus across the majority of vision benchmarks. According to Anthropic, Claude 3.5 Sonnet also outperforms other generative AI models in nearly every category.

Anthropic Claude 3.5 Sonnet Vision Benchmarks

Anthropic’s Claude 3.5 Sonnet key improvements
The release of Claude 3.5 Sonnet brings significant improvements across multiple domains, empowering software developers and businesses with new generative AI-powered capabilities. Here are some of the key strengths of this new model:

Visual processing and understanding – Claude 3.5 Sonnet demonstrates remarkable capabilities in processing images, particularly in interpreting charts and graphs. It accurately transcribes text from imperfect images, a core capability for industries such as retail, logistics, and financial services, to gather more insights from graphics or illustrations than from text alone. Use Claude 3.5 Sonnet to automate visual data processing tasks, extract valuable information, and enhance data analysis pipelines.

Writing and content generation – Claude 3.5 Sonnet represents a significant leap in its ability to understand nuance and humor. The model produces high-quality written content with a more natural, human tone that feels more authentic and relatable. Use the model to generate engaging and compelling content, streamline your writing workflows, and enhance your storytelling capabilities.

Customer support and natural language processing – With its improved understanding of context and multistep workflow orchestration, Claude 3.5 Sonnet excels at handling intricate customer inquiries. This capability enables round-the-clock support, faster response times, and more natural-sounding interactions, ultimately leading to improved customer satisfaction. Use this model to automate and enhance customer support processes and provide a seamless experience for end users. For an example of a similar implementation, see how DoorDash built a generative AI self-service contact center solution using Anthropic’s Claude 3 models in Amazon Bedrock.

Analytics and insights – Claude 3.5 Sonnet augments human expertise in data science by effortlessly navigating unstructured data and using multiple tools to generate insights. It produces high-quality statistical visualizations and actionable predictions, ranging from business strategies to real-time product trends. Use Claude 3.5 Sonnet to simplify data analysis workflows, uncover valuable insights, and drive data-driven decision-making.

Coding and software development – Claude 3.5 Sonnet can independently write, edit, and execute code with sophisticated reasoning and troubleshooting capabilities when provided with the relevant tools. Use Claude 3.5 Sonnet to streamline developer workflows, accelerate coding tasks, reduce manual effort, and enhance overall productivity.

Using Anthropic’s Claude 3.5 Sonnet in the Amazon Bedrock console
Because this is a new model, I go to the Amazon Bedrock console and choose Model access from the navigation pane to enable access to Claude 3.5 Sonnet.

Console screenshot.

Now that I have access to the model, I’d like to use it to extract information from the following image from the Our World in Data website:

Primary energy consumption from hydropower, 2022

I choose Chat in the Playgrounds section of the navigation pane and select Anthropic and Claude 3.5 Sonnet as the model. I then choose the Image upload button to upload the previous image, type the following prompt, and choose Run:

Which countries consume more than 1000 TWh from hydropower? Think step by step and look at all regions. Output in JSON.

Here’s the result and an animated image recapping what I did.

To determine which countries consume more than 1,000 TWh from hydropower in 2022, I need to carefully examine the map and color-coding. The scale at the bottom shows that dark blue represents consumption between 1,000 TWh and 2,000 TWh.

Let's look at each region systematically:

1. North America: Canada is dark blue, indicating it consumes between 1,000-2,000 TWh from hydropower.
2. South America: Brazil is dark blue, also consuming between 1,000-2,000 TWh.
3. Europe: No country is dark blue; all are light green or have no data.
4. Africa: No data available for any country.
5. Asia: China is dark blue, indicating consumption between 1,000-2,000 TWh.
6. Oceania: No country is dark blue; Australia is light green.

In summary, there are three countries that consume more than 1,000 TWh from hydropower in 2022:
1. Canada (North America)
2. Brazil (South America)
3. China (Asia)

Here's the response in JSON format:

```json
{
  "countries_consuming_over_1000TWh_hydropower": [
    {
      "name": "Canada",
      "region": "North America",
      "consumption_range": "1,000-2,000 TWh"
    },
    {
      "name": "Brazil",
      "region": "South America",
      "consumption_range": "1,000-2,000 TWh"
    },
    {
      "name": "China",
      "region": "Asia",
      "consumption_range": "1,000-2,000 TWh"
    }
  ]
}
```

These three countries stand out as the highest consumers of hydropower energy, each using between 1,000 and 2,000 terawatt-hours in 2022.

Anthropic's Claude 3.5 Sonnet demo in the Amazon Bedrock console.

The model’s ability to reliably extract information from unstructured data, like images, opens up a world of new possibilities.

I choose the three small dots in the corner of the playground window and then View API request to see code examples using the model in the AWS Command Line Interface (AWS CLI) and AWS SDKs. Let’s have a better look at the code syntax.

Using Claude 3.5 Sonnet with AWS SDKs
You can use Claude 3.5 Sonnet with any AWS SDK using the new Amazon Bedrock Converse API or Anthropic Claude Messages API.

To update code already using a Claude 3 model, I just need to replace the model ID with:

anthropic.claude-3-5-sonnet-20240620-v1:0

Here’s a sample implementation with the AWS SDK for Python (Boto3) using the same image as before to show how to use images and text with the Converse API.

import boto3
from botocore.exceptions import ClientError

MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

IMAGE_NAME = "primary-energy-hydro.png"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open(IMAGE_NAME, "rb") as f:
    image = f.read()

user_message = "Which countries consume more than 1000 TWh from hydropower? Think step by step and look at all regions. Output in JSON."

messages = [
    {
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image}}},
            {"text": user_message},
        ],
    }
]

response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=messages,
)
response_text = response["output"]["message"]["content"][0]["text"]
print(response_text)

When I run it, I get a similar output as in the console:

Let's approach this step-by-step:

1. First, I'll examine the color scale at the bottom of the map. The darkest blue color represents consumption of 2,000 TWh or more.

2. Now, I'll scan the map region by region:

   North America: Canada is dark blue, indicating over 1,000 TWh.
   South America: Brazil is also dark blue, over 1,000 TWh.
   Europe: No country appears to be dark blue.
   Africa: No country appears to be dark blue.
   Asia: China stands out as dark blue, indicating over 1,000 TWh.
   Oceania: No country appears to be dark blue.

3. To be thorough, I'll double-check for any medium blue countries that might be close to or over 1,000 TWh, but I don't see any that appear to reach that threshold.

4. Based on this analysis, there are three countries that clearly consume more than 1,000 TWh from hydropower.

Now, I'll format the answer in JSON:

```json
{
  "countries_consuming_over_1000TWh_hydropower": [
    "Canada",
    "Brazil",
    "China"
  ]
}
```

This JSON output lists the three countries that visually appear to consume more than 1,000 TWh of primary energy from hydropower according to the 2022 data presented in the map.

Because I didn’t specify a JSON syntax, the two answers use a different format. In your applications, you can describe in the prompt the JSON properties you want or provide a sample to get a standard format in output.

For more examples, see the code samples in the Amazon Bedrock User Guide. For a more advanced use case, here’s a fully functional tool use demo illustrating how to connect a generative AI model with a custom tool or API.

Using Claude 3.5 Sonnet with the AWS CLI
There are times when nothing beats the speed of the command line. This is how you can use the AWS CLI with the new model:

aws bedrock-runtime converse \
    --model-id anthropic.claude-3-5-sonnet-20240620-v1:0 \
    --messages '{"role": "user", "content": [{"text": "Alice has N brothers and she also has M sisters. How many sisters does Alice’s brother have?"}]}' \
    --region us-east-1
    --query output.message.content

In the output, I use the query option to only get the content of the output message:

[
    {
        "text": "Let's approach this step-by-step:\n\n1. First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2. Now, let's consider Alice's brother:\n   - He is one of Alice's N brothers\n   - He has the same parents as Alice\n\n3. This means that Alice's brother has:\n   - The same sisters as Alice\n   - One sister more than Alice (because Alice herself is his sister)\n\n4. Therefore, the number of sisters Alice's brother has is:\n   M + 1\n\n   Where M is the number of sisters Alice has.\n\nSo, the answer is: Alice's brother has M + 1 sisters."
    }
]

I copy the text into a small Python program to see it printed on multiple lines:

print("Let's approach this step-by-step:\n\n1. First, we need to understand the relationships:\n   - Alice has N brothers\n   - Alice has M sisters\n\n2. Now, let's consider Alice's brother:\n   - He is one of Alice's N brothers\n   - He has the same parents as Alice\n\n3. This means that Alice's brother has:\n   - The same sisters as Alice\n   - One sister more than Alice (because Alice herself is his sister)\n\n4. Therefore, the number of sisters Alice's brother has is:\n   M + 1\n\n   Where M is the number of sisters Alice has.\n\nSo, the answer is: Alice's brother has M + 1 sisters.")
Let's approach this step-by-step:

1. First, we need to understand the relationships:
   - Alice has N brothers
   - Alice has M sisters

2. Now, let's consider Alice's brother:
   - He is one of Alice's N brothers
   - He has the same parents as Alice

3. This means that Alice's brother has:
   - The same sisters as Alice
   - One sister more than Alice (because Alice herself is his sister)

4. Therefore, the number of sisters Alice's brother has is:
   M + 1

   Where M is the number of sisters Alice has.

So, the answer is: Alice's brother has M + 1 sisters.

Even if this was a quite nuanced question, Claude 3.5 Sonnet got it right and described its reasoning step by step.

Things to know
Anthropic’s Claude 3.5 Sonnet is available in Amazon Bedrock today in the US East (N. Virginia) AWS Region. More information on Amazon Bedrock model support by Region is available in the documentation. View the Amazon Bedrock pricing page to determine the costs for your specific use case.

By providing access to a faster and more powerful model at a lower cost, Claude 3.5 Sonnet makes generative AI easier and more effective to use for many industries, such as:

Healthcare and life sciences – In the medical field, Claude 3.5 Sonnet shows promise in enhancing imaging analysis, acting as a diagnostic assistant for patient triage, and summarizing the latest research findings in an easy-to-digest format.

Financial services – The model can provide valuable assistance in identifying financial trends and creating personalized debt repayment plans tailored to clients’ unique situations.

Legal – Law firms can use the model to accelerate legal research by quickly surfacing relevant precedents and statutes. Additionally, the model can increase paralegal efficiency through contract analysis and assist with drafting standard legal documents.

Media and entertainment – The model can expedite research for journalists, support the creative process of scriptwriting and character development, and provide valuable audience sentiment analysis.

Technology – For software developers, Claude 3.5 Sonnet offers opportunities in rapid application prototyping, legacy code migration, innovative feature ideation, user experience optimization, and identification of friction points.

Education – Educators can use the model to streamline grant proposal writing, develop comprehensive curricula incorporating emerging trends, and receive research assistance through database queries and insight generation.

It’s an exciting time for for generative AI. To start using this new model, see the Anthropic Claude models section of the Amazon Bedrock User Guide. You can also visit our community.aws site to find deep-dive technical content and to discover how our Builder communities are using Amazon Bedrock in their solutions. Let me know what you do with these enhanced capabilities!

Danilo

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/979153/

Security updates have been issued by AlmaLinux (ghostscript and thunderbird), Debian (chromium, composer, libndp, and sendmail), Fedora (composer), Mageia (flatpak and python-scikit-learn), Red Hat (curl, ghostscript, and thunderbird), SUSE (hdf5 and opencc), and Ubuntu (gdb and php7.4, php8.1, php8.2, php8.3).

A Recap of the Data Engineering Open Forum at Netflix

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/a-recap-of-the-data-engineering-open-forum-at-netflix-6b4d4410b88f

A summary of sessions at the first Data Engineering Open Forum at Netflix on April 18th, 2024

The Data Engineering Open Forum at Netflix on April 18th, 2024.

At Netflix, we aspire to entertain the world, and our data engineering teams play a crucial role in this mission by enabling data-driven decision-making at scale. Netflix is not the only place where data engineers are solving challenging problems with creative solutions. On April 18th, 2024, we hosted the inaugural Data Engineering Open Forum at our Los Gatos office, bringing together data engineers from various industries to share, learn, and connect.

At the conference, our speakers share their unique perspectives on modern developments, immediate challenges, and future prospects of data engineering. We are excited to share the recordings of talks from the conference with the rest of the world.

Opening Remarks

Recording

Speaker: Max Schmeiser (Vice President of Studio and Content Data Science & Engineering)

Summary: Max Schmeiser extends a warm welcome to all attendees, marking the beginning of our inaugural Data Engineering Open Forum.

Evolving from Rule-based Classifier: Machine Learning Powered Auto Remediation in Netflix Data Platform

Recording

Speakers:

Summary: At Netflix, hundreds of thousands of workflows and millions of jobs are running every day on our big data platform, but diagnosing and remediating job failures can impose considerable operational burdens. To handle errors efficiently, Netflix developed a rule-based classifier for error classification called “Pensive.” However, as the system has increased in scale and complexity, Pensive has been facing challenges due to its limited support for operational automation, especially for handling memory configuration errors and unclassified errors. To address these challenges, we have developed a new feature called “Auto Remediation,” which integrates the rules-based classifier with an ML service.

Automating the Data Architect: Generative AI for Enterprise Data Modeling

Recording

Speaker: Jide Ogunjobi (Founder & CTO at Context Data)

Summary: As organizations accumulate ever-larger stores of data across disparate systems, efficiently querying and gaining insights from enterprise data remain ongoing challenges. To address this, we propose developing an intelligent agent that can automatically discover, map, and query all data within an enterprise. This “Enterprise Data Model/Architect Agent” employs generative AI techniques for autonomous enterprise data modeling and architecture.

Tulika Bhatt, Senior Data Engineer at Netflix, shared how her team manages impression data at scale.

Real-Time Delivery of Impressions at Scale

Recording

Speaker: Tulika Bhatt (Senior Data Engineer at Netflix)

Summary: Netflix generates approximately 18 billion impressions daily. These impressions significantly influence a viewer’s browsing experience, as they are essential for powering video ranker algorithms and computing adaptive pages, With the evolution of user interfaces to be more responsive to in-session interactions, coupled with the growing demand for real-time adaptive recommendations, it has become highly imperative that these impressions are provided on a near real-time basis. This talk will delve into the creative solutions Netflix deploys to manage this high-volume, real-time data requirement while balancing scalability and cost.

Reflections on Building a Data Platform From the Ground Up in a Post-GDPR World

Recording

Speaker: Jessica Larson (Data Engineer & Author of “Snowflake Access Control”)

Summary: The requirements for creating a new data warehouse in the post-GDPR world are significantly different from those of the pre-GDPR world, such as the need to prioritize sensitive data protection and regulatory compliance over performance and cost. In this talk, Jessica Larson shares her takeaways from building a new data platform post-GDPR.

Unbundling the Data Warehouse: The Case for Independent Storage

Recording

Speaker: Jason Reid (Co-founder & Head of Product at Tabular)

Summary: Unbundling a data warehouse means splitting it into constituent and modular components that interact via open standard interfaces. In this talk, Jason Reid discusses the pros and cons of both data warehouse bundling and unbundling in terms of performance, governance, and flexibility, and he examines how the trend of data warehouse unbundling will impact the data engineering landscape in the next 5 years.

Clark Wright, Staff Analytics Engineer at Airbnb, talked about the concept of Data Quality Score at Airbnb.

Data Quality Score: How We Evolved the Data Quality Strategy at Airbnb

Recording

Speaker: Clark Wright (Staff Analytics Engineer at Airbnb)

Summary: Recently, Airbnb published a post to their Tech Blog called Data Quality Score: The next chapter of data quality at Airbnb. In this talk, Clark Wright shares the narrative of how data practitioners at Airbnb recognized the need for higher-quality data and then proposed, conceptualized, and launched Airbnb’s first Data Quality Score.

Data Productivity at Scale

Recording

Speaker: Iaroslav Zeigerman (Co-Founder and Chief Architect at Tobiko Data)

Summary: The development and evolution of data pipelines are hindered by outdated tooling compared to software development. Creating new development environments is cumbersome: Populating them with data is compute-intensive, and the deployment process is error-prone, leading to higher costs, slower iteration, and unreliable data. SQLMesh, an open-source project born from our collective experience at companies like Airbnb, Apple, Google, and Netflix, is designed to handle the complexities of evolving data pipelines at an internet scale. In this talk, Iaroslav Zeigerman discusses challenges faced by data practitioners today and how core SQLMesh concepts solve them.

Last but not least, thank you to the organizers of the Data Engineering Open Forum: Chris Colburn, Xinran Waibel, Jai Balani, Rashmi Shamprasad, and Patricia Ho.

Until next time!

If you are interested in attending a future Data Engineering Open Forum, we highly recommend you join our Google Group to stay tuned to event announcements.


A Recap of the Data Engineering Open Forum at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Дискусия на Центъра за изследване на демокрацията Незаконни са ½ от сечите у нас. Бизнесът обвинява политиците

Post Syndicated from Николай Марченко original https://bivol.bg/cid-nezakonna-sech.html

четвъртък 20 юни 2024


Около или малко над 50% от дърводобива в България представлява незаконната или “узаконената” по всякакви начини сеч. Това констатира председателят на Асоциацията на парковете в България и бивш зам.-министър на…

CISPE Data Protection Code of Conduct Public Register now has 113 compliant AWS services

Post Syndicated from Gokhan Akyuz original https://aws.amazon.com/blogs/security/cispe-data-protection-code-of-conduct-public-register-now-has-113-compliant-aws-services/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that 113 services are now certified as compliant with the Cloud Infrastructure Services Providers in Europe (CISPE) Data Protection Code of Conduct. This alignment with the CISPE requirements demonstrates our ongoing commitment to adhere to the heightened expectations for data protection by cloud service providers. AWS customers who use AWS certified services can be confident that their data is processed in adherence with the European Union’s General Data Protection Regulation (GDPR).

The CISPE Code of Conduct is the first pan-European, sector-specific code for cloud infrastructure service providers, which received a favorable opinion that it complies with the GDPR. It helps organizations across Europe accelerate the development of GDPR compliant, cloud-based services for consumers, businesses, and institutions.

The accredited monitoring body EY CertifyPoint evaluated AWS on May 16, 2024, and successfully audited 104 certified services. AWS added nine additional services to the current scope in May 2024. As of the date of this blog post, 113 services are in scope of this certification. The Certificate of Compliance that illustrates AWS compliance status is available on the CISPE Public Register. For up-to-date information, including when additional services are added, search the CISPE Public Register by entering AWS as the Seller of Record; or see the AWS CISPE Data Protection Code of Conduct page.

AWS strives to bring additional services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about AWS compliance with CISPE, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance ProgramsAWS General Data Protection Regulation (GDPR) Center, and the EU data protection section of the AWS Cloud Security website. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Gokhan Akyuz

Gokhan Akyuz

Gokhan is a Security Audit Program Manager at AWS, based in Amsterdam. He leads security audits, attestations, and certification programs across Europe and the Middle East. He has 17 years of experience in IT and cybersecurity audits, IT risk management, and controls implementation in a wide range of industries.

Patrick Finn: why I joined Cloudflare as VP Sales for the Americas

Post Syndicated from Patrick S. Finn original https://blog.cloudflare.com/patrick-finn


I’m delighted to be joining Cloudflare as Vice President of Sales in the US, Canada, and Latin America.

I’ve had the privilege of leading sales for some of the world’s most iconic tech companies, including IBM and Cisco. During my career I’ve led international teams numbering in the thousands and driving revenue in the billions of dollars while serving some of the world’s largest enterprise customers. I’ve seen first-hand the evolution of technology and what it can achieve for businesses, from robotics, automation, and data analytics, to cloud computing, cybersecurity, and AI.

I firmly believe Cloudflare is well on its way to being one of the next iconic tech companies.

Why Cloudflare

Cloudflare has a unique opportunity to help businesses navigate an enduring wave of technological change. There are few companies in the world that operate in the three most exciting fields of innovation that will continue to shape our world in the coming years: cloud computing, AI, and cybersecurity. Cloudflare is one of those companies. When I was approached for this role, I spoke to a wide range of connections across the financial sector, private companies, and government. The feedback was unanimous that Cloudflare is poised on the edge of exhilarating growth.

Driving predictable, profitable revenue

I was fortunate to join Cisco two years after its annual revenue passed the $1 billion mark and had the privilege of helping scale the business to more than $49 billion in revenue the year I left. Cloudflare passed the $1 billion milestone just last year, and I see the same potential for growth here as I saw at Cisco.

Cloudflare’s global sales organization is growing. I’m excited to help accelerate that process in a way that delivers recurring revenue for the business while ensuring we retain a very high bar in terms of the talent we bring onto the team. My experience leading complex, cross-functional sales organizations within large global companies has taught me a great deal about the common traits among highly effective sales functions.

The groups of individuals that come together to make true teams are the ones that successfully focus on a unifying goal and develop skills like communication, attitude, process, organization, consistency, collaboration, partnership, and accountability.  These teams embrace diversity and bring out of each other the best expertise, creativity, and skills, making the team stronger and keeping the goal in focus.

Making our customers our north star

We will achieve the opportunity ahead of us only as long as we have our customers as our north star. Today, the Americas represent more than half of Cloudflare’s revenue worldwide and are home to some of our largest and most strategic customers – both in the private and public sectors – including 30% of the Fortune 1000. Brands from Zendesk to Shopify and from Colgate-Palmolive to Mars rely on Cloudflare to operate their businesses in a fast, secure, and reliable way.

Whatever the technology, there are three common fundamentals I’ve found essential to creating value for customers: being the expert on their challenges, understanding how to pick the right combination of products, services, and solutions from those available, and knowing your competition.

Cloudflare already has an incredible and growing range of products and services that are helping millions of individuals and organizations maximize the opportunities presented by cloud computing and generative AI, all while staying safe from the threat of cyberattacks.

What helping to build a better Internet means to me

If it were needed, one additional deciding factor behind my excitement in joining Cloudflare is its ambitious mission to help build a better Internet. As a father, I want the Internet to be a safe and valuable resource for my family and friends and for generations to come. I don’t want my daughter to have to worry about her personal data and privacy as she’s buying Billie Eilish concert tickets online (and, yes, I’m going too).

Today Cloudflare’s connectivity cloud protects nearly 20% of all websites online and stops 209 billion cyber attacks daily. In addition to its growing customer base, Cloudflare is living up to its mission by offering its services for free to millions more individuals and small businesses, including the most vulnerable voices online through its Project Galileo initiative.

The combination of a strong mission, genuine values, a great team, and incredible technology isn’t a given in every company, but is evident at Cloudflare. I’m excited to play a part as Cloudflare continues to scale its business and help build a better Internet for everyone.

If you’re interested in learning more about what Cloudflare can do for your organization, please get in touch here. If you’re an ambitious, talented sales professional looking for your next challenging and rewarding career move, check out our open positions here.

Introducing Stream Generated Captions, powered by Workers AI

Post Syndicated from Mickie Betz original https://blog.cloudflare.com/stream-automatic-captions-with-ai


With one click, customers can now generate video captions effortlessly using Stream’s newest feature: AI-generated captions for on-demand videos and recordings of live streams. As part of Cloudflare’s mission to help build a better Internet, this feature is available to all Stream customers at no additional cost.

This solution is designed for simplicity, eliminating the need for third-party transcription services and complex workflows. For videos lacking accessibility features like captions, manual transcription can be time-consuming and impractical, especially for large video libraries. Traditionally, it has involved specialized services, sometimes even dedicated teams, to transcribe audio and deliver the text along with video, so it can be displayed during playback. As captions become more widely expected for a variety of reasons, including ethical obligation, legal compliance, and changing audience preferences, we wanted to relieve this burden.

With Stream’s integrated solution, the caption generation process is seamlessly integrated into your existing video management workflow, saving time and resources. Regardless of when you uploaded a video, you can easily add automatic captions to enhance accessibility. Captions can now be generated within the Cloudflare Dashboard or via an API request, all within the familiar and unified Stream platform.

This feature is designed with utmost consideration for privacy and data protection. Unlike other third-party transcription services that may share content with external entities, your data remains securely within Cloudflare’s ecosystem throughout the caption generation process. Cloudflare does not utilize your content for model training purposes. For more information about data protection, review Your Data and Workers AI.

Getting Started

Starting June 20th, 2024, this beta is available for all Stream customers as well as subscribers of the Professional and Business plans, which include 100 minutes of video storage.

To get started, upload a video to Stream (from the Cloudflare Dashboard or via API).

Next, navigate to the “Captions” tab on the video, click “Add Captions,” then select the language and “Generate captions with AI.” Finally, click save and within a few minutes, the new captions will be visible in the captions manager and automatically available in the player, too. Captions can also be generated via the API.

Captions are usually generated in a few minutes. When captions are ready, the Stream player will automatically be updated to offer them to users. The HLS and DASH manifests are also updated so third party players that support text tracks can display them as well.

On-demand videos and recordings of live streams, regardless of when they were created, are supported. While in beta, only English captions can be generated, and videos must be shorter than 2 hours. The quality of the transcription is best on videos with clear speech and minimal background noise.

We’ve been pleased with how well the AI model transcribes different types of content during our tests. That said, there are times when the results aren’t perfect, and another method might work better for some use cases. It’s important to check if the accuracy of the generated captions are right for your needs.

Technical Details

Built using Workers AI

The Stream engineering team built this new feature using Workers AI, allowing us to access the Whisper model – an open source Automatic Speech Recognition model – with a single API call. Using Workers AI radically simplified the AI model deployment, integration, and scaling with an out-of-the-box solution. We eliminated the need for our team to handle infrastructure complexities, enabling us to focus solely on building the automated captions feature.

Writing software that utilizes an AI model can involve several challenges. First, there’s the difficulty of configuring the appropriate hardware infrastructure. AI models require substantial computational resources to run efficiently and require specialized hardware, like GPUs, which can be expensive and complex to manage. There’s also the daunting task of deploying AI models at scale, which involve the complexities of balancing workload distribution, minimizing latency, optimizing throughput, and maintaining high availability. Not only does Workers AI solve the pain of managing underlying infrastructure, it also automatically scales as needed.

Using Workers AI transformed a daunting task into a Worker that transcribes audio files with less than 30 lines of code.

import { Ai } from '@cloudflare/ai'


export interface Env {
 AI: any
}


export type AiVTTOutput = {
 vtt?: string
}


export default {
 async fetch(request: Request, env: Env) {
   const blob = await request.arrayBuffer()


   const ai = new Ai(env.AI)
   const input = {
     audio: [...new Uint8Array(blob)],
   }


   try {
     const response: AiVTTOutput = (await ai.run(
       '@cf/openai/whisper-tiny-en',
       input
     )) as any
     return Response.json({ vtt: response.vtt })
   } catch (e) {
     const errMsg =
       e instanceof Error
         ? `${e.name}\n${e.message}\n${e.stack}`
         : 'unknown error type'
     return new Response(`${errMsg}`, {
       status: 500,
       statusText: 'Internal error',
     })
   }
 },
}

Quickly captioning videos at scale

The Stream team wanted to ensure this feature is fast and performant at scale,   which required engineering work to process a high volume of videos regardless of duration.

First, our team needed to pre-process the audio prior to running AI inference to ensure the input is compatible with Whisper’s input format and requirements.

There is a wide spectrum of variability in video content, from a short grainy video filmed on a phone to a multi-hour high-quality Hollywood-produced movie. Videos may be silent or contain an action-driven cacophony. Also, Stream’s on-demand videos include recordings of live streams which are packaged differently from videos uploaded as whole files. With this variability, the audio inputs are stored in an array of different container formats, with different durations, and different file sizes. We ensured our audio files were properly formatted to be compatible with Whisper’s requirements.

One aspect for pre-processing is ensuring files are a sensible duration for optimized inference.  Whisper has an “sweet spot” of 30 seconds for the duration of audio files for transcription. As they note in this Github discussion: “Too short, and you’d lack surrounding context. You’d cut sentences more often. A lot of sentences would cease to make sense. Too long, and you’ll need larger and larger models to contain the complexity of the meaning you want the model to keep track of.” Fortunately, Stream already splits videos into smaller segments to ensure fast delivery during playback on the web. We wrote functionality to concatenate those small segments into 30-second batches prior to sending to Workers AI.

To optimize processing speed, our team parallelized as many operations as possible. By concurrently creating the 30-second audio batches and sending requests to Workers AI, we take full advantage of the scalability of the Workers AI platform. Doing this greatly reduces the time it takes to generate captions, but adds some additional complexity. Because we are sending requests to Workers AI in parallel, transcription responses may arrive out-of-order. For example, if a video is one minute in duration, the request to generate captions for the second 30 seconds of a video may complete before the request for the first 30 seconds of the video. The captions need to be sequential to align with the video, so our team had to maintain an understanding of the audio batch order to ensure our final combined WebVTT caption file is properly synced with the video. We sort the incoming Workers AI responses and re-order timestamps for a final accurate transcript.

The end result is the ability to generate captions for longer videos quickly and efficiently at scale.

Try it now

We are excited to bring this feature to open beta for all of our subscribers as well as Pro and Business plan customers today! Get started by uploading a video to Stream. Review our documentation for tutorials and current beta limitations. Up next, we will be focused on adding more languages and supporting longer videos.