How SmugMug Increased Data Modeling Productivity with Amazon Q Developer

Post Syndicated from Will Matos original https://aws.amazon.com/blogs/devops/how-smugmug-increased-data-modeling-productivity-with-amazon-q-developer/

This post is co-written with Dr. Geoff Ryder, Manager, at SmugMug.

Introduction

SmugMug operates two very large online photo platforms: SmugMug and Flickr. These platforms enable more than 100 million customers to safely store, search, share, and sell tens of billions of photos every day. However, the data science and engineering team at SmugMug and Flickr often faces complex data modeling challenges that require significant time to resolve.

These challenges arise due to several factors. First, the team has to contend with diverse datasets from different sources. Additionally, the database schema and tables are highly complex, and the team needs to quickly understand application (PHP) code and database table structures in order to generate the necessary complex database queries. Specifically, SmugMug uses Amazon Redshift as its cloud data warehouse to analyze patterns in petabyte-scale data stored in Amazon S3, as well as transactional data in Amazon Aurora and Amazon DynamoDB. This allows them to generate dozens of business reports daily.

However, the complexity increases further as many database tables also need to be imported from third-party organizations into Amazon Redshift, where they are joined with SmugMug and Flickr’s internal tables. In extreme cases, properly modeling all these database tables and handling issues like granularity, cardinality, timestamps and missing data could take years – an impractical timeline for the business. We are excited to walk through SmugMug’s data modeling use cases and how SmugMug uses Amazon Q Developer to improve the data science and engineering team’s productivity.

Discovering Amazon Q Developer

SmugMug was one of the first customers to pilot Amazon Q Developer (previously Amazon CodeWhisperer), the most capable AI-powered assistant for software development that re-imagines the experience across the entire software development lifecycle, making it easier and faster to build, secure, manage, optimize, operate, and transform applications on AWS. There are multiple Amazon Q Developer use cases at SmugMug and Flickr, such as using the Amazon Q Developer agent (/dev) for software development (i.e., generating implementation plans and the accompanying code), generating inline code suggestions, asking Amazon Q Developer in chat about AWS services and best practices, and analyzing AWS usage and costs for Cloud Financial Management (CFM) needs. For the data science and engineering team specifically, the key feature is chatting with Amazon Q Developer in integrated development environments (IDEs) like JetBrains DataGrip. The data analysts and data scientists at SmugMug and Flickr ask questions in Amazon Q Developer chat to analyze database schemas, generate data model diagrams from DDL (Data Definition Language) statements, convert queries between languages, automatically generate complex database queries for data analysis, generate code to validate table contents, and predict trends using ML (Machine Learning).

Implementing Amazon Q Developer

To solve the data modeling challenges SmugMug faced, the team collaborated closely with their AWS Account Team, AWS Professional Services, and the Amazon Q Developer service team to create and test a data modeling assistant solution using Amazon Q Developer.

As a first step, the data modeler needs to bring the right metadata to bear. For simpler cases, the commands “show view myschema.v” or “show table myschema.t” retrieve DDL schema information about the specified view or table from Amazon Redshift into the IDE console.

Here’s an example using simulated data for a hypothetical company. For this typical company that handles orders for products, the result of typing “show table sample.orderinfo” and “show table sample.skuinfo” might be:

Image of SQL statement generated by the show table statement. "CREATE TABLE sample.skuinfo ( sku_id bigint ENCODE raw, sku_vendor bigint ENCODE az64, sku_category character varying(18) ENCODE lzo, sku_description character varying(255) ENCODE lzo, date_sku_created timestamp without time zone ENCODE az64, date_sku_updated timestamp without time zone ENCODE az64, pipeline_inserted_at timestamp without time zone ENCODE az64 ) DISTSTYLE KEY SORTKEY ( sku_id );"

Image of SQL statement generated by the show table statement. "CREATE TABLE sample.orderinfo ( order_id bigint ENCODE raw, shipper_id bigint ENCODE az64 distkey, product_id bigint ENCODE az64, quantity_ordered integer ENCODE az64, date_order_placed timestamp without time zone ENCODE az64 ) DISTSTYLE KEY SORTKEY ( order_id );"

This DDL text is now in the open tab. When the modeler selects the text to highlight it, that DDL text becomes part of the context that Amazon Q Developer sees. The modeler can then start asking questions about it in the Amazon Q Developer chat window in the IDE.

Diagram showing what is considered part of the context included in a request, including the RAG query result, related documents when using the @workspace keyword, the highlighted text in the IDE open tab, the chat history, and the prompt.

In complex scenarios, establishing the correct modeling context requires a combination of schema information, legacy SQL, application source code in various programming languages, sample values, and natural language documentation. Amazon Q Developer addresses this by creating a local index of relevant files and content. When a question is asked using @workspace, this index is consulted to identify and include pertinent sections of code and information in the request. (See this article for additional details on workspace). The prompt plays a crucial role in measuring similarity, so providing comprehensive context within it is essential. To optimize this process, the IDE settings feature a tunable workspace index function, allowing for enhanced performance in identifying and incorporating relevant context.

Image showing the Amazon Q Settings window where you enable the Workspace feature by checking the "Workspace index" box. You can also change the number of worker threads used, and the maximum workspace index size in MB.

Workspace Index Settings

By adopting Amazon Q Developer as a team, we are able to jointly develop and share proprietary prompt text to address the four steps in our modeling process, as follows.

Step 1. Define the goal for the data modeling project

From prior knowledge, sketch a high-level goal for a data model. Gather the data for it manually, or by e.g. querying a vector database and adding its documents to the project.

For this example, we choose as the goal to compute aggregated metrics from a new table or view composed of two existing tables, sample.orderinfo and sample.skuinfo. These contain simulated data about product sales that are common to many companies. The order table is in the style of a fact table that logs customer orders, and the stock keeping unit (SKU) table is a dimension table that provides additional data points of interest about each order. The order and SKU information need to be combined by a join operation before we can compute the metrics. We would like Amazon Q Developer to tell us how to write that SQL join statement.

Step 2. Conduct an exploratory analysis and generate candidates

Next, prompt Amazon Q Developer for candidate foreign keys to join the tables, and for SQL code to execute those joins. Generate an entity-relationship diagram (ERD) as a visual aid. Prompts do not have to be complicated. For example:

@workspace What columns of database tables sample.orderinfo and sample.skuinfo 
would be best to join the two tables? Provide SQL code for the join. Draw an 
entity relationship diagram that shows the joins between the two tables, and 
includes only the fields involved in the join. Add a crow's foot cardinality 
marker to indicate a 1:many relationship, and add it next to the high 
cardinality table.

Image with the first part of the response to the prompt with the following text: "Based on the table schemas, sku_id is the appropriate column to join these tables. The relationship is likely one-to-many (1:M) where one SKU can appear in multiple orders. Here's the SQL join: SELECT o.order_id, o.sku_id, s.sku_description FROM sample.orderinfo o JOIN sample.skuinfo s ON o.sku_id = s.sku_id;

Image with the second part of the response to the prompt with the ASCII relationship diagram showing the join relationship.

Each time tables are joined together, new aggregated metrics become available to drive business insights. Now, for instance, we can find the top selling SKUs in October thanks to our results:

Image shows the top 5 results from the prior query showing the top skus in October.
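The join and the October aggregation can be tried end to end on a laptop before running anything against the warehouse. What follows is only a minimal sketch, assuming an in-memory SQLite database as a stand-in for Amazon Redshift, invented rows, and an orderinfo table that carries a sku_id column as the generated join implies (the DDL shown earlier lists product_id instead). The unqualified table names, the quantities, and the 2024 dates are assumptions made for illustration.

```python
import sqlite3

# Toy stand-in for the warehouse tables; every row here is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skuinfo (sku_id INTEGER PRIMARY KEY, sku_description TEXT);
CREATE TABLE orderinfo (
    order_id INTEGER PRIMARY KEY,
    sku_id INTEGER,              -- assumed join key, as in the generated join
    quantity_ordered INTEGER,
    date_order_placed TEXT
);
INSERT INTO skuinfo VALUES (1, 'photo book'), (2, 'canvas print'), (3, 'mug');
INSERT INTO orderinfo VALUES
    (10, 1, 2, '2024-10-03'),
    (11, 2, 5, '2024-10-09'),
    (12, 1, 4, '2024-10-21'),
    (13, 3, 9, '2024-09-30'),    -- placed in September, so it is filtered out
    (14, 2, 3, '2024-10-30');
""")

# Join the fact table to the dimension table, then rank SKUs by units sold.
TOP_SKUS_SQL = """
SELECT s.sku_id, s.sku_description, SUM(o.quantity_ordered) AS units
FROM orderinfo o
JOIN skuinfo s ON o.sku_id = s.sku_id
WHERE o.date_order_placed >= '2024-10-01'
  AND o.date_order_placed <  '2024-11-01'
GROUP BY s.sku_id, s.sku_description
ORDER BY units DESC, s.sku_id
LIMIT 5
"""

top_skus = conn.execute(TOP_SKUS_SQL).fetchall()
for row in top_skus:
    print(row)
```

The half-open date range (greater than or equal to the first day of the month, less than the first day of the next) avoids off-by-one mistakes at the month boundary, and the same pattern carries over to timestamp columns in a real warehouse.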

Sometimes we need to look at code written in languages other than SQL to complete the data model. For example, the names of some vendors this company works with happen to appear in application PHP code as human readable strings, but are saved in the application database as numbers. The analytics data staged in Redshift only contain the numbers. So, we pull a copy of the PHP text file into @workspace, and ask Amazon Q Developer to translate the relevant string-integer mappings into a SQL case statement.

Image shows the selected PHP code with a switch statement mapping Vendor Ids to Vendor Names.

PHP Switch statement showing the mapping of Vendor Ids to String Names.

I am a Redshift database administrator and I am working on a data modeling 
problem. I would like to write SQL statements to join tables sample.orderinfo 
and sample.skuinfo. Please write that SQL to join the two tables. Also, I 
would like to write a SQL case statement to recover all string values defined 
in PHP that are represented as integer values in the database table.

The output of that prompt is shown below.

Image showing the updated SQL query that maps the Vendor Id to the Vendor Name.

Amazon Q Developer automatically detected the PHP switch case statement, converted it to SQL, and added it to the final query. Many other programming languages are supported, and modelers should try this technique with other kinds of source code. Note that data scientists and analysts may not know where to look in complex application code for these details, so this discovery-plus-code translation step is a net new benefit to our company that is only possible thanks to Amazon Q Developer.
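To make the translation step concrete, here is a small sketch of what turning an integer-to-string mapping into a SQL CASE expression looks like. It is not the output Amazon Q Developer produced. The vendor names, the helper build_case_expression, and the toy table are all hypothetical, and SQLite again stands in for Amazon Redshift so the generated expression can be checked locally.

```python
import sqlite3

# Hypothetical mapping, as it might be transcribed from a PHP switch statement.
VENDOR_NAMES = {1: "Acme Prints", 2: "Globex Frames", 3: "Initech Labs"}


def build_case_expression(column, mapping, default="Unknown"):
    """Render an integer-to-string mapping as a SQL CASE expression."""

    def quote(text):
        # Double any single quotes so the string literal stays valid SQL.
        return "'" + text.replace("'", "''") + "'"

    lines = [f"CASE {column}"]
    for key in sorted(mapping):
        lines.append(f"    WHEN {int(key)} THEN {quote(mapping[key])}")
    lines.append(f"    ELSE {quote(default)}")
    lines.append("END")
    return "\n".join(lines)


case_sql = build_case_expression("sku_vendor", VENDOR_NAMES)
print(case_sql)

# Check the generated expression against a few invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skuinfo (sku_id INTEGER, sku_vendor INTEGER)")
conn.executemany("INSERT INTO skuinfo VALUES (?, ?)", [(100, 1), (101, 3), (102, 9)])
rows = conn.execute(
    f"SELECT sku_id, {case_sql} AS vendor_name FROM skuinfo ORDER BY sku_id"
).fetchall()
print(rows)
```

The ELSE branch matters in practice: a vendor ID that exists in the data but not in the application code shows up as “Unknown” instead of silently becoming NULL.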

Step 3. Create code to test the analysis

Now we request SQL source code for a battery of small test queries. These can return cardinality, grain, arithmetic, and null count results.

Please write a short SQL test to compute counts of the key fields that are used 
in the joins, which will verify the cardinality assignments indicated in the 
entity relationship diagram above. The SQL test should compare distinct counts 
to total counts and null counts when it verifies the cardinality.

Image of resulting SQL queries to check cardinality.
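A cardinality check of this kind boils down to three numbers per join key: total rows, distinct values, and nulls. The sketch below shows one way to compute and compare them. It assumes SQLite as a local stand-in for Amazon Redshift, invented rows, and a hypothetical helper named key_profile, and it is not the SQL that appears in the screenshot.

```python
import sqlite3

# Invented rows: three SKUs, four orders, one order with a missing key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skuinfo (sku_id INTEGER);
CREATE TABLE orderinfo (order_id INTEGER, sku_id INTEGER);
INSERT INTO skuinfo VALUES (1), (2), (3);
INSERT INTO orderinfo VALUES (10, 1), (11, 1), (12, 2), (13, NULL);
""")


def key_profile(conn, table, column):
    """Return total, distinct, and null counts for one key column.

    The table and column names are interpolated directly, so they must be
    trusted identifiers, never user input.
    """
    total, distinct, nulls = conn.execute(
        f"SELECT COUNT(*), COUNT(DISTINCT {column}), "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) FROM {table}"
    ).fetchone()
    return {"total": total, "distinct": distinct, "nulls": nulls}


sku_side = key_profile(conn, "skuinfo", "sku_id")
order_side = key_profile(conn, "orderinfo", "sku_id")

# The "1" side of a 1:many join has a unique, non-null key.
is_one_side = sku_side["total"] == sku_side["distinct"] and sku_side["nulls"] == 0
# The "many" side repeats key values among its non-null rows.
is_many_side = order_side["distinct"] < order_side["total"] - order_side["nulls"]

print(sku_side, order_side, is_one_side, is_many_side)
```

If the dimension side turns out to have fewer distinct keys than rows, the join will fan out and inflate every downstream metric, which is exactly the kind of error this test is meant to catch before a query reaches production.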

Step 4. Validate the results of the analysis

Run the test queries to see if the candidate solution from step 2 meets our goals. The “Insert at cursor” button at the bottom of the response is handy for this. The data modeler can easily spot an error in the join logic and ERD from inspecting the output of the test query. (Or, if it’s hard to interpret the results, keep making the test queries simpler.) If errors arise from the AI misinterpreting or miscalculating a result, or from a vaguely worded prompt, simply adjust the prompt in step 2 to fix the known errors, and repeat steps 2 – 4.

Image showing the query results from the cardinality query.

After a few iterations, taking from seconds to at most tens of minutes each, the modeling errors have been worked out and we arrive at a valid production query.

Key Benefits and Results

With this Amazon Q Developer powered solution and iterative approach, SmugMug has achieved highly accurate data modeling results across numerous database tables. Once the correct modeling configuration is established, various useful outputs may become available.

We already described production SQL, unit tests, and ERDs for documentation. By the end of the process, because Amazon Q Developer has a good understanding of the data it just modeled in its chat history, it will also generate useful Python machine learning programs to predict business trends. Here is a prompt for that, and a partial screenshot of the Python output:

Please write Python code to implement a linear regression that predicts the 
quantity_ordered value based on other fields in the data set. Choose predictor 
variables that are less likely to cause multi-collinearity problems.

Image showing the python code generated to predict quantity_ordered value.

This only shows the model training step, but the full response included all library imports, a Redshift query, feature engineering steps, ML performance metrics, and code for plotting the metrics. And the AI can produce other types of predictive models. For example, you can try:

Please write Python code to implement an XGBoost model that predicts the 
quantity_ordered value based on other fields in the data set.
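For readers who want to see the shape of such a regression without any libraries, here is a deliberately small sketch. It is not the code Amazon Q Developer generated: it fits a single predictor with ordinary least squares, uses a simple pairwise correlation as the multicollinearity screen, and runs on invented numbers. The predictor names unit_price and price_cents are hypothetical and do not appear in the sample tables.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Invented data; unit_price and price_cents are hypothetical predictors.
unit_price = [5.0, 10.0, 15.0, 20.0, 25.0]
price_cents = [p * 100 for p in unit_price]  # same information, different unit
quantity_ordered = [50, 40, 30, 20, 10]

# Two predictors this strongly correlated should not both enter the model.
collinear = abs(pearson(unit_price, price_cents)) > 0.95

# Keep only one of the pair and fit quantity_ordered against it.
slope, intercept = fit_line(unit_price, quantity_ordered)
print(collinear, slope, intercept)
```

With several predictors, the same screening idea extends to a full correlation matrix or variance inflation factors, and the fit itself would normally come from a library such as scikit-learn.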

Ultimately, the solution has improved team productivity for both existing and new team members, while maintaining legacy knowledge needed to onboard new team members more efficiently. Key benefits include:

  1. Reducing the time SmugMug data analysts and data scientists spend on data modeling tasks from days to hours, allowing them to reallocate this time to other high-priority projects.
  2. Automating the generation of BI documentation and predictive ML, also saving crucial time.
  3. Providing net new value by translating application code constant definitions into SQL. Due to organizational boundaries, we would not have achieved this without an assist from the AI.

Future Plans and Expansion

SmugMug conducted the initial data modeling use case testing with over a dozen data science team members and analysts. We are moving on to analyze more complex tables and data schemas, and to generate Python code in Amazon SageMaker for ML tasks like data preparation, training, inference, and MLOps. In our experience, Amazon Q Developer has become a preferred internal tool for development work that has a data modeling component, and its use continues to expand to different groups around the company.

For SmugMug’s data modeling projects, we continue to enhance the four-step process described above. In order to gather the most relevant context to solve a problem, we build vector database collections to pull from schemas, older SQL code, application source code, BI tool content, and curated documentation. The vector search operation surfaces the right content, and spares data modelers from manually searching in different code archives. We use ChromaDB to do the searches, and bring the results from ChromaDB into the workspace as additional files.
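The idea behind that retrieval step can be shown with a toy example. The sketch below does not use ChromaDB or learned embeddings. It swaps in a plain bag-of-words cosine similarity over a few invented snippets, purely to illustrate how a query surfaces the most relevant files to copy into the workspace. The file names, snippet contents, and the embed, cosine, and search helpers are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count over lowercase identifiers."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Invented snippets standing in for schemas and application source code.
DOCS = {
    "orderinfo.sql": "CREATE TABLE sample.orderinfo order_id shipper_id "
    "quantity_ordered date_order_placed",
    "skuinfo.sql": "CREATE TABLE sample.skuinfo sku_id sku_vendor "
    "sku_category sku_description",
    "vendors.php": "switch vendor id case return vendor name string",
}


def search(query, docs, k=2):
    """Return up to k document names, most similar first, skipping zero scores."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), name) for name, text in docs.items()),
        reverse=True,
    )
    return [name for score, name in scored[:k] if score > 0]


hits = search("which table holds sku_vendor and sku_category", DOCS)
print(hits)
```

A real vector database replaces the word counts with dense embeddings, so that a query about “supplier” can still match a column named sku_vendor, but the ranking and top-k selection work the same way.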

Conclusion

Using Amazon Q Developer for data modeling use cases, SmugMug has managed to increase data science and engineering team productivity by up to 100% when compared to prior workflows. To explore how Amazon Q Developer can benefit your organization, get started here. If you have questions or suggestions, please leave a comment below.

About the Authors

Dr. Geoffrey Ryder

Dr. Geoff Ryder serves as the Manager of Data Science and Engineering at SmugMug, where he leads Team Prophecy in managing the company’s cloud-based data warehouse and analytics platforms. With a focus on leveraging the best AI tools, his team empowers photography clients to enhance their sales of both physical and digital photographic products. Geoff brings over two decades of experience in technical and business roles across Silicon Valley companies, and holds a PhD in Computer Engineering from UC-Santa Cruz.

Will Matos

Will Matos is a Principal Specialist Solutions Architect at AWS, revolutionizing developer productivity through Generative AI, AI-powered chat interfaces, and code generation. With 25 years of tech experience, and over 9 years with AWS, he collaborates with product teams to create intelligent solutions that streamline workflows and accelerate software development cycles. A thought leader engaging early adopters, Will bridges innovation and real-world needs.

Sreenivas Adiki

Sreenivas Adiki is a Sr. Customer Delivery Architect in ProServe, with a focus on data and analytics. He ensures success in designing, building, optimizing, and transforming in the area of Big Data/Analytics. Ensuring solutions are well-designed for successful deployment, Sreenivas participates in deep architectural discussions and design exercises. He has also published several AWS assets, such as whitepapers and proof-of-concept papers.

Kevin Bell

Kevin Bell is a Sr. Solutions Architect at AWS based in Seattle. He has been building things in the cloud for about 10 years. You can find him online as @bellkev on GitHub.

Corey Keane

Corey Keane is a Media and Entertainment (M&E) Sr. Account Manager at AWS. Corey has held a number of positions at Amazon and AWS throughout his 8 years with the company across M&E—including technical business development for strategic partnerships with international game developers, in addition to his current role managing AWS customers in the Media vertical. He leans on his pan-Amazon experience from working on other teams to identify new partnerships between our customers and other Amazon businesses to bring disruptive products to market.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/999744/

Security updates have been issued by Debian (pypy3), Fedora (chromium, cobbler, and libsoup3), Oracle (kernel), SUSE (glib2, govulncheck-vulndb, javapackages-tools, xmlgraphics-batik, xmlgraphics-commons, xmlgraphics-fop, libblkid-devel, opentofu, php8, postgresql, postgresql16, postgresql17, thunderbird, traefik, and ucode-intel), and Ubuntu (needrestart and rapidjson).

Open Source: The Option for a Connected and Collaborative World

Post Syndicated from Luciano Alves original https://blog.zabbix.com/open-source-the-option-for-a-connected-and-collaborative-world/29237/

In my previous article, where we explored the TCO and ROI of open-source software, I raised topics that sparked substantive discussions, new research, and renewed insights. It is undeniable that we live in an era where collaboration and connectivity go beyond trends. They represent the foundation of current technology, especially in a world based on APIs.

In this context, open-source software stands out and positions itself as a logical and natural choice for companies and organizations (both public and private) that seek innovation, flexibility, security, and agility. Over the last two decades, the technology sector has validated this direction. Recently, the Open Source Program Office (OSPO) appeared in Gartner’s Hype Cycle for Emerging Technologies report, which reinforces its relevance and marks it as a trend expected to mature within 2 to 5 years.

Open Source in Gartner’s Hype Cycle

Gartner’s Hype Cycle for Emerging Technologies is a well-known tool for illustrating the phases of maturity, adoption, and impact of new technologies. In the current cycle, the Open Source Program Office (OSPO) appears as an emerging technology with the potential for corporate transformation in the coming years.

This highlights that open source is not only a viable alternative to proprietary software, but an engine of innovation within organizations. The OSPO is, essentially, an internal structure in companies dedicated to promoting and managing the use of open-source software, ensuring compliance and governance.

With the strengthening of these structures, organizations not only maximize the benefits of open source but also foster a culture of continuous innovation and active collaboration with communities, whether through service contracts, participation in working groups, or even funding new functionalities.

A Natural Strategic Choice

Experience shows that open source is a strategic path for organizations aiming to thrive in an increasingly interconnected and competitive market. The transparency, flexibility, and scalability offered by such solutions surpass the limitations of proprietary solutions, facilitating a more adaptable and agile adoption.

Additionally, the collaborative approach of this model aligns with today’s reality, where knowledge sharing and co-creation are essential for technological development within organizations. Companies like Google, Microsoft, and Red Hat have already recognized this reality and invest in their own Open Source Program Offices. These initiatives not only underline the commitment to open innovation but also highlight tangible benefits in terms of efficiency, cost reduction, and speed in the development of innovations.

The Future is Open Source

The inclusion of OSPO in Gartner’s Hype Cycle indicates that companies that have not yet embarked on this journey need to reconsider their strategies. In an environment where constant adaptation and innovation are essential for growth and efficiency, open source has ceased to be optional and has become a necessity. As adoption expands across various sectors and applications, companies that build a solid framework for evaluating and maximizing the benefits of these technologies will be in a privileged position to lead their markets.

At Zabbix, we understand the importance of open source not just as a technological solution, but as a philosophy aimed at democratizing technology, fostering continuous innovation, and cultivating a culture of collaboration—a vision that OSPOs have been solidifying in companies across multiple industries. The discussion about the Total Cost of Ownership (TCO) and Return on Investment (ROI) in open-source solutions is just the starting point.

Tools like Zabbix prove that this is an effective strategy for monitoring and maintaining critical environments. Open source is, and will continue to be, the driving force behind the innovations that will transform the way companies sustain their businesses and interact with customers and users. The future is already open source, and the time to embrace this transformation is now.

The post Open Source: The Option for a Connected and Collaborative World appeared first on Zabbix Blog.

What Graykey Can and Can’t Unlock

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/what-graykey-can-and-cant-unlock.html

This is from 404 Media:

The Graykey, a phone unlocking and forensics tool that is used by law enforcement around the world, is only able to retrieve partial data from all modern iPhones that run iOS 18 or iOS 18.0.1, which are two recently released versions of Apple’s mobile operating system, according to documents describing the tool’s capabilities in granular detail obtained by 404 Media. The documents do not appear to contain information about what Graykey can access from the public release of iOS 18.1, which was released on October 28.

More information:

Meanwhile, Graykey’s performance with Android phones varies, largely due to the diversity of devices and manufacturers. On Google’s Pixel lineup, Graykey can only partially access data from the latest Pixel 9 when in an “After First Unlock” (AFU) state—where the phone has been unlocked at least once since being powered on.

The Peculiarities of Bulgarian Conservatism

Post Syndicated from original https://www.toest.bg/osobenostite-na-bulgarskiya-konservatizum/

I’m a leberal, from the Leberal Party (…) Say, Bai Jireček, tell me now, is your grace a leberal or a conservative? Looks to me like you’re a conserve, from what I can see.


From “Bai Ganyo” by Aleko Konstantinov

I admit that since I was little I have felt strong sympathy for conservatism, and the blame for this lies partly with the quotation above from Aleko Konstantinov, which, together with my family environment, created a distorted picture of the ideology in my still-adolescent mind: one of well-mannered, well-educated, quiet people struggling to bring enlightenment to the broad common masses. Later in life I came to appreciate, on a rational level, the benefit of conservative thinking, which sometimes serves as a sieve of common sense against certain bolder and more unrestrained ideas, although on an emotional level I was irritated by the constant finger-wagging at one thing or another.

Unfortunately, however, in recent years I have noticed that in the conversation we have about conservatism, the content of the concept has changed drastically and its semiotic meaning has jumped from Konstantin Jireček to Bai Ganyo. It began to be taken as an axiom that Bulgarians are a conservative people, which is expressed in attachment to their traditions and rejection of the new, while the requirements for decent behavior and intellectual preparation somehow fell away as démodé, almost as liberal, not to say “gender-ish”. (Someone aptly wrote on Facebook that in our country good manners are now so rare that women perceive them as flirting, and men as a manifestation of homosexuality.)

Of course, it is normal for the meaning of words to change over time. The most painful example for me is the word ekspertiza, which from “an assessment by a specialist” began to turn into a synonym for competence, not without the efforts of Boyko Borisov, who probably used the word misled by its meaning in English. That is why I would like to examine what exactly Bulgarian conservatism is, whether our people are really so attached to this, by and large, strict and demanding doctrine, or whether we are once again talking off the top of our heads, “alangro”, as Kiril Petkov once blurted out.

A Brief History of Bulgarian Conservatism

Bulgarian conservatives actually emerge after the Liberation of Bulgaria as a political party that regards with skepticism the competence of a people still virgin in democratic ways to govern itself entirely on its own, and that tries to limit the people’s power with various proposals, most of which are rejected. I think an emblematic example is Konstantin Stoilov’s eloquent defense of a bicameral parliament, cut down by Petko R. Slaveykov, who skillfully managed to ridicule it with a resonant Bulgarian proverb, if I am not mistaken the one about the frog that saw the ox being shod and held out its own leg too. This dooms the conservatives to be a party of the minority, relying on cooperation with the prince, and partly for that reason I had until recently been under the impression that it is normal for people of such convictions to be akin to the educated minority rather than to the broad (common) masses. Today “it is arranged otherwise”, if I may quote Aleko again, but more on that in a moment.

Bulgarian parties change considerably over their history up to the Ninth of September, and after that they are banned by the communists: an unhappy end that receives a new beginning after the Tenth of November. But even then, at first we seemed to associate conservatism rather with the Democratic Party of the late Stefan Savov (otherwise historically descended from the liberals) and later with DSB of former prime minister Ivan Kostov, which deliberately sought a resemblance to the British Tories and just as persistently built itself an image of a force made up of knowledgeable and capable people above the average level, although that made them hard for voters to like.

The situation began to change with the emergence of the political party GERB, which, despite Boyko Borisov’s assurances that it is “centrist-right”, began to some extent to be identified with conservatism. I remember that at the time this provoked one of our journalists to write a mocking article about how the Bulgarian Tories celebrate their election victory with chalga. From there began the change in the essential understanding of what conservatism is, and it was later joined by political forces such as “Order, Law and Justice” of Yane Yanev (mutated from NS–BZNS, of which Yanev was chairman: a truly stunning metamorphosis) and VMRO of Krasimir Karakachanov, which took a liking to “Europe of Nations” in the memorable European elections when it ran in coalition with “Bulgaria Without Censorship” of Nikolay Barekov, another interesting decision, since conservatism actually presupposes a certain amount of censorship. But hush, my heart!

Today things have been completely transformed, and those perceived as conservatives are to some extent GERB and above all the far-right parties, such as Velichie, Vazrazhdane, and MECH, and sometimes also the Bulgarian Socialist Party, which by now truly is a Bulgarian contribution to the world theory of politics. The only exception is Petar Moskov’s party KOD, which looks conservative in the old-fashioned way, with highly educated cadres and candidates, but for that very reason is doomed to extremely meager interest from voters. Sometimes it seems that it has jinxed itself by aiming at the extremely thin stratum of intelligent homophobes, people who I am not sure actually exist. It is no accident that I mention homophobia, since it is in fact a main argument for the conservatism of one political value or another, along with hatred of liberals. These things, however, come from America.

Liberals versus conservatives, liberals and conservatives, liberals = conservatives?!

The USA is the country where liberals are identified with the left and are a constituent part of the Democratic Party, fighting for the rights of women and of various marginalized groups: African Americans, Hispanics, LGBT+. Conservatives are mainly in the Republican Party and are identified with preserving tradition, defending family values, and relying on the Christian faith for guidance in politics. It is precisely in this country that a fierce opposition exists between liberals and conservatives, which inspires the situation in other states as well, for example South Korea, where the liberal and conservative parties wage a battle for power and a fierce culture war for and against feminism, or Canada, where political forces with these two names again contend for the right to govern the country, although ideologically they are not really all that different from each other. It is from there, it seems, that the definition of liberals and conservatives is adopted in our country too, but it produces defects, since Bulgaria is in fact not part of that political system.

In Europe, liberals are understood as something else: people who are for freedom in trade and minimal state regulation of the economy, and this actually sends them to the right. In a number of countries these parties are a natural ally of the conservatives in their dispute with the social democrats, who occupy the role of the left in most countries of the Old Continent. Liberals of this kind exist in Great Britain and in Germany, and such is also Emmanuel Macron’s party in France. In the USA such liberals would be closer to the small Libertarian Party, with the caveat that in that country the entire political frame is shifted considerably further to the right, albeit mainly in economic terms. From there, however, comes my favorite motto of this type of liberalism: “out of your wallet and out of your bedroom”.

The picture is further complicated by the fact that in Australia the liberals are the main part of a right-wing coalition and actually occupy the role of conservatives fighting against the left-wing Labor Party. Japan’s ruling conservative party, Jiminto, is also literally called the Liberal Democratic Party. Incidentally, so was Zhirinovsky’s ultranationalist party, although the latter can be described rather as a curiosity.

In Bulgaria, a liberal party of the European type, at least on paper, was the old unified Movement for Rights and Freedoms, since it was supposed to fight for minority rights and for little regulation in the economy. This made it a natural member of the European liberal party ALDE, if we manage to abstract away from the national specifics of the Bulgarian political soil (per Dimitar Ganev), which include rings of companies and clientelism. Recently the party “We Continue the Change” sought membership in Renew Europe, ALDE’s group in the European Parliament. It can be said that, given Asen Vasilev’s more social policy, it is to some extent possible to speak of a liberal party of the American type, with the caveat that Kiril Petkov defined himself as a traditionalist on issues such as gay marriage, which is why we should not rush the categorization. After its decision to join the EPP, the “Yes, Bulgaria” movement is rather in the center-right space, while DSB remain conservatives, although apparently of the more moderate type characteristic of, for example, Canada. What remains, laid bare, is the Bulgarian conservatism of the nationalists.

Just how conservative are we Bulgarians

One of the disputes that has provoked the most heated discussions along the liberal-conservative axis in recent years concerns the self-identification of trans people with a gender different from their biologically assigned sex. In this vein it is worth noting that many people in our country are only of the conservative gender, without actually belonging to that political family. I allow myself this assessment because of certain particularities of our society that should be noted.

First, although a large number of Bulgarians identify as Orthodox, some of them do not believe in God but regard religion as a marker of national identity, something like a spiritual version of Chirpan-style meatballs. Even smaller is the number of people who regularly go to church and partake in the sacrament of the Eucharist. Instead, most Bulgarians are so-called "ViK Christians," from Velikden and Koleda (Easter and Christmas): they visit church on those two holidays and, regrettably, in the case of the former they often leave behind eggshells, crumbs of kozunak and even plastic beer bottles in the churchyards. Of course, I am no fanatic and I accept everyone's right to believe in their own way; I even think that the intuitive knowledge of the Creator sometimes holds more truth than the stern dogmas of one denomination or another. It is good to keep in mind, however, that if one merely thinks that up in the sky "there is something," it is not a good idea to style oneself a zealous Christian and conservative, because that is self-deception.

Second, a large number of Bulgarians live together unmarried or have no relationship at all. Again, this is in no way a reproach. We live in the 21st century; two adults can decide for themselves whether they wish to document their relationship before the Church or the state, and it is up to each person to decide whether they are made for a monogamous life at all. Again, though, it is a little eccentric for someone to strut about as a sworn conservative and at the same time be an old bachelor with a weakness for temptation, or to live happily without marriage. About people who are secretly homosexual yet style themselves conservative and radiate homophobia I will say nothing; I will only recommend the film American Beauty with Kevin Spacey, which I find quite instructive.

Third, Bulgarians, if they truly consider themselves conservatives, should reflect on a few things. In our country, fortunately in my view, abortion is legal, and despite the efforts of some of the ideologues who torpedoed the Istanbul Convention, it does not look likely to be banned any time soon. Gambling is flourishing: people regularly bet on all sorts of matches, and until recently advertisements from various companies in this industry bombarded us hourly, to the point that during a major football tournament one could not see a single commercial for cold beer. Things are a little better now, but gambling remains dangerously popular. So does chalga, music that is quite uninhibited in its imagery and messages yet enjoys the affection of the population, including some defenders of traditional values. I will say it a third time: let everyone listen to whatever they like, but to then present oneself as a strict conservative is laughable.

The essence of so-called conservatism

In reality, two things lie at the root of the boom of "conservatives" in our country, and one to some extent derives from the other. Our so-called conservatism is really pure nationalism, which demands absolutely nothing of its sympathizers in terms of standards, education or upbringing, but simply tells them that they are better than others because of how they were born: Bulgarian, heterosexual and Orthodox. As a result, it teaches them to oppose everything cosmopolitan that connects us with the rest of the world, from contemporary cinema to holidays such as Halloween, but also Bulgarian artists such as Gospodinov who have won recognition abroad independently of fossilized local institutions such as the Union of Bulgarian Writers.

The second thing is homophobia presented as the defense of traditional values. Acceptance of those who are different, and the different themselves, are portrayed as a diabolical Western plot against Bulgarianness. But this is really a symbol of the changing world, which the nationalists seem to believe they can freeze in time with their feigned conservatism. This comes, naturally, from the propaganda of the Russians, who pull the strings of many of the so-called Bulgarian nationalists and conservatives, and it is part of their plan to suck us back into their Russian world of reverence for the authoritarian leader. But first, it cannot succeed unless it arrives on the tanks of a new Ninth of September, and second, people have no idea how excessive it looks from the outside.

For Bulgarians are homophobic not only in comparison with Californians or Scandinavians, but even with Texans, Poles or the inhabitants of countries such as Argentina, where levels of acceptance are higher than in Bulgaria despite widespread poverty. Even Republicans in the United States tend to focus on specific and, at least in my view, genuinely contentious practices, such as the participation of trans athletes in categories corresponding to their gender (rather than biological sex), and J.D. Vance, then still a candidate rather than vice president-elect, cheerfully announced before the election that he expected to win the support of "normal gays."

We, however, follow Russian homophobia and the Russian idea of conservatism, the work of a political elite that in all likelihood does not itself believe the nonsense it spouts, at least judging by where its children live and what they say. Recently the daughter of Kremlin spokesman Peskov cheerfully announced from Paris that she supports same-sex marriage.

By contrast, President Putin, leader of the global version of the new national conservatism (which sounds a bit like the Flat Earth Society, said to have sympathizers all around the globe), holds that the greatest geopolitical tragedy in history is the collapse of the USSR, the country in which churches were demolished, priests were killed and trials against God were prepared.

Well then, against that backdrop the hatred of Halloween looks downright harmless.

Implementing transactions using JMS2.0 in Amazon MQ for ActiveMQ

Post Syndicated from Chris McPeek original https://aws.amazon.com/blogs/compute/implementing-transactions-using-jms2-0-in-amazon-mq-for-activemq/

This post is written by Paras Jain, Senior Technical Account Manager and Vinodh Kannan Sadayamuthu, Senior Specialist Solutions Architect

This post describes the transactional capabilities of the ActiveMQ broker in Amazon MQ, using a producer client application written with the Java Message Service (JMS) 2.0 API. The JMS 2.0 APIs are easier to use and have fewer interfaces than the previous version. To learn about ActiveMQ’s JMS 2.0 support, refer to the ActiveMQ documentation on JMS 2.0. Also check out What’s New in JMS 2.0 to learn more about the features in JMS 2.0.

Amazon MQ now supports ActiveMQ 5.18. Amazon MQ also introduces a new semantic versioning system that displays the minor version (e.g., 5.18) and keeps your broker up-to-date with new patches (e.g., 5.18.4) within the same minor version. ActiveMQ 5.18 adds support for JMS 2.0, Spring 5.3.x, and several dependency updates and bug fixes. For the complete details, see the release notes for the ActiveMQ 5.18.x release series.

Overview

Messaging Patterns in Distributed Systems

Messaging in a broker-based distributed system often follows a fire-and-forget pattern. Message producers send messages to the broker, and it is the message broker’s responsibility to ensure that the messages are delivered to the consumers. In non-transactional use cases, the messages are independent of each other. However, in some situations, a group of messages needs to be delivered to consumers as part of a single transaction. This means that either all the messages in the group are delivered to the consumer or none of them are.

ActiveMQ 5.18 provides two levels of transaction support — JMS transactions and XA transactions.

JMS transactions are used when multiple messages need to be sent to the ActiveMQ broker as a single atomic unit. This transactional behavior is enabled by invoking the commit() and rollback() methods on a Session (for JMS 1.x) or JMSContext (for JMS 2.0) object. If all the messages are successfully sent, the transaction can be committed, ensuring that the messages are processed as a unit. If any issues occur during the sending process, the transaction can be rolled back, preventing the partial delivery of messages. This transactional capability is crucial when maintaining data integrity and ensuring that complex messaging operations are executed reliably. See ActiveMQ FAQ – How Do Transactions work FAQ for more details on how transactions work in ActiveMQ.

XA transactions are used when two or more messages need to be sent to ActiveMQ brokers and other distributed resources in a transactional manner. This is achieved by using an XA Session, which acts as an XA resource. See ActiveMQ FAQ – Should I use XA transactions FAQ for more details on XA transactions.

Transactional use case in an Order Management System

The example in this blog post shows the transactional capabilities in an Order Management System (OMS) application, using ActiveMQ as the message broker. Upon receiving an order, the OMS application sends a message (message 1) to the warehouse queue to start the packing process. Then the application runs an internal business process. If this process is successful, the application sends another message (message 2) to the shipping queue to start the package pickup process. If the internal business process fails, message 2 must not be sent to the shipping queue and message 1 must be rolled back from the warehouse queue.

The flowchart below illustrates the logic behind the transactional use case featured in this example.

Flowchart describing the transactional use case, showing the flow for both successful and failed transactions.

The JMS client stores both messages in-memory until the transaction is committed or rolled back. The client achieves this by maintaining a transacted session between the message producer client and the broker. A transacted session is a session that uses transactions to ensure message delivery. In our example, the transacted session is created using the following statement.

JMSContext jmsContext = connectionFactory.createContext(adminUsername, adminPassword, Session.SESSION_TRANSACTED);

The example for this post shows a transacted session between the message producer and the broker. It does not show transactions between the broker and the message consumer, which you can implement using a similar pattern.
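The commit-or-rollback behavior described above can be modeled in a few lines of plain Java. The sketch below is a toy, in-memory stand-in and not the JMS or ActiveMQ API: the class TransactedSessionSketch and its methods are hypothetical names chosen for illustration, and the boolean flag stands in for the internal business process. It only demonstrates the visibility rule, namely that messages sent within a transaction appear on the queues together on commit, or not at all on rollback.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of a transacted session. Hypothetical; not the JMS or ActiveMQ API. */
public class TransactedSessionSketch {

    // Messages visible to consumers, keyed by queue name.
    private final Map<String, List<String>> committed = new LinkedHashMap<>();
    // Messages staged in the current transaction, in send order: {queue, message}.
    private final List<String[]> staged = new ArrayList<>();

    /** Stages a message; it is not yet visible on the queue. */
    public void send(String queue, String message) {
        staged.add(new String[] {queue, message});
    }

    /** Makes every staged message visible as a single unit. */
    public void commit() {
        for (String[] entry : staged) {
            committed.computeIfAbsent(entry[0], q -> new ArrayList<>()).add(entry[1]);
        }
        staged.clear();
    }

    /** Discards every staged message. */
    public void rollback() {
        staged.clear();
    }

    /** Returns the number of messages visible on a queue. */
    public int depth(String queue) {
        return committed.getOrDefault(queue, List.of()).size();
    }

    /** Order flow from this post; the flag stands in for the internal business process. */
    public static void processOrder(TransactedSessionSketch session, String order,
                                    boolean businessProcessSucceeds) {
        session.send("warehouse-queue", order);      // message 1
        if (businessProcessSucceeds) {
            session.send("shipping-queue", order);   // message 2
            session.commit();                        // both messages become visible
        } else {
            session.rollback();                      // message 1 is withdrawn as well
        }
    }

    public static void main(String[] args) {
        TransactedSessionSketch session = new TransactedSessionSketch();
        processOrder(session, "order1", true);
        processOrder(session, "failedorder1", false);
        System.out.println("warehouse-queue depth: " + session.depth("warehouse-queue"));
        System.out.println("shipping-queue depth: " + session.depth("shipping-queue"));
    }
}
```

After one successful and one failed order, each queue holds exactly one message, which mirrors the queue counts checked in the test steps later in this post.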

Creating ActiveMQ broker

The following prerequisites are required to create and configure ActiveMQ broker in Amazon MQ.

Prerequisites:

To create a broker (AWS CLI):

  1. Run the following command to create the broker. This creates a publicly accessible broker for testing only. When creating brokers for production use, adhere to the Security best practices for Amazon MQ.
    aws mq create-broker \
        --broker-name <broker-name> \
        --engine-type activemq \
        --engine-version 5.18 \
        --deployment-mode SINGLE_INSTANCE \
        --host-instance-type mq.t3.micro \
        --auto-minor-version-upgrade \
        --publicly-accessible \
        --users Username=<username>,Password=<password>,ConsoleAccess=true
    

    Replace <broker-name> with the name you want to give to the broker. Replace <username> and <password> as per the create-broker CLI documentation. After the command runs successfully, the BrokerArn and the BrokerId are displayed on the command line. Note down these values. Creation of the broker takes about 15 minutes.

  2. Run the following command to get the status
    aws mq describe-broker --broker-id <BrokerId> --query 'BrokerState'

    Proceed to the next step once the broker state is Running.

  3. Get the console URL and other broker endpoints by running the following command
    aws mq describe-broker --broker-id <BrokerId> --query 'BrokerInstances[0]'

    Note the ConsoleURL and ssl endpoint from the output.

Configuring the message producer client

The sample code in this post uses a sample message producer client written using JMS 2.0 API to send messages to the ActiveMQ broker.

  • In case of a successful transaction, the producer client sends a message to the first queue and waits for 15 seconds. Then it sends the message to the second queue and waits for another 15 seconds. Finally, it commits the transaction.
  • In case of a failed transaction, the producer client sends the first message and waits for 15 seconds. Then the code introduces an artificial failure, causing the transaction to roll back.

The 15-second wait gives you the opportunity to verify the number of messages on the broker side as the program progresses through the transaction flow. Until the producer client commits the transaction, none of the messages are sent to the broker, even for a successful transaction.

To download and configure the sample client:

  1. Get the Amazon MQ Transactions Sample Jar from the GitHub repository.
  2. To run the sample client, use the java command with -jar option which runs the program encapsulated in a jar file. The syntax for running the sample client is:
    java -jar <path-to-jar-file>/<jar-filename> <username> <password> <ssl-endpoint> <first-queue> <second-queue> <message> <is-transaction-successful> 

    Usage:
    <path-to-jar-file> – path in your local machine where you have downloaded the jar file.
    <jar-filename> – name of the jar file.
    <username> – username you selected while creating the broker.
    <password> – password you selected while creating the broker.
    <ssl-endpoint> – ssl endpoint you noted down in the step above.
    <first-queue> – name of the first queue in the transaction.
    <second-queue> – name of the second queue in the transaction.
    <message> – message text.
    <is-transaction-successful> – flag to tell the producer client if the transaction has to be successful or not.

Testing successful transactions

Following are the steps to test successful transactions with ActiveMQ:

  1. List queues and message counts in ActiveMQ console
    1. Navigate to the Amazon MQ console and choose your ActiveMQ broker.
    2. Login to ActiveMQ Web Console from URLs in Connections panel.
    3. Click on Manage ActiveMQ broker.
    4. Provide username and password used for the user created when you created the broker.
    5. Click on Queues on the top navigation bar.
    6. Check warehouse-queue and shipping-queue are not listed.
  2. Run the following command to send messages for order1 to both the queues successfully:
    java -jar <path-to-jar-file>/<jar-filename> <username> <password> <ssl-endpoint> warehouse-queue shipping-queue order1 true

    Replace the placeholders as mentioned in the command instructions above. With this command, the example producer client sends the first message to the warehouse-queue, prints the following message to the console, and waits for 15 seconds.

    Sending message: order1 to the warehouse-queue
    Message: order1 is sent to the queue: warehouse-queue but not yet committed.

    During the 15-second wait, refresh the browser and verify that the warehouse-queue is now listed but has no pending or enqueued messages.

    After 15 seconds, the producer client sends the second message to the shipping-queue and prints the following message to the console and waits for 15 more seconds.

    Sending message: order1 to the shipping-queue
    Message: order1 is sent to the queue: shipping-queue but not yet committed.
    

    During this 15-second wait, refresh the browser window again and verify that the shipping-queue is now listed, but like the warehouse-queue, it has no pending or enqueued messages.

    Finally, the producer client commits both the messages and prints:

    Committing
    Transaction for Message: order1 is now completely committed.
    

  3. Refresh the browser and verify that warehouse-queue and shipping-queue each have 1 pending and 1 enqueued message.

    Image showing the shipping and warehouse queues with their message counts.

Repeat this process for testing more successful transactions.

Testing failed transactions

  1. Note down the beginning number of pending and enqueued messages in each of the queues.
  2. Run the following command and pass false for <is-transaction-successful> to introduce an artificial failure.
    java -jar <path-to-jar-file>/<jar-filename> <username> <password> <ssl-endpoint> warehouse-queue shipping-queue failedorder1 false

    Replace the placeholders as mentioned in the initial command instructions above. With this command, the example producer client sends the first message to the warehouse-queue, prints the following message to the console, and waits for 15 seconds.

    Sending message: failedorder1 to the warehouse-queue
    Message: failedorder1 is sent to the queue: warehouse-queue but not yet committed.
    

    During the 15-second wait, refresh the browser and verify that the counts in the warehouse-queue and shipping-queue are unchanged.

    Finally, the client artificially introduces a failure and rolls back the transaction and prints:

    Message: failedorder1 cannot be delivered because of an unknown error. Hence the transaction is rolled back.

  3. Refresh the browser to confirm that the counts for both queues are unchanged. This example started with 1 message in each queue, and those counts remained unchanged after the failed transaction.

    Image showing the shipping and warehouse queues with unchanged counts.

Note that for both the successful and unsuccessful scenarios, the messages that are sent to the queues as part of a transaction are stored in-memory at the client side. These messages are sent to the broker only when the transaction is committed.

Cleanup

  1. Delete the broker by running the following command
    aws mq delete-broker --broker-id <BrokerId>

Conclusion

In this post, you created an Amazon MQ for ActiveMQ broker running version 5.18. You also learned about the new semantic versioning introduced by Amazon MQ. ActiveMQ 5.18.x brings support for JMS 2.0, Spring 5.3.x, and dependency updates. Finally, you ran a sample application that uses the JMS 2.0 API to show the transactional capabilities of the ActiveMQ 5.18.x broker.

To learn more about Amazon MQ, visit https://aws.amazon.com/amazon-mq/.

Announcing future-dated Amazon EC2 On-Demand Capacity Reservations

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/announcing-future-dated-amazon-ec2-on-demand-capacity-reservations/

Customers use Amazon Elastic Compute Cloud (Amazon EC2) to run every type of workload imaginable, including web hosting, big data processing, high-performance computing (HPC), virtual desktops, live event streaming, and databases. Some of these workloads are so critical that customers asked for the ability to reserve capacity for them.

To help customers flexibly reserve capacity, we launched EC2 On-Demand Capacity Reservations (ODCRs) in 2018. Since then, customers have used capacity reservations (CRs) to run critical applications like hosting consumer websites, streaming live sporting events, and processing financial transactions.

Today, we’re announcing the ability to get capacity for future workloads using CRs. Many customers have future events such as product launches, large migrations, or end-of-year sales events like Cyber Monday or Diwali. These events are critical, and customers want to ensure they have the capacity when and where they need it.

While CRs helped customers reserve capacity for these events, they were only available just-in-time. So customers either needed to provision the capacity ahead of time and pay for it or plan with precision to provision CRs just-in-time at the start of the event.

Now you can plan and schedule your CRs up to 120 days in advance. To get started you specify the capacity you need, the start date, delivery preference, and the minimum duration you commit to use the capacity reservation. There are no upfront charges to schedule a capacity reservation. After Amazon EC2 evaluates and approves the request, it will activate the reservation on the start date, and customers can use it to immediately launch instances.

Getting started with future-dated capacity reservations
To reserve your future-dated capacity, choose Capacity Reservations in the Amazon EC2 console, select Create On-Demand Capacity Reservation, and then choose Get started.

To create a capacity reservation, specify the instance type, platform, Availability Zone, tenancy, and number of instances you are requesting.


In the Capacity Reservation details section, choose At a future date in the Capacity Reservation starts option and choose your start date and commitment duration.


You can also choose to end the capacity reservation at a specific time or manually. If you select Manually, the reservation has no end date. It will remain active in your account and continue to be billed until you manually cancel it. To reserve this capacity, choose Create.


After you create your capacity request, it appears in the dashboard with an Assessing status. During this state, AWS systems will work to determine if your request is supportable which is usually done within 5 days. Once the systems determine the request is supportable, the status will be changed to Scheduled. In rare cases, your request may be unsupported.

On your scheduled date, the capacity reservation will change to an Active state, the total instance count will be increased to the amount requested, and you can immediately launch instances.

After activation, you must hold the reservation for at least the commitment duration. After the commitment duration elapses, you can continue to hold and use the reservation if you’d like or cancel it if no longer needed.

Things to know
Here are some things that you should know about the future-dated CRs:

  • Evaluation – Amazon EC2 considers multiple factors when evaluating your request. Along with forecasted supply, Amazon EC2 considers how long you plan to hold the capacity, how early you create the Capacity Reservation relative to your start date, and the size of your request. To improve the ability of Amazon EC2 to support your request, create your reservation at least 56 days (8 weeks) before the start date. Requests must be for at least 100 vCPUs, and only C, M, R, T, and I instance types are supported. The recommended minimum commitment for most requests will be 14 days.
  • Notification – We recommend monitoring the status of your request through the console or Amazon EventBridge. You can use these notifications to trigger automation or send an email or text update. To learn more, visit Send an email when events happen using Amazon EventBridge in the Amazon EventBridge User Guide.
  • Pricing – Future-dated capacity reservations are billed just like regular CRs. They are charged at the equivalent On-Demand rate whether you run instances in the reserved capacity or not. For example, if you create a future-dated CR for 20 instances and run 15 instances, you will be charged for 15 active instances and for 5 unused instances in the reservation including the minimum duration. Savings Plans apply to both unused reservations and instances running on the reservation. To learn more, visit Capacity Reservation pricing and billing in the Amazon EC2 User Guide.
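The numbers in the list above lend themselves to a short worked example. The following is a minimal sketch in plain Java, not an AWS API: the class and method names are hypothetical, and the rate of 10 cents per instance-hour is an invented placeholder rather than a real On-Demand price. It covers the billing arithmetic (all reserved instances are billed, running or not), the lead time between request and start date, and the earliest date a reservation could be cancelled after the commitment duration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Worked arithmetic for a future-dated Capacity Reservation. Illustrative names only. */
public class FutureDatedCrSketch {

    /** Hourly charge in cents: every reserved instance is billed, running or not. */
    public static long hourlyChargeCents(int reservedInstances, long rateCentsPerInstanceHour) {
        return reservedInstances * rateCentsPerInstanceHour;
    }

    /** Reserved instances that are billed while sitting unused. */
    public static int unusedInstances(int reservedInstances, int runningInstances) {
        return Math.max(0, reservedInstances - runningInstances);
    }

    /** Days between submitting the request and the reservation start date. */
    public static long leadTimeDays(LocalDate requestDate, LocalDate startDate) {
        return ChronoUnit.DAYS.between(requestDate, startDate);
    }

    /** True when the lead time meets the suggested 56 days and fits the 120-day window. */
    public static boolean leadTimeInRange(long leadTimeDays) {
        return leadTimeDays >= 56 && leadTimeDays <= 120;
    }

    /** First date on which the reservation can be cancelled after activation. */
    public static LocalDate earliestCancelDate(LocalDate startDate, int commitmentDays) {
        return startDate.plusDays(commitmentDays);
    }

    public static void main(String[] args) {
        long rateCents = 10; // hypothetical rate, NOT a real On-Demand price
        int reserved = 20;
        int running = 15;
        System.out.println("Billed instances: " + reserved
                + " (" + running + " running, " + unusedInstances(reserved, running) + " unused)");
        System.out.println("Hourly charge in cents: " + hourlyChargeCents(reserved, rateCents));

        LocalDate request = LocalDate.of(2025, 1, 6);   // example dates only
        LocalDate start = LocalDate.of(2025, 3, 3);
        long lead = leadTimeDays(request, start);
        System.out.println("Lead time in days: " + lead + ", in range: " + leadTimeInRange(lead));
        System.out.println("Earliest cancel date: " + earliestCancelDate(start, 14));
    }
}
```

Working in whole cents keeps the arithmetic exact. Savings Plans discounts are not modeled here.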

Now available
Future-dated EC2 Capacity Reservations are available today in all AWS Regions where Amazon EC2 Capacity Reservations are available.

Give Amazon EC2 Capacity Reservations a try in the Amazon EC2 console. To learn more, visit On-Demand Capacity Reservations in the Amazon EC2 User Guide and send feedback to AWS re:Post for Amazon EC2 or through your usual AWS Support contacts.

Channy

Breaking down CPU speed: How utilization impacts performance

Post Syndicated from Andreas Strikos original https://github.blog/engineering/architecture-optimization/breaking-down-cpu-speed-how-utilization-impacts-performance/


Introduction ⛵

The GitHub Performance Engineering team regularly conducts experiments to observe how our systems perform under varying load conditions. A consistent pattern in these experiments is the significant impact of CPU utilization on system performance. We’ve observed that as CPU utilization rises, it can lead to increased latency, which provides an opportunity to optimize system efficiency. Addressing this challenge allows us to maintain performance levels while reducing the need for additional machines, ultimately preventing inefficiencies.

Although we recognized the correlation between higher CPU utilization and increased latency, we saw an opportunity to explore the specific thresholds and impacts at various stages in greater detail. With a diverse set of instance types powered by different CPU families, we focused on understanding the unique performance characteristics of each CPU model. This deeper insight empowered us to make smarter, data-driven decisions, enabling us to provision our infrastructure with greater efficiency and confidence.

With these goals in mind, we embarked on a new journey of exploration and experimentation to uncover these insights.

Experiment setup 🧰

Collecting accurate data for this type of experiment was no easy feat. We needed to gather data from workloads that were as close to our production as possible, while also capturing how the system behaves under different phases of load. Since CPU usage patterns vary across workloads, we focused primarily on our flagship workloads. However, increasing the load could introduce small performance discrepancies, so our goal was to minimize disruption for our users.

Fortunately, a year ago, the Performance Engineering team developed an environment designed to meet these requirements, codenamed Large Unicorn Collider (LUC). This environment operates within a small portion of our Kubernetes clusters, mirroring the same architecture and configuration as our flagship workloads. It also has the flexibility to be hosted on dedicated machines, preventing interference from or with other workloads. Typically, the LUC environment remains idle, but when needed, we can direct a small, adjustable amount of traffic towards it. Activating or deactivating this traffic takes only seconds, allowing us to react quickly if performance concerns arise.

To accurately assess the impact of CPU utilization, we first established a baseline by sending moderate production traffic to a LUC Kubernetes pod hosted on one of its dedicated machines. This provided us with a benchmark for comparison. Importantly, the number of requests handled by the LUC pods remained constant throughout the experiment, ensuring consistent CPU load over time.

Once the baseline was set, we gradually increased CPU utilization using a tool called “stress,” which artificially occupies a specified number of CPU cores by running random processing tasks. Each instance type has a different number of CPU cores, so we adjusted the steps accordingly. However, the common factor across all instances was the total CPU utilization.
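To make the idea of per-instance steps concrete, here is a minimal sketch of how a target total utilization could be translated into a number of cores to occupy. This is an assumption-laden illustration, not the team's actual procedure: the class and method names are hypothetical, the core counts and baseline are invented, and it assumes each occupied core runs at 100% and adds linearly to the baseline utilization.

```java
/** Translates a target total CPU utilization into a number of cores to occupy. Hypothetical sketch. */
public class StressStepSketch {

    /**
     * Returns how many cores to fully occupy so that total utilization reaches
     * roughly targetPercent, given the utilization already caused by baseline traffic.
     */
    public static int coresToOccupy(int totalCores, double baselinePercent, double targetPercent) {
        if (targetPercent <= baselinePercent) {
            return 0; // baseline traffic alone already meets the target
        }
        double extraCores = (targetPercent - baselinePercent) / 100.0 * totalCores;
        return (int) Math.min(totalCores, Math.round(extraCores));
    }

    public static void main(String[] args) {
        int[] coreCounts = {32, 64};   // invented instance sizes
        double baseline = 20.0;        // invented baseline utilization in percent
        for (int cores : coreCounts) {
            for (int target = 30; target <= 90; target += 10) {
                System.out.println(cores + " cores, target " + target + "%: occupy "
                        + coresToOccupy(cores, baseline, target) + " cores");
            }
        }
    }
}
```

The same target utilization maps to a different number of occupied cores on each instance size, which is why the steps differ per instance type while total CPU utilization remains the common axis.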

Note: It’s important to recognize that this is not a direct 1:1 comparison to the load generated by actual production workloads. The stress tool continuously runs mathematical operations, while our production workloads involve I/O operations and interrupts, which place different demands on system resources. Nevertheless, this approach still offers valuable insights into how our CPUs perform under load.

With the environment set up and our plan in place, we proceeded to collect as much data as possible to analyze the impact.

Results 📃

With our experiment setup finalized, let’s examine the data we gathered. As previously mentioned, we repeated the process across different instance types. Each instance type showed unique behavior and varying thresholds where performance started to decline.

As anticipated, CPU time increased for all instance types as CPU utilization rose. The graph below illustrates the CPU time per request as CPU utilization increases.

CPU time per request vs CPU utilization

The latency differences between instance types are expected due to the variations in CPU models. Focusing on the percentage increase in latency may provide more meaningful insights.

Latency percentage increase vs CPU utilization

In both graphs, one line stands out by deviating more than the others. We’ll examine this case in detail shortly.

Turbo Boost effect

An interesting observation is how CPU frequency changes as utilization increases, which can be attributed to Intel’s Turbo Boost Technology. Since all the instances we used are equipped with Intel CPUs, the impact of Turbo Boost is noticeable across all of them. In the graph below, you can see how the CPU frequency decreases as the CPU utilization increases. The red arrows show the CPU utilization level.

CPU Cores Frequency

When CPU utilization remains at lower levels (around 30% or below), we benefit from increased core frequencies, leading to faster CPU times and, consequently, lower overall latency. However, as the demand for more CPU cores rises and utilization increases, we are likely to reach the CPU’s thermal and power limits, causing frequencies to decrease. In essence, lower CPU utilization results in better performance, while higher utilization leads to a decline in performance. For instance, a workload running on a specific node with approximately 30% CPU utilization will report faster response times compared to the same workload on the same VM when CPU utilization exceeds 50%.

Hyper-Threading

Variations in CPU frequency are not the only factors influencing performance changes. All our nodes have Hyper-Threading enabled, an Intel technology that allows a single physical CPU core to operate as two virtual cores. Although there is only one physical core, the Linux kernel recognizes it as two virtual CPU cores. The kernel attempts to distribute the CPU load across these cores, aiming to keep only one hardware thread (virtual core) busy per physical core. This approach is effective until we reach a certain level of CPU utilization. Beyond this threshold, both hardware threads of a physical core are busy at once and share that core’s execution resources, so neither virtual core can be fully utilized, resulting in reduced performance compared to normal operation.

Finding the “Golden Ratio” of CPU utilization

Underutilized nodes lead to wasted resources, power, and space in our data centers, while nodes that are excessively utilized also create inefficiencies. As noted, higher CPU utilization results in decreased performance, which can give a misleading impression that additional resources are necessary, resulting in a cycle of over-provisioning. This issue is particularly pronounced with blocking workloads that do not follow an asynchronous model. As CPU performance deteriorates, each process can manage fewer tasks per second, making existing capacity inadequate. To achieve the optimal balance—the “Golden Ratio” of CPU utilization—we must identify a threshold where CPU utilization is sufficiently high to ensure efficiency without significantly impairing performance. Striving to keep our nodes near this threshold will enable us to utilize our current hardware more effectively alongside our existing software.

Since we already have experimental data demonstrating how CPU time increases with rising utilization, we can develop a mathematical model to identify this threshold. First, we need to determine what percentage of CPU time degradation is acceptable for our specific use case. This may depend on user expectations or performance Service Level Agreements (SLAs). Once we establish this threshold, it will help us select a level of CPU utilization that remains within acceptable limits.

We can plot the CPU utilization vs. CPU time (latency) and find the point where:

  • CPU utilization is high enough to avoid resource underutilization.
  • CPU time degradation does not exceed your acceptable limit.

A specific example derived from the data above can be illustrated in the following graph.

Percentage Increase in P50 Latency vs CPU Utilization

In this example, we aim to achieve less than 40% CPU time degradation, which would correspond to a CPU utilization of 61% on the specific instance.
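
The threshold lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the experiment: the function name and the sample points are invented for the example, and it assumes the measured degradation is non-decreasing in utilization so that linear interpolation between neighbouring points is reasonable.

```python
def max_utilization_within_budget(samples, max_degradation_pct):
    """Return the highest CPU utilization (%) whose interpolated CPU time
    degradation stays at or below max_degradation_pct.

    samples: (utilization_pct, degradation_pct) pairs. Assumed (not taken
    from the post) to be non-decreasing in degradation as utilization grows.
    Returns None when even the lowest measured point exceeds the budget.
    """
    samples = sorted(samples)
    if samples[0][1] > max_degradation_pct:
        return None
    for (u0, d0), (u1, d1) in zip(samples, samples[1:]):
        if d1 > max_degradation_pct:
            # The budget is crossed inside this segment: interpolate linearly.
            fraction = (max_degradation_pct - d0) / (d1 - d0)
            return u0 + fraction * (u1 - u0)
    # The budget is never exceeded within the measured range.
    return samples[-1][0]


# Hypothetical measurements, NOT the data behind the graph above.
measurements = [(10, 0), (30, 10), (50, 25), (70, 50), (90, 100)]
threshold = max_utilization_within_budget(measurements, 40)
print(f"Target utilization: {threshold:.1f}%")
```

With real measurements per instance type, the same lookup yields one target utilization per CPU family for a given degradation budget.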

Outlier case

As previously mentioned, there was a specific instance that displayed some outlying data points. Our experiment confirmed an already recognized issue where certain instances were not achieving their advertised maximum Turbo Boost CPU frequency. Instead, we observed steady CPU frequencies that fell below the maximum advertised value under low CPU utilization. In the example below, you can see an instance from a CPU family that advertises Turbo Boost frequencies above 3 GHz, but it is only reporting a maximum CPU frequency of 2.8 GHz.

CPU cores frequency

This issue turned out to be caused by a disabled CPU C-state, which prevented the CPU cores from halting even when they were not in use. As a result, these cores were perceived as “busy” by the turbo driver, limiting our ability to take advantage of Turbo Boost benefits with higher CPU frequencies. By enabling the C-state and allowing for optimization and power reduction during idle mode, we observed the expected Turbo Boost behavior. This change had an immediate impact on the CPU time spent by our test workloads. The images below illustrate the prompt changes in CPU frequencies and latency reported following the C-state adjustment.

CPU cores frequency
P50 CPU time on a request

Upon re-evaluating the percentage change in CPU time, we now observe similar behavior across all instances.

Percentage Increase in P50 Latency vs CPU Utilization

Wrap-up

We anticipated many of these insights, so our objective was to validate our theories using data from our complex system. We confirmed that performance degrades as CPU utilization increases across different CPU families. By identifying optimal CPU utilization thresholds, we can achieve a better balance between performance and efficiency, ensuring that our infrastructure remains both cost-effective and high performing. Going forward, these insights will inform our resource provisioning strategies and help us maximize the effectiveness of our hardware investments.


Thank you for sticking with us until the end!! A special shout-out to @adrmike, @schlubbi, @terrorobe, the @github/compute-platform and finally the @github/performance-engineering team for their invaluable assistance throughout these experiments, data analysis, and for reviewing the content for accuracy and consistency. ❤️

The post Breaking down CPU speed: How utilization impacts performance appeared first on The GitHub Blog.

AWS Weekly Roundup: 197 new launches, AI training partnership with Anthropic, and join AWS re:Invent virtually (Nov 25, 2024)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-197-new-launches-ai-training-partnership-with-anthropic-and-join-aws-reinvent-virtually-nov-25-2024/

Last week, I saw an astonishing 197 new service launches from AWS. This means we are getting closer to AWS re:Invent 2024! Our News Blog team is also finalizing blog posts for re:Invent to introduce some awesome launches from service teams for your reading pleasure.

The most interesting news is that we’re expanding our strategic collaboration with Anthropic as our primary training partner for development of our AWS Trainium chips. This is in addition to being their primary cloud provider for deploying Anthropic’s Claude models in Amazon Bedrock. We’ll keep pushing the boundaries of what customers can achieve with generative AI technologies with these kinds of collaborations.

Last week’s launches
Here are some AWS bundled feature launches:

Amazon Aurora – Amazon Aurora Serverless v2 now supports scaling to 0 Aurora Capacity Units (ACUs). With 0 ACUs, you can now save cost during periods of database inactivity. Instead of scaling down to 0.5 ACUs, the database can now scale down to 0 ACUs. Amazon Aurora is now compatible with MySQL 8.0.39 and PostgreSQL 17.0 in the Amazon RDS Database preview environment.

Amazon Bedrock – You can quickly build and execute complex generative AI workflows without writing code with the general availability of Amazon Bedrock Flows (previously known as Prompt Flows). Amazon Bedrock Knowledge Bases now supports binary vector embeddings for building Retrieval Augmented Generation (RAG) applications. Amazon Bedrock also introduces a preview launch of Prompt Optimization to rewrite prompts for higher quality responses from foundation models (FMs). You can use AWS Amplify AI kit to easily leverage your data to get customized responses from Bedrock AI models to build web apps with AI capabilities such as chat, conversational search, and summarization.

Amazon CloudFront – Amazon CloudFront now supports gRPC applications, which allow bidirectional communication between a client and a server over HTTP/2 connections. Amazon CloudFront introduces Virtual Private Cloud (VPC) origins to deliver content from applications hosted in VPC private subnets, and Anycast Static IPs to provide you with a dedicated list of IP addresses for connecting to all CloudFront edge locations worldwide. You can also conditionally change or update origin servers on each request with origin modification within CloudFront Functions, and use new log configuration and delivery options.

Amazon CloudWatch – You can use field indexes and log transformation to improve log analytics at scale in CloudWatch Logs. You can also use the enhanced search and analytics experience and runtime metrics support with CloudWatch Application Signals, as well as percentile aggregation and simplified events-based troubleshooting directly from the web vitals anomaly in CloudWatch Real User Monitoring (RUM).

Amazon Cognito – You can secure user access to your applications with passwordless authentication, including sign-in with passkeys, email, and text message. Amazon Cognito introduces Managed Login, a hosted sign-in and sign-up experience that customers can personalize to align with their company or application branding. Cognito also launches new user pool feature tiers, Essentials and Plus, as well as a new developer-focused console experience. To learn more, visit Donnie’s blog post.

Amazon Connect – You can use new customer profiles and outbound campaigns to help you proactively address customer needs before they become potential issues. Amazon Connect Contact Lens now supports creating custom dashboards, as well as adding or removing widgets from existing dashboards. With new Amazon Connect Email, you can receive and respond to emails sent by customers to business addresses or submitted via web forms on your website or mobile app.

Amazon EC2 – You can shift the launches of EC2 instances in an Auto Scaling Group (ASG) away from an impaired Availability Zone (AZ) to quickly recover your unhealthy application in another AZ with Amazon Application Recovery Controller (ARC) zonal shift and zonal autoshift. Application Load Balancer (ALB) now supports HTTP request and response header modification, giving you greater control over your application’s traffic and security posture without having to alter your application code.

AWS End User Messaging (aka Amazon Pinpoint) – You can now track feedback for messages sent through the SMS and MMS channels, explicitly block or allow messages to individual phone numbers (overriding your country rule settings), and use cost allocation tags for SMS resources to track spend for each tag associated with a resource. AWS End User Messaging also now supports integration with Amazon EventBridge.

AWS Lambda – You can use Lambda SnapStart for Python and .NET functions to deliver as low as sub-second startup performance. AWS Lambda now supports Amazon S3 as a failed-event destination for asynchronous invocations and Amazon CloudWatch Application Signals to easily monitor the health and performance of serverless applications built using Lambda. You can also use a new Node.js 22 runtime and Provisioned Mode for event source mappings (ESMs) that subscribe to Apache Kafka event sources.

Amazon OpenSearch Service – You can scale a single cluster to 1000 data nodes (1000 hot nodes and/or 750 warm nodes) to manage 25 petabytes of data. Amazon OpenSearch Service introduces Custom Plugins, a new plugin management option to extend the search and analysis functions in OpenSearch.

Amazon Q Business – You can use tabular search to extract answers from tables embedded in documents ingested in Q Business. You can drag and drop files to upload and reuse any recently uploaded files in new conversations without uploading the files again. Amazon Q Business now supports integrations with Smartsheet in general availability, and with Asana and Google Calendar in preview, to automatically sync your index with your selected data sources. You can also use Q Business browser extensions for Google Chrome, Mozilla Firefox, and Microsoft Edge.

Amazon Q Developer – You can ask questions directly related to the AWS Management Console page you’re viewing, eliminating the need to specify the service or resource in your query. You can also use customizable chat responses generated by Q Developer in the IDE to securely connect Q Developer to your private codebases to receive more precise chat responses. Finally, you can use voice input and output capabilities in the AWS Console Mobile App along with conversational prompts to list resources in your AWS account.

Amazon QuickSight – You can use Layer Map to visualize custom geographic boundaries, such as sales territories, or user-defined regions, and Image Component to upload your images directly for a variety of use cases, such as adding company logos. Amazon QuickSight also provides the ability to import visuals from an existing dashboard or analysis into your current analysis and Highcharts visuals to create custom visualizations using the Highcharts Core library in preview.

Amazon Redshift – You can ingest data from a wider range of streaming sources, including Confluent Managed Cloud and self-managed Apache Kafka clusters on Amazon EC2 instances. You can also use enhanced security defaults, which help you adhere to best practices in data security and reduce the risk of potential misconfigurations.

AWS Systems Manager – You can use a new and improved version of AWS Systems Manager that brings a highly requested cross-account and cross-Region experience for managing nodes at scale. AWS Systems Manager now supports instances running Windows Server 2025, Ubuntu Server 24.04, and Ubuntu Server 24.10.

Amazon S3 – You can configure S3 Lifecycle rules for S3 Express One Zone to expire objects on your behalf and append data to objects in S3 Express One Zone. You can also use Amazon S3 Express One Zone as a high performance read cache with Mountpoint for Amazon S3. Amazon S3 Connector for PyTorch now supports Distributed Checkpoint (DCP), improving the time to write checkpoints to Amazon S3.

Amazon VPC – You can use Block Public Access (BPA) for VPC, a new centralized declarative control that enables network and security administrators to authoritatively block Internet traffic for their VPCs. Amazon VPC Lattice now provides native integration with Amazon ECS, making it easier to deploy, manage, and scale containerized applications.

There’s a lot more launch news that I haven’t covered here. See AWS What’s New for more details.

See you virtually in AWS re:Invent
Next week we’ll hear the latest news from AWS, learn from experts, and connect with the global cloud community in Las Vegas. If you come, check out the agenda, session catalog, and attendee guides before your departure.

If you’re not able to attend re:Invent in person, we’re offering the option to livestream our Keynotes and Innovation Talks. By registering for an online pass, you will have access to on-demand keynotes, Innovation Talks, and selected breakout sessions after the event. You can also register with AWS Builder ID, a personal account that enables one-click event registration and provides access to many AWS tools and services.

Please stay tuned next week!

Channy

Security updates for Monday

Post Syndicated from jake original https://lwn.net/Articles/999597/

Security updates have been issued by Debian (ansible, chromium, ghostscript, glib2.0, intel-microcode, and kernel), Fedora (dotnet9.0, needrestart, php, and python3.6), Oracle (cups, kernel, osbuild-composer, podman, python3.12-urllib3, squid, and xerces-c), Red Hat (buildah, edk2, gnome-shell, haproxy, kernel, kernel-rt, libvpx, pam, python3.11-urllib3, python3.12-urllib3, qemu-kvm, rhc-worker-script, squid:4, and tigervnc), Slackware (php), SUSE (chromedriver, chromium, dcmtk, govulncheck-vulndb, iptraf-ng, and traefik2), and Ubuntu (linux-oracle and openjdk-23).

Automating event validation with Amazon EventBridge Schema Discovery

Post Syndicated from Chris McPeek original https://aws.amazon.com/blogs/compute/automating-event-validation-with-amazon-eventbridge-schema-discovery/

This post is written by Kurt Tometich, Senior Solutions Architect, and Giedrius Praspaliauskas, Senior Solutions Architect, Serverless

Event-driven architectures face challenges with event validation due to unique domains, varying event formats, frequencies, and governance levels. Events are constantly evolving, requiring a balanced approach between speed and governance. This blog post describes approaches to consumer and producer event validation, focusing on automated solutions for producer validation using Amazon EventBridge and Amazon API Gateway.

Consumer and Producer Event Validation

In an event-driven system, events should be validated by both producers and consumers to maintain data integrity. The producers’ job is to create and send valid events before they are routed to consumers. Failing to do so can lead to data inconsistencies, downstream errors in processing and unnecessary costs. As a consumer, even if events come from a trusted source, validation should still be applied. Producers may change data format over time, data may become corrupt, or interfaces between the producer and consumer may alter it.

A common way to manage and route events is through an event bus. EventBridge is a serverless event bus that can perform discovery, versioning and consumption of event schemas. When schema discovery is enabled on an event bus, new schema versions are generated when the event structure changes. These schemas can be used to perform validation on events.

The EventBridge schema registry stores schemas in OpenAPI or JSON Schema formats. Schemas can be added to the registry automatically through schema discovery, or manually by uploading your schema through the AWS console or programmatically. Schema discovery automates the process of finding schemas and adding them to your registry. Schemas for AWS events are automatically added to the registry.

Once a schema is added to the registry, you can generate a code binding for the schema. This allows you to represent the event as a strongly typed object in your code. Code bindings are available for the Golang, Java, Python, and TypeScript programming languages. If bindings for your preferred language are not available, schemas can be downloaded and events validated against them using third-party schema validation libraries, for example Ajv for JavaScript or the jsonschema library for Python.
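
To make the idea of validating an event against a downloaded schema concrete, here is a minimal, dependency-free sketch. It checks only a small subset of JSON Schema (`type`, `required`, and `properties`) and is a stand-in for illustration; in practice a full library such as jsonschema or Ajv would do this work. The `ORDER_SCHEMA` and the sample events are hypothetical and do not come from the registry.

```python
# Maps JSON Schema type names to Python types (subset, for illustration only).
TYPE_MAP = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the event passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(instance, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(instance, bool) and expected in ("number", "integer")
        if wrong_type or bool_as_number:
            errors.append(f"{path}: expected {expected}")
            return errors
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


# Hypothetical schema for an order event.
ORDER_SCHEMA = {
    "type": "object",
    "required": ["orderId", "amount"],
    "properties": {
        "orderId": {"type": "string"},
        "amount": {"type": "number"},
    },
}

print(validate({"orderId": "A-1", "amount": 12.5}, ORDER_SCHEMA))
print(validate({"amount": "12.5"}, ORDER_SCHEMA))
```

Returning a list of errors rather than raising on the first one lets a consumer log every problem with a rejected event in one pass.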

If using code bindings, you can download them using the console, API, or within a supported IDE using the AWS Toolkit. Code bindings can be used like other code artifacts. If an AWS Lambda function is used as a consumer, add the code binding as a layer dependency. Bindings are not automatically synced to any artifact repositories, such as AWS CodeArtifact. The Lambda function code in this solution can be extended to automate binding uploads to your artifact repository.

The following diagram depicts a common producer (left) and consumer (right) event architecture on AWS. Producers send events through API Gateway or directly to an EventBridge event bus. It’s common to use API Gateway as a front door to provide authorization, validation and pre-processing of incoming events. Events going directly to EventBridge may also come from SaaS Partner Integrations (Salesforce, Jira, ServiceNow, etc.) or an application running in a private subnet using the AWS private network to connect to EventBridge. For these events, you can use third-party libraries to validate events prior to them arriving on EventBridge.

Image of Common Architecture for Producer and Consumer Event Validation.

Common Architecture for Producer and Consumer Event Validation

Workflow steps:

  1. Producers send events through API Gateway or directly to EventBridge. API Gateway provides request validation, then parses events and sends them to EventBridge if they pass validation. Invalid events that do not match the schema in API Gateway will be rejected before reaching EventBridge. Events going directly to EventBridge are validated using third-party schema validation libraries (e.g. Ajv for JavaScript and the jsonschema library for Python).
  2. With schema discovery enabled on a custom event bus, that bus will receive the event from an application and generate a new schema version in the registry. New schema versions are only created when the event structure changes. When new schema versions are created, a schema version created event is automatically emitted on the default EventBridge event bus. The default bus automatically receives AWS events. EventBridge rules can be configured to match all schema version changes or to filter on schema name, type and other fields available on the event.
  3. Consumers define EventBridge rules to react to schema version change events. Consumers download the schema or code bindings from EventBridge and perform validation and parsing.
  4. Producers define EventBridge rules to react to schema version change events. The new schema is retrieved from the registry and either used in local development with third-party schema validation libraries, or a model in API Gateway is updated with the new schema directly. This step doesn’t exist as a native feature of EventBridge. The solution later in this post will demonstrate how to automate this step.
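
As a rough sketch of the rule described in steps 2 to 4, the event pattern below matches schema version change events, optionally filtered by schema name. It assumes these events carry the source `aws.schemas`, the detail-type `Schema Version Created`, and a `SchemaName` field under `detail`; verify these against an actual event in your account before relying on them. The helper names and the sample schema name are hypothetical, and `matches` implements only exact-value matching, a small subset of EventBridge pattern semantics, for local experimentation.

```python
import json


def build_schema_version_pattern(schema_name=None):
    """Build an event pattern for schema version change events.

    The source, detail-type and detail field names are assumptions about the
    event shape and should be checked against a real event.
    """
    pattern = {
        "source": ["aws.schemas"],
        "detail-type": ["Schema Version Created"],
    }
    if schema_name is not None:
        pattern["detail"] = {"SchemaName": [schema_name]}
    return pattern


def matches(pattern, event):
    """Exact-value matcher covering only lists of allowed values and nesting."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical event, shaped according to the assumptions above.
sample_event = {
    "source": "aws.schemas",
    "detail-type": "Schema Version Created",
    "detail": {"SchemaName": "orders@OrderCreated", "Version": "2"},
}

pattern = build_schema_version_pattern("orders@OrderCreated")
print(json.dumps(pattern, indent=2))
print(matches(pattern, sample_event))
```

The JSON printed here is the form a rule definition expects for its event pattern; the local matcher is only a convenience for checking a pattern against sample events before deploying it.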

To scale this architecture to multiple event sources and API endpoints, you can create different models in API Gateway for each event schema. A model in API Gateway is a data schema that defines the structure and format of data for request and response payloads. Those models are then applied to different resources and methods defined on your APIs. The solutions below will demonstrate how event schemas can be automatically synced to models in API Gateway.

Solution Walkthrough

The following solutions use API Gateway to perform request validation and EventBridge schema discovery to automatically generate up-to-date schema versions. Both can be extended or modified to fit unique use cases. These solutions build upon the general producer and consumer validation architecture covered previously by automating the download, processing and application of new schemas to API Gateway. Refer to the README.md file in the AWS Samples GitHub repository for prerequisites, deployment instructions and testing.

Lambda Driven Schema Updater

The following architecture uses EventBridge schema discovery to generate new schema versions, then downloads, processes and posts each new schema to an API Gateway model for request validation. The Lambda schema updater function is triggered on schema version changes. The function trigger can be enabled or disabled by updating the rule in the EventBridge console.

This solution is a good fit for quick updates with minimal processing. If complex testing and validation is required before updating a new schema, see the CI/CD driven schema updater solution covered later in this post. The rule in this solution triggers when a new schema version is added to the registry. To filter further, the rule can be modified or additional processing can be applied to the Lambda function. This provides flexibility in handling multiple domains or event types.

Image of Architecture for Lambda Driven Schema Updater.

Architecture for Lambda Driven Schema Updater

Workflow Steps:

  1. Producers send events to an API Gateway endpoint or directly to EventBridge.
  2. API Gateway performs request validation on the body, modifies the event format and sends the event to EventBridge. If the event does not match the schema, API Gateway will reject the request.
  3. A custom event bus will receive the event, and an optional rule based on source can log all events for tracking and troubleshooting.
  4. With schema discovery enabled on the custom event bus, new event structures generate schema versions that are stored in the registry. If a new schema version is generated, consumers can download the latest schema and code bindings from the registry.
  5. The schema version creation rule will invoke the Lambda function.
  6. The function will download, process and update the API Gateway model with the new schema. A new schema version is only generated if the structure of the event changes.
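
Steps 5 and 6 can be sketched as a small Lambda function. This is an illustrative outline, not the code from the AWS Samples repository. It assumes the triggering event exposes `RegistryName`, `SchemaName` and `Version` under `detail`, and it uses the boto3 `export_schema` (schemas client) and `update_model` (API Gateway client) calls. The environment variable names `REST_API_ID` and `MODEL_NAME` are hypothetical. The clients are passed in as arguments so that the core logic can be exercised without AWS access.

```python
import json


def update_model_from_event(event, schemas_client, apigw_client, rest_api_id, model_name):
    """Export the new schema version and apply it to an API Gateway model."""
    detail = event["detail"]  # field names below are assumed, verify on a real event
    exported = schemas_client.export_schema(
        RegistryName=detail["RegistryName"],
        SchemaName=detail["SchemaName"],
        SchemaVersion=detail["Version"],
        Type="JSONSchemaDraft4",  # export as JSON Schema draft 4
    )
    schema_text = exported["Content"]
    json.loads(schema_text)  # fail fast if the exported content is not valid JSON

    # Replace the model's schema in place with a patch operation.
    apigw_client.update_model(
        restApiId=rest_api_id,
        modelName=model_name,
        patchOperations=[{"op": "replace", "path": "/schema", "value": schema_text}],
    )
    return schema_text


def handler(event, context):
    """Lambda entry point; imports are local so the helper is testable offline."""
    import os
    import boto3

    return update_model_from_event(
        event,
        boto3.client("schemas"),
        boto3.client("apigateway"),
        os.environ["REST_API_ID"],   # hypothetical configuration
        os.environ["MODEL_NAME"],    # hypothetical configuration
    )
```

Depending on how the API is deployed, a new deployment of the stage may also be needed before the updated model takes effect; that step is omitted here.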

CI/CD Driven Schema Updater

The alternative approach uses a CI/CD pipeline to control schema changes. Instead of the Lambda function directly applying the new schema to the API Gateway model, it downloads, processes, and stores the schema in a repository. The CI/CD pipeline references the stored schema, performing additional tests and checks before the schema is promoted and enforced. This provides more control over the schema update process, though it introduces some additional complexity. The following diagram describes the CI/CD driven update process. The solution can be adapted to other artifact repositories and CI/CD systems.

Image of Architecture for CI/CD Driven Schema Updater

Architecture for CI/CD Driven Schema Updater

Workflow steps:

  1. Producers send events to an API Gateway endpoint or directly to EventBridge.
  2. API Gateway will perform request validation against the body, modify the event format and send the event to EventBridge.
  3. A custom event bus will receive the event, and an optional rule based on source can log all events for tracking and troubleshooting.
  4. With discovery enabled on the custom event bus, schema versions are produced and stored in the registry.
  5. The schema version creation rule will invoke the Lambda function.
  6. The function will download, process and store the new schema in a repository of choice (e.g. S3, Git, Artifact Repository).
  7. The CI/CD pipeline updates the model in API Gateway and runs any necessary tests.
  8. The consumer downloads the schema and code bindings from the appropriate repositories.
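
One example of a check that a pipeline could run in step 7 before promoting a schema is a comparison against the previous version, flagging changes that would cause events from existing producers to be rejected. The sketch below is an assumption about what such a check might look like, not part of the published solution. It only inspects top-level `required` and `properties` entries, and both sample schemas are hypothetical.

```python
def breaking_changes(old_schema, new_schema):
    """List changes in new_schema that could reject events valid under old_schema.

    Only top-level 'required' and 'properties' are compared (illustrative subset).
    """
    problems = []
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))
    for name in sorted(new_required - old_required):
        problems.append(f"new required property: {name}")

    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    for name in sorted(old_props.keys() & new_props.keys()):
        old_type = old_props[name].get("type")
        new_type = new_props[name].get("type")
        if old_type != new_type:
            problems.append(f"type changed for {name}: {old_type} -> {new_type}")
    return problems


# Hypothetical schema versions.
previous = {
    "required": ["orderId"],
    "properties": {"orderId": {"type": "string"}, "amount": {"type": "number"}},
}
candidate = {
    "required": ["orderId", "currency"],
    "properties": {
        "orderId": {"type": "string"},
        "amount": {"type": "string"},
        "currency": {"type": "string"},
    },
}

for problem in breaking_changes(previous, candidate):
    print(problem)
```

A pipeline stage could fail the build when the returned list is non-empty, leaving the current API Gateway model in place until the change is reviewed.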

Conclusion

Event validation can be challenging, but leveraging schema discovery and request validation minimizes custom logic and overhead. EventBridge can discover new schemas from events, while API Gateway validates incoming requests. This approach streamlines validation, improves data quality, and reduces the maintenance burden of manual validation.

For more information on event driven architectures, you can view additional resources on AWS Samples and Serverless Land.