[$] Pondering systemd-homed for Fedora

Post Syndicated from jzb original https://lwn.net/Articles/995915/

Fedora Linux, as a rule, handles version upgrades reasonably
well. However, there are times when users may want to do a fresh
installation rather than an upgrade but preserve existing
users and data under /home. This is a scenario that the
Fedora installer, currently, does not address. Users can maintain a
separate /home partition, of course, but the installer does
not incorporate existing users into the new install—that is an
exercise left to the user to handle. One solution might be to use systemd-homed, a systemd
service for managing users and home directories. However, a discussion
proposing the use of systemd-homed as part of Fedora installation
uncovered some hurdles, such as trying to blend its approach to
managing users with tools that centralize user management.

Cohen: gccrs: An alternative compiler for Rust

Post Syndicated from corbet original https://lwn.net/Articles/997483/

Arthur Cohen has posted a
detailed introduction to the gccrs project
on the Rust Blog, seemingly
with the goal of convincing the Rust community about the value of the
project.

Likewise, many GCC plugins are used for increasing the safety of
critical projects such as the Linux kernel, which has recently
gained support for the Rust programming language. This makes
gccrs a useful tool for analyzing unsafe Rust code, and
more generally Rust code which has to interact with existing C
code. We also want gccrs to be a useful tool for
rustc itself by helping pan out the Rust specification
effort with a unique viewpoint – that of a tool trying to replicate
another’s functionality, oftentimes through careful experimentation
and source reading where the existing documentation did not go into
enough detail.

(LWN last looked at gccrs in October).

Incremental refresh for Amazon Redshift materialized views on data lake tables

Post Syndicated from Raks Khare original https://aws.amazon.com/blogs/big-data/incremental-refresh-for-amazon-redshift-materialized-views-on-data-lake-tables/

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. You can use Amazon Redshift to analyze structured and semi-structured data and seamlessly query data lakes and operational databases, using AWS designed hardware and automated machine learning (ML)-based tuning to deliver top-tier price performance at scale.

Amazon Redshift delivers price performance right out of the box. However, it also offers additional optimizations that you can use to further improve this performance and achieve even faster query response times from your data warehouse.

One such optimization for reducing query runtime is to precompute query results in the form of a materialized view. Materialized views in Redshift speed up running queries on large tables. This is useful for queries that involve aggregations and multi-table joins. Materialized views store a precomputed result set of these queries and also support incremental refresh capability for local tables.

Customers use data lake tables to achieve cost effective storage and interoperability with other tools. With open table formats (OTFs) such as Apache Iceberg, data is continuously being added and updated.

Amazon Redshift now provides the ability to incrementally refresh your materialized views on data lake tables including open file and table formats such as Apache Iceberg.

In this post, we will show you step-by-step what operations are supported on both open file formats and transactional data lake tables to enable incremental refresh of the materialized view.

Prerequisites

To walk through the examples in this post, you need the following prerequisites:

  1. You can test the incremental refresh of materialized views on standard data lake tables in your account using an existing Redshift data warehouse and data lake. However, if you want to test the examples using sample data, download the sample data. The sample files are ‘|’ delimited text files.
  2. An AWS Identity and Access Management (IAM) role attached to Amazon Redshift to grant the minimum permissions required to use Redshift Spectrum with Amazon Simple Storage Service (Amazon S3) and AWS Glue.
  3. Set the IAM Role as the default role in Amazon Redshift.

Incremental materialized view refresh on standard data lake tables

In this section, you learn how to build and incrementally refresh materialized views in Amazon Redshift on standard text files in Amazon S3, maintaining data freshness with a cost-effective approach.

  1. Upload the first file, customer.tbl.1, downloaded from the Prerequisites section in your desired S3 bucket with the prefix customer.
  2. Connect to your Amazon Redshift Serverless workgroup or Redshift provisioned cluster using Query editor v2.
  3. Create an external schema.
    create external schema datalake_mv_demo
    from data catalog   
    database 'datalake-mv-demo'
    iam_role default;

  4. Create an external table named customer in the external schema datalake_mv_demo created in the preceding step.
    create external table datalake_mv_demo.customer(
            c_custkey int8,
            c_name varchar(25),
            c_address varchar(40),
            c_nationkey int4,
            c_phone char(15),
            c_acctbal numeric(12, 2),
            c_mktsegment char(10),
            c_comment varchar(117)
        ) row format delimited fields terminated by '|' stored as textfile location 's3://<your-s3-bucket-name>/customer/';

  5. Validate the sample data in the external table customer.
    select * from datalake_mv_demo.customer;

  6. Create a materialized view on the external table.
    CREATE MATERIALIZED VIEW customer_mv 
    AS
    select * from datalake_mv_demo.customer;

  7. Validate the data in the materialized view.
    select * from customer_mv limit 5;

  8. Upload a new file customer.tbl.2 in the same S3 bucket and customer prefix location. This file contains one additional record.
  9. Using Query editor v2, refresh the materialized view customer_mv.
    REFRESH MATERIALIZED VIEW customer_mv;

  10. Validate the incremental refresh of the materialized view when the new file is added.
    select mv_name, status, start_time, end_time
    from SYS_MV_REFRESH_HISTORY
    where mv_name='customer_mv'
    order by start_time DESC;

  11. Retrieve the current number of rows present in the materialized view customer_mv.
    select count(*) from customer_mv;

  12. Delete the existing file customer.tbl.1 from the same S3 bucket and prefix customer. You should only have customer.tbl.2 in the customer prefix of your S3 bucket.
  13. Using Query editor v2, refresh the materialized view customer_mv again.
    REFRESH MATERIALIZED VIEW customer_mv;

  14. Verify that the materialized view is refreshed incrementally when the existing file is deleted.
    select mv_name, status, start_time, end_time
    from SYS_MV_REFRESH_HISTORY
    where mv_name='customer_mv'
    order by start_time DESC;

  15. Retrieve the current row count in the materialized view customer_mv. It should now have one record as present in the customer.tbl.2 file.
    select count(*) from customer_mv;

  16. Modify the contents of the previously downloaded customer.tbl.2 file by altering the customer key from 999999999 to 111111111.
  17. Save the modified file and upload it again to the same S3 bucket, overwriting the existing file within the customer prefix.
  18. Using Query editor v2, refresh the materialized view customer_mv.
    REFRESH MATERIALIZED VIEW customer_mv;

  19. Validate that the materialized view was incrementally refreshed after the data was modified in the file.
    select mv_name, status, start_time, end_time
    from SYS_MV_REFRESH_HISTORY
    where mv_name='customer_mv'
    order by start_time DESC;

  20. Validate that the data in the materialized view reflects your prior data changes from 999999999 to 111111111.
    select * from customer_mv;
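
The add, delete, and overwrite steps above can be pictured with a small Python model. This is only a conceptual sketch, not how Amazon Redshift implements refresh: the class name `FileBackedView`, the version tags, and the row strings are all hypothetical. It shows the general idea of incremental maintenance over files, where each refresh compares the current file listing with the previous one and touches only the files that were added, removed, or replaced.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FileBackedView:
    """Toy model of a materialized view over delimited files under one prefix."""

    # Version tag (for example an ETag) last seen for each file.
    seen: dict[str, str] = field(default_factory=dict)
    # Rows contributed by each file, so they can be retracted later.
    rows_by_file: dict[str, list[str]] = field(default_factory=dict)

    def refresh(self, listing: dict[str, tuple[str, list[str]]]) -> dict[str, int]:
        """Apply only the differences between the previous listing and this one.

        `listing` maps file name -> (version tag, rows in the file).
        Returns how many files were added, removed, or replaced.
        """
        stats = {"added": 0, "removed": 0, "replaced": 0}
        for name in list(self.seen):
            if name not in listing:
                # File deleted: retract the rows it contributed.
                del self.seen[name]
                del self.rows_by_file[name]
                stats["removed"] += 1
        for name, (tag, rows) in listing.items():
            if name not in self.seen:
                stats["added"] += 1
            elif self.seen[name] != tag:
                stats["replaced"] += 1
            else:
                continue  # Unchanged file: nothing to do.
            self.seen[name] = tag
            self.rows_by_file[name] = list(rows)
        return stats

    def count(self) -> int:
        """Total number of rows currently in the view."""
        return sum(len(rows) for rows in self.rows_by_file.values())
```

Calling `refresh` four times with listings that mirror the walkthrough (first file only, both files, second file only, second file with a new version tag) reports one added, one added, one removed, and one replaced file respectively, while unchanged files are skipped each time.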

Incremental materialized view refresh on Apache Iceberg data lake tables

Apache Iceberg is a data lake open table format that’s rapidly becoming an industry standard for managing data in data lakes. Iceberg introduces new capabilities that enable multiple applications to work together on the same data in a transactionally consistent manner.

In this section, we will explore how Amazon Redshift can seamlessly integrate with Apache Iceberg. You can use this integration to build materialized views and incrementally refresh them using a cost-effective approach, maintaining the freshness of the stored data.

  1. Sign in to the AWS Management Console, go to Amazon Athena, and execute the following SQL to create a database in an AWS Glue catalog.
    create database iceberg_mv_demo;

  2. Create a new Iceberg table.
    create table iceberg_mv_demo.category (
      catid int ,
      catgroup string ,
      catname string ,
      catdesc string)
      PARTITIONED BY (catid, bucket(16,catid))
      LOCATION 's3://<your-s3-bucket-name>/iceberg/'
      TBLPROPERTIES (
      'table_type'='iceberg',
      'write_compression'='snappy',
      'format'='parquet');

  3. Add some sample data to iceberg_mv_demo.category.
    insert into iceberg_mv_demo.category values
    (1, 'Sports', 'MLB', 'Major League Baseball'),
    (2, 'Sports', 'NHL', 'National Hockey League'),
    (3, 'Sports', 'NFL', 'National Football League'),
    (4, 'Sports', 'NBA', 'National Basketball Association'),
    (5, 'Sports', 'MLS', 'Major League Soccer');

  4. Validate the sample data in iceberg_mv_demo.category.
    select * from iceberg_mv_demo.category;

  5. Connect to your Amazon Redshift Serverless workgroup or Redshift provisioned cluster using Query editor v2.
  6. Create an external schema.
    CREATE external schema iceberg_schema
    from data catalog
    database 'iceberg_mv_demo'
    region 'us-east-1'
    iam_role default;

  7. Query the Iceberg table data from Amazon Redshift.
    SELECT *  FROM "dev"."iceberg_schema"."category";

  8. Create a materialized view using the external schema.
    create MATERIALIZED view mv_category as
    select  * from
    "dev"."iceberg_schema"."category";

  9. Validate the data in the materialized view.
    select * from mv_category;

  10. Using Amazon Athena, modify the Iceberg table iceberg_mv_demo.category and insert sample data.
    insert into iceberg_mv_demo.category values
    (12, 'Concerts', 'Comedy', 'All stand-up comedy performances'),
    (13, 'Concerts', 'Other', 'General');

  11. Using Query editor v2, refresh the materialized view mv_category.
    Refresh  MATERIALIZED view mv_category;

  12. Validate the incremental refresh of the materialized view after the additional data was populated in the Iceberg table.
    select mv_name, status, start_time, end_time
    from SYS_MV_REFRESH_HISTORY
    where mv_name='mv_category'
    order by start_time DESC;

  13. Using Amazon Athena, modify the Iceberg table iceberg_mv_demo.category by deleting and updating records.
    delete from iceberg_mv_demo.category
    where catid = 3;
     
    update iceberg_mv_demo.category
    set catdesc= 'American National Basketball Association'
    where catid=4;

  14. Validate the sample data in iceberg_mv_demo.category to confirm that catid=4 has been updated and catid=3 has been deleted from the table.
    select * from iceberg_mv_demo.category;

  15. Using Query editor v2, refresh the materialized view mv_category.
    Refresh  MATERIALIZED view mv_category;

  16. Validate the incremental refresh of the materialized view after one row was updated and another was deleted.
    select mv_name, status, start_time, end_time
    from SYS_MV_REFRESH_HISTORY
    where mv_name='mv_category'
    order by start_time DESC;
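
To make the delete and update steps above more concrete, here is a conceptual Python sketch of how row-level changes can be derived from a copy-on-write table, where an update or delete rewrites the affected data files and a new snapshot points at the rewritten files. It is an illustration under stated assumptions, not Amazon Redshift's or Iceberg's actual algorithm: the function `snapshot_delta`, the file names, and the row tuples are hypothetical.

```python
from collections import Counter


def snapshot_delta(old_files: dict, new_files: dict) -> tuple:
    """Return (deleted_rows, inserted_rows) between two table snapshots.

    Each snapshot maps a data file name to the list of rows it holds.
    Files present in both snapshots are skipped without being read.
    """
    removed = Counter()
    added = Counter()
    for name in old_files.keys() - new_files.keys():
        removed.update(old_files[name])
    for name in new_files.keys() - old_files.keys():
        added.update(new_files[name])
    # Rows that were only copied into a rewritten file cancel out.
    return removed - added, added - removed


# Hypothetical snapshots: one file is rewritten, one is untouched.
before = {
    "data-a.parquet": [(3, "NFL"), (4, "NBA"), (5, "MLS")],
    "data-c.parquet": [(12, "Comedy")],
}
after = {
    "data-b.parquet": [(4, "NBA (updated)"), (5, "MLS")],
    "data-c.parquet": [(12, "Comedy")],
}
deleted, inserted = snapshot_delta(before, after)
```

Here `deleted` holds the rows for catid 3 and the old version of catid 4, and `inserted` holds only the new version of catid 4. The unchanged row for catid 5 cancels out even though its file was rewritten, which is why the amount of work tracks the size of the change rather than the size of the table.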

Performance Improvements

To understand the performance improvements of incremental refresh over full recompute, we used the industry-standard TPC-DS benchmark with 3 TB data sets for Iceberg tables configured in copy-on-write mode. In our benchmark, fact tables are stored on Amazon S3, while dimension tables are in Redshift. We created 34 materialized views representing different customer use cases on a Redshift provisioned cluster with four ra3.4xlarge nodes. We applied 1% inserts and deletes on the fact tables store_sales, catalog_sales, and web_sales. We ran the inserts and deletes with Spark SQL on Amazon EMR Serverless. We refreshed all 34 materialized views using incremental refresh and measured refresh latencies. We then repeated the experiment using full recompute.

Our experiments show that incremental refresh provides substantial performance gains over full recompute. After insertions, incremental refresh was 13.5X faster on average than full recompute (maximum 43.8X, minimum 1.8X). After deletions, incremental refresh was 15X faster on average (maximum 47X, minimum 1.2X). The following graphs illustrate the latency of refresh.

[Graph: refresh latency after inserts]

[Graph: refresh latency after deletes]

Clean up

When you’re done, remove any resources that you no longer need to avoid ongoing charges.

  1. Run the following script to clean up the Amazon Redshift objects.
    DROP  MATERIALIZED view mv_category;
    
    DROP  MATERIALIZED view customer_mv;

  2. Run the following script to clean up the Apache Iceberg tables using Amazon Athena.
    DROP  TABLE iceberg_mv_demo.category;

Conclusion

Materialized views on Amazon Redshift can be a powerful optimization tool. With incremental refresh of materialized views on data lake tables, you can store pre-computed results of your queries over one or more base tables, providing a cost-effective approach to maintaining fresh data. We encourage you to update your data lake workloads and use the incremental materialized view feature. If you’re new to Amazon Redshift, try the Getting Started tutorial and use the free trial to create and provision your first cluster and experiment with the feature.

See Materialized views on external data lake tables in Amazon Redshift Spectrum for considerations and best practices.


About the authors

Raks Khare is a Senior Analytics Specialist Solutions Architect at AWS based out of Pennsylvania. He helps customers across varying industries and regions architect data analytics solutions at scale on the AWS platform. Outside of work, he likes exploring new travel and food destinations and spending quality time with his family.

Tahir Aziz is an Analytics Solution Architect at AWS. He has worked on building data warehouses and big data solutions for over 15 years. He loves to help customers design end-to-end analytics solutions on AWS. Outside of work, he enjoys traveling and cooking.

Raza Hafeez is a Senior Product Manager at Amazon Redshift. He has over 13 years of professional experience building and optimizing enterprise data warehouses and is passionate about enabling customers to realize the power of their data. He specializes in migrating enterprise data warehouses to AWS Modern Data Architecture.

Enrico Siragusa is a Senior Software Development Engineer at Amazon Redshift. He contributed to query processing and materialized views. Enrico holds a M.Sc. in Computer Science from the University of Paris-Est and a Ph.D. in Bioinformatics from the International Max Planck Research School in Computational Biology and Scientific Computing in Berlin.

Mind the Gap: How Surface Command Tackles Asset Visibility in Attack Surface Management

Post Syndicated from Ed Montgomery original https://blog.rapid7.com/2024/11/08/mind-the-gap-how-surface-command-tackles-asset-visibility-in-attack-surface-management/

“Only 17% of organizations can clearly identify and inventory a majority (95% or more) of their assets.” – Gartner


Imagine the scenario: your organization has been exposed to a new zero-day vulnerability. You are responsible for Threat & Vulnerability Management (TVM), and you have asked your IT department for an assessment of the asset inventory in your organization.

You make the same request to your security team. Both teams give you a different number of assets, with a significant disparity: IT reports 10,000 assets, compared to 8,200 from your colleagues in security.

When you look up your Configuration Management Database (CMDB) application, you quickly discover that it has not been updated for months and does not accurately represent your attack surface either.

How do you measure your risk exposure when three sources of information are not in agreement? Your highly-skilled colleagues are now back to using spreadsheets to document your assets—a very manual and time-consuming process that is not a productive use of their time.

Attack Surface Management (ASM)

ASM covers both internal and external assets—the physical and digital assets that an organization needs to have visibility into in order to understand its security posture. By establishing visibility of the attack surface and implementing management processes to prioritize, validate, and mobilize responses, security teams can reduce exposures exploited by malicious threat actors.

“Asset inventory is a common and well-known problem for organizations.”

Manage the Gap in Asset Inventory with Surface Command

We began this post with a real-life, anonymized customer example: the disparity in asset counts between the IT and Security teams. Surface Command addresses this operational challenge. Firstly, Surface Command is platform-agnostic; what’s important to Rapid7 is capturing your actual number of assets using a mixture of external scanning and importing data feeds from over 100 commonly used IT and Security tools (EDR, CNAPP, VM, CMDB, etc.). This provides a true, constantly updated view of all assets across the cloud and on-premises. The assets covered include cloud containers, servers, workstations, IoT devices, identities, smartphones, and more.

To help demonstrate the value of this complete visibility, we have created a short, 2-minute product tour, which you can view at your convenience. In this initial product tour, we show how to identify coverage gaps in your security posture using Surface Command. Take the example of a zero-day vulnerability discovered for a particular operating system; you need to understand your attack surface immediately.

Surface Command will quickly display assets missing key security controls, such as a deployed endpoint security agent. You can drill down further to focus on assets by operating system or device type. This technology is powered by Rapid7’s Machine Learning (ML) classifiers to ensure coverage and data accuracy.

Watch as we filter down from a large number of total assets, to a smaller, focused number of high-risk assets that can be prioritized for action by your IT and Security Teams, all done with just a few clicks.

This scenario is commonly used by our customers to quickly identify simple security gaps, and with Surface Command, you can easily save this for future use, as well as publish the results to reporting dashboards.

By establishing visibility of the attack surface and implementing management processes to prioritize, validate, and mobilize responses, security teams can reduce their exposure and improve cyber risk management.

After all, you can’t protect what you can’t see.


To learn more, click here.

Sources:

Gartner, Innovation Insight: Attack Surface Management – 9 April 2024 – ID G00809126


Security updates for Friday

Post Syndicated from daroc original https://lwn.net/Articles/997480/

Security updates have been issued by AlmaLinux (edk2), Debian (webkit2gtk), Fedora (thunderbird), Oracle (bzip2, container-tools:ol8, edk2, go-toolset:ol8, libtiff, python-idna, python3.11, and python3.12), Slackware (expat), and SUSE (apache2, govulncheck-vulndb, grub2, java-1_8_0-openjdk, python3, python39, qemu, xorg-x11-server, and xwayland).

My Country, My Bulgaria. In Brown

Post Syndicated from Емилия Милчева original https://www.toest.bg/moya-strana-moya-bulgaria-v-kafyavo/


When civic protests in a European Union country grow into censorship that shuts down books, films, and theatre performances, Die Fahne hoch! by the Nazi anthem writer Horst Wessel begins to echo.

Do we hear it?

Nationalist mobs stopped people who had bought tickets for Bernard Shaw’s play “Arms and the Man” from watching the production at the National Theatre because, they say, it demeaned the dignity of Bulgarians and Bulgarian officers. Last summer similar mobs also wrecked screenings of the Belgian film “Close”, because it supposedly spread homosexual propaganda and pedophilia. Once again, nationalists from Vazrazhdane and kindred formations published a blacklist of teachers who had spoken out against the sinister and useless amendment on so-called gender propaganda, and organized a campaign against the National High School of Mathematics and Natural Sciences in Sofia over a whipped-up scandal involving a student, into which they also dragged the Istanbul Convention.

These are not simply the “two worlds of patriotism” that the director of the National Theatre, Vasil Vasilev, speaks of. This is the diktat of one world, loud and shrill, in which boys from football hooligan groups mix with minds blinded by propaganda and threaten freedom of expression, without which neither art nor democratic societies can exist. Under this threat in Bulgaria are the artists, intellectuals, and activists who oppose the monopoly on thought that is gradually being imposed in politics, the media, and social networks. This monopoly not only suffocates society in dogma, it also makes people more vulnerable to propaganda and control and endangers the very foundations of democracy.

With weak political resistance and tacit institutional cooperation, including from the police, this new cultural revolution is making headway. Alas, not only in Bulgaria, although here it is helped along by massive pro-Kremlin propaganda. It is helped, too, by the sympathy for Donald Trump (now returned to the White House) among 49% of our compatriots, as established by Gallup. But also by sympathy for Putin: according to a survey by the French agency Ipsos, 37% of Bulgarians like the Russian president, the highest result in the entire EU.

The caretaker Minister of the Interior, Atanas Ilkov, saw no problems in front of the National Theatre after John Malkovich’s production was disrupted. He even named the theatre’s director, Vasil Vasilev, as the culprit, although responsibility for order and security does not lie with Vasilev, who tried to hold a civilized dialogue with the protesters.

“Counter-patriotic”

In front of the National Theatre on 7 November there were, after all, two worlds. One waved flags, hurled insults such as “janissaries”, “faggots”, “traitors”, and “Euro-Atlantic stooges”, threw stones, and struck people. A patriotic citizen with a little flag in her hand had brought her dog, which urinated on the steps of the theatre, while an ultra in a black mask who had climbed onto one of the columns was also waving a flag.

The other world waited quietly with its tickets and even offered to give them to those raging across the way so that they could watch the production. “Let’s insult them with words they don’t know,” someone suggested. Others were indignant that the police were doing nothing to let the audience in, and sang the national anthem together with the camp opposite.

“What do you have against one play? What about Bay Ganyo?” was one question put to the patriots by those who had come to see the production. “Bay Ganyo is self-irony, and this is an insult, a humiliation!” came the answer. (Right, only Bulgarians may write satire about Bulgarians, not foreigners…)

After weeks of warm-up in certain media outlets and social networks, in anti-vaxxer groups and nationalist organizations, Sofia Municipality and the SDVR (the Sofia Directorate of the Interior), which sign off on protests, had not provided for a perimeter in front of the theatre entrance that would allow people with tickets to go in and watch the production, and the protesters to express their discontent with “the mockery of the memory of Bulgarian self-sacrifice in the Serbo-Bulgarian War of 1885”. The National Theatre had warned in a statement about what was happening even before the premiere, and was also backed by the European Theatre Convention.

We must acknowledge the seriousness of this situation. In a democratic European country there must be no tolerance for such censorship and intimidation. This is a decisive moment for Europe to take a stand and reject the dangerous rise of ultranationalist movements that seek to silence creative voices and destroy our cultural foundations.

The Minister of Culture, Nayden Todorov, who is a pianist, composer, and conductor of the Sofia Philharmonic, also defended art.

We can take the resignation of the director of the National Theatre, ban the production, then I can name two or three other productions that we should ban, of course we should ban Bernard Shaw, then start burning the books, and then we know what follows: Nazism.

But the actor Zahari Baharov was entirely blunt.

I cannot discuss the repertoire of the National Theatre with that benighted lowlife who attacked Vlado Penev. With just one of his hundred roles, Vlado Penev has done more for Bulgaria and for Bulgarian culture than these brutes will ever do. That is why you do not explain yourself to them, you do not argue, you do not invite them to the theatre…

“Kristallnacht” is what the animator Theodore Ushev called the evening of 7 November, after he was attacked by the mobs. No one was detained over the riots. Still, the police have stirred somewhat, announcing that the attackers of Ushev and those who struck the National Theatre’s director, Vasil Vasilev, have been identified. This came even though Interior Minister Ilkov had already pronounced that “there was no chaos”.

The Sofia District Prosecutor’s Office opened an investigation on its own initiative into the riots in front of the National Theatre and assigned it to the SDVR, the same body that was supposed to do what it could to keep them from happening. Incidentally, the interior minister promised that camera recordings would be reviewed to see who did the hitting.

In Bulgaria, the prosecution and the police have favored the nationalists since the days of Ataka, the “grandmother” of today’s Vazrazhdane, Velichie, and the rest. Anti-government protesters, by contrast, they beat on the sly, behind columns. In 2015 the European Court of Human Rights in Strasbourg ruled against Bulgaria over the attacks by Ataka supporters on Sofia’s Banya Bashi mosque.

On 20 May 2011, Volen Siderov and Ataka MPs led a protest against the mosque’s loudspeakers and interrupted the Muslims’ Friday prayer. It came to ugly scenes of clashes, prayer rugs set on fire, and fezzes being cut up, and Siderov called the police officers who intervened “janissaries”. The investigation against him collapsed. Later Ataka itself collapsed too, but ever more poisonous doubles of it keep appearing.

Three Bulgarian parties issued declarations condemning the censorship and the denial of the right to free expression: GERB, Prodalzhavame Promyanata (We Continue the Change), and Da, Bulgaria, which is part of Demokratichna Bulgaria (Democratic Bulgaria). But although PP-DB described the protest as a prepared operation and an active measure by people linked to the former State Security, there were no demands for resignations, nor were the political forces that aided the outrages named. Among the protesters in front of the National Theatre were Tsoncho Ganev and Kosta Stoyanov of Vazrazhdane, the third party represented in parliament, Krasimir Karakachanov and Angel Dzhambazki of VMRO, and also the leader of the Bulgarian National Union, Georgi Georgiev.

In place of an epilogue

Did you know that when the National Theatre was being built, a rich merchant lived across the way who, as the building went up, raised his own as well, so that he could piss on the theatre from above, as he himself put it?

The story is told by a woman quietly smoking in front of the police officers, who gave up waiting for the National Theatre to open its doors because the next morning she has a court case against Sofia’s Toplofikatsiya, the district heating utility.

The show, however, will go on; the tickets are sold out. Malkovich never thought that anyone could be offended by a play. In Bulgaria there are too many people who would gladly piss on art.

How we prevent conflicts in authoritative DNS configuration using formal verification

Post Syndicated from James Larisch original https://blog.cloudflare.com/topaz-policy-engine-design

Over the last year, Cloudflare has begun formally verifying the correctness of our internal DNS addressing behavior — the logic that determines which IP address a DNS query receives when it hits our authoritative nameserver. This means that for every possible DNS query for a proxied domain we could receive, we try to mathematically prove properties about our DNS addressing behavior, even when different systems (owned by different teams) at Cloudflare have contradictory views on which IP addresses should be returned.

To achieve this, we formally verify the programs — written in a custom Lisp-like programming language — that our nameserver executes when it receives a DNS query. These programs determine which IP addresses to return. Whenever an engineer changes one of these programs, we run all the programs through our custom model checker (written in Racket + Rosette) to check for certain bugs (e.g., one program overshadowing another) before the programs are deployed.

Our formal verifier runs in production today, and is part of a larger addressing system called Topaz. In fact, it’s likely you’ve made a DNS query today that triggered a formally verified Topaz program.

This post is a technical description of how Topaz’s formal verification works. Besides being a valuable tool for Cloudflare engineers, Topaz is a real-world example of formal verification applied to networked systems. We hope it inspires other network operators to incorporate formal methods, where appropriate, to help make the Internet more reliable for all.

Topaz’s full technical details have been peer-reviewed and published in ACM SIGCOMM 2024, with both a paper and short video available online. 

Addressing: how IP addresses are chosen

When a DNS query for a customer’s proxied domain hits Cloudflare’s nameserver, the nameserver returns an IP address — but how does it decide which address to return?

Let’s make this more concrete. When a customer, say example.com, signs up for Cloudflare and proxies their traffic through Cloudflare, it makes Cloudflare’s nameserver authoritative for their domain, which means our nameserver has the authority to respond to DNS queries for example.com. Later, when a client makes a DNS query for example.com, the client’s recursive DNS resolver (for example, 1.1.1.1) queries our nameserver for the authoritative response. Our nameserver returns some Cloudflare IP address (of our choosing) to the resolver, which forwards that address to the client. The client then uses the IP address to connect to Cloudflare’s network, which is a global anycast network — every data center advertises all of our addresses.


Clients query Cloudflare’s nameserver (via their resolver) for customer domains. The nameserver returns Cloudflare IP addresses, advertised by our entire global network, which the client uses to connect to the customer domain. Cloudflare may then connect to the origin server to fulfill the user’s HTTPS request.

When the customer has configured a static IP address for their domain, our nameserver’s choice of IP address is simple: it returns that static address in response to queries made for that domain.

But for all other customer domains, our nameserver could respond with virtually any IP address that we own and operate. We may return the same address in response to queries for different domains, or different addresses in response to different queries for the same domain. We do this for resilience, but also because decoupling names and IP addresses improves flexibility.

With all that in mind, let’s return to our initial question: given a query for a proxied domain without a static IP, which IP address should be returned? The answer: Cloudflare chooses IP addresses to meet various business objectives. For instance, we may choose IPs to:

  • Change the IP address of a domain that is under attack.

  • Direct fractions of traffic to specific IP addresses to test new features or services.

  • Remap or “renumber” domain names to new IP address space.

Topaz executes DNS objectives

To change authoritative nameserver behavior (how we choose IPs), a Cloudflare engineer encodes their desired DNS business objective as a declarative Topaz program. Our nameserver stores the list of all such programs such that when it receives a DNS query for a proxied domain, it executes the list of programs in sequence until one returns an IP address. It then returns that IP to the resolver.


Topaz receives DNS queries (metadata included) for proxied domains from Cloudflare’s nameserver. It executes a list of policies in sequence until a match is found. It returns the resulting IP address to the nameserver, which forwards it to the resolver.

What do these programs look like?

Each Topaz program has three primary components:

  1. Match function: A program’s match function specifies under which circumstances the program should execute. It takes as input DNS query metadata (e.g., datacenter information, account information) and outputs a boolean. If, given a DNS query, the match function returns true, the program’s response function is executed.

  2. Response function: A program’s response function specifies which IP addresses should be chosen. It also takes as input all the DNS query metadata, but outputs a 3-tuple (IPv4 addresses, IPv6 addresses, and TTL). When a program’s match function returns true, its corresponding response function is executed. The resulting IP addresses and TTL are returned to the resolver that made the query. 

  3. Configuration: A program’s configuration is a set of variables that parameterize that program’s match and response function. The match and response functions reference variables in the corresponding configuration, thereby separating the macro-level behavior of a program (match/response functions) from its nitty-gritty details (specific IP addresses, names, etc.). This separation makes it easier to understand how a Topaz program behaves at a glance, without getting bogged down by specific function parameters.

Let’s walk through an example Topaz program. The goal of this program is to give all queried domains whose metadata field “tag1” is equal to “orange” a particular IP address. The program looks like this:

- name: orange
  config: |
    (config
      ([desired_tag1 "orange"]
       [ipv4 (ipv4_address "192.0.2.3")]
       [ipv6 (ipv6_address "2001:DB8:1:3")]
       [t (ttl 300)]))
  match: |
    (= query_domain_tag1 desired_tag1) 
  response: |
    (response (list ipv4) (list ipv6) t)

Before we walk through the program, note that the program’s configuration, match, and response function are YAML strings, but more specifically they are topaz-lang expressions. Topaz-lang is the domain-specific language (DSL) we created specifically for expressing Topaz programs. It is based on Scheme, but is much simpler. It is dynamically typed, it is not Turing complete, and every expression evaluates to exactly one value (though functions can throw errors). Operators cannot define functions within topaz-lang, they can only add new DSL functions by writing functions in the host language (Go). The DSL provides basic types (numbers, lists, maps) but also Topaz-specific types, like IPv4/IPv6 addresses and TTLs.

Let’s now examine this program in detail. 

  • The config is a set of four bindings from name to value. The first binds the string "orange" to the name desired_tag1. The second binds the IPv4 address 192.0.2.3 to the name ipv4. The third binds the IPv6 address 2001:DB8:1:3 to the name ipv6. And the fourth binds the TTL (for which we added a topaz-lang type) 300 (seconds) to the name t.

  • The match function is an expression that must evaluate to a boolean. It can reference configuration values (e.g., desired_tag1), and can also reference DNS query fields. All DNS query fields use the prefix query_ and are brought into scope at evaluation time. This program’s match function checks whether desired_tag1 is equal to the tag attached to the queried domain, query_domain_tag1.

  • The response function is an expression that evaluates to the special response type, which is really just a 3-tuple consisting of: a list of IPv4 addresses, a list of IPv6 addresses, and a TTL. This program’s response function simply returns the configured IPv4 address, IPv6 address, and TTL (seconds).
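The structure just described (configuration, match function, response function, and first-match execution) can be modeled in a few lines. The following is a Python sketch for illustration only, not topaz-lang and not Cloudflare’s Go implementation; the names Program, Response, and first_match, and the dictionary-shaped query, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Response:
    """The 3-tuple a response function returns."""
    ipv4: List[str]
    ipv6: List[str]
    ttl: int


@dataclass(frozen=True)
class Program:
    """Hypothetical Python stand-in for one Topaz program."""
    name: str
    config: dict
    match: Callable[[dict, dict], bool]         # (config, query) -> bool
    response: Callable[[dict, dict], Response]  # (config, query) -> Response


def first_match(programs: List[Program], query: dict) -> Optional[Tuple[str, Response]]:
    """Walk the list in order; run the response of the first program that matches."""
    for program in programs:
        if program.match(program.config, query):
            return program.name, program.response(program.config, query)
    return None


# The "orange" program from the text; addresses are kept as opaque strings.
orange = Program(
    name="orange",
    config={"desired_tag1": "orange", "ipv4": "192.0.2.3",
            "ipv6": "2001:DB8:1:3", "t": 300},
    match=lambda cfg, q: q["domain_tag1"] == cfg["desired_tag1"],
    response=lambda cfg, q: Response([cfg["ipv4"]], [cfg["ipv6"]], cfg["t"]),
)

print(first_match([orange], {"domain_tag1": "orange"}))
print(first_match([orange], {"domain_tag1": "green"}))
```

Keeping the configuration as plain data while match and response only read from it mirrors the separation the text describes: the functions carry the behavior, the configuration carries the specifics.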

Critically, all Topaz programs are encoded as YAML and live in the same version-controlled file. Imagine this program file contained only the orange program above, but now, a new team wants to add a new program, which checks whether the queried domain’s “tag1” field is equal to “orange” AND that the domain’s “tag2” field is equal to true:

- name: orange_and_true
  config: |
    (config
      ([desired_tag1 "orange"]
       [ipv4 (ipv4_address "192.0.2.2")]
       [ipv6 (ipv6_address "2001:DB8:1:2")]
       [t (ttl 300)]))
  match: |
    (and (= query_domain_tag1 desired_tag1)
         query_domain_tag2)
  response: |
    (response (list ipv4) (list ipv6) t)

This new team must place their new orange_and_true program either below or above the orange program in the file containing the list of Topaz programs. For instance, they could place orange_and_true after orange, like so:

- name: orange
  config: …
  match: …
  response: …
- name: orange_and_true
  config: …
  match: …
  response: …

Now let’s add a third, more interesting Topaz program. Say a Cloudflare team wants to test a modified version of our CDN’s HTTP server on a small percentage of domains, and only in a subset of Cloudflare’s data centers. Furthermore, they want to distribute these queries across a specific IP prefix such that queries for the same domain get the same IP. They write the following:

- name: purple
  config: |
    (config
      ([purple_datacenters (fetch_datacenters "purple")]
       [percentage 10]
       [ipv4_prefix (ipv4_prefix "203.0.113.0/24")]
       [ipv6_prefix (ipv6_prefix "2001:DB8:3::/48")]))
  match: |
    (let ([rand (rand_gen (hash query_domain))])
      (and (member? purple_datacenters query_datacenter)
           (< (random_number (range 0 99) rand) percentage)))
  response: |
    (let ([hashed_domain (hash query_domain)]
          [ipv4_address (select_from ipv4_prefix hashed_domain)]
          [ipv6_address (select_from ipv6_prefix hashed_domain)])
      (response (list ipv4_address) (list ipv6_address) (ttl 1)))

This Topaz program is significantly more complicated, so let’s walk through it.

Starting with configuration: 

  • The first configuration value, purple_datacenters, is bound to the expression (fetch_datacenters "purple"), a function call that retrieves all Cloudflare data centers tagged “purple” via an internal HTTP API. The result of this function call is a list of data centers.

  • The second configuration value, percentage, is a number representing the fraction of traffic we would like our program to act upon.

  • The third and fourth names are bound to IP prefixes, v4 and v6 respectively (note the built-in ipv4_prefix and ipv6_prefix types).

The match function is also more complicated. First, note the let form — this lets operators define local variables. We define one local variable, a random number generator called rand seeded with the hash of the queried domain name. The match expression itself is a conjunction that checks two things. 

  • First, it checks whether the query landed in a data center tagged “purple”. 

  • Second, it checks whether a random number between 0 and 99 (produced by a generator seeded by the domain name) is less than the configured percentage. By seeding the random number generator with the domain, the program ensures that the same 10% of domains trigger a match on every query. If we had seeded the RNG with, say, the query ID, then queries for the same domain could be treated differently from one another: some would match and others would not.

Together, the conjuncts guarantee that the match expression evaluates to true for 10% of domains queried in “purple” data centers.
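The domain-seeded check can be approximated outside topaz-lang. The sketch below is a Python illustration under stated assumptions: it swaps the seeded random number generator for a plain hash-then-modulo bucket (a different but related technique), and the function names, data center names, and query shape are hypothetical.

```python
import hashlib


def domain_bucket(domain: str, buckets: int = 100) -> int:
    """Map a domain name to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def purple_match(query: dict, purple_datacenters: set, percentage: int) -> bool:
    """Match only in tagged data centers, and only for a fixed slice of domains."""
    in_purple = query["datacenter"] in purple_datacenters
    return in_purple and domain_bucket(query["domain"]) < percentage


datacenters = {"dc-a", "dc-b"}  # hypothetical data center names
query = {"domain": "example.com", "datacenter": "dc-a"}
# Repeated queries for the same domain always get the same answer.
print(purple_match(query, datacenters, 10) == purple_match(query, datacenters, 10))
```

Because the bucket depends only on the domain name, raising the percentage from 10 to 20 keeps every previously matched domain in the matched set, which is usually what a gradual rollout wants.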

Now let’s look at the response function. We define three local variables. The first is a hash of the domain. The second is an IPv4 address selected from the configured IPv4 prefix. select_from always chooses the same IP address given the same prefix and hash — this ensures that queries for a given domain always receive the same IP address (which makes it easier to correlate queries for a single domain), but that queries for different domains can receive different IP addresses within the configured prefix. The third local variable is an IPv6 address selected similarly. The response function returns these IP addresses and a TTL of value 1 (second).
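A function with the property described for select_from (same prefix and same hash always yield the same address) might look like the sketch below. This is a Python illustration, not the real topaz-lang built-in; the hashing scheme and the signature are assumptions.

```python
import hashlib
import ipaddress


def select_from(prefix: str, key: str) -> str:
    """Deterministically pick one address inside `prefix` based on `key`."""
    network = ipaddress.ip_network(prefix)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Reduce the hash to an offset within the prefix, then index into it.
    offset = int.from_bytes(digest, "big") % network.num_addresses
    return str(network[offset])


v4 = select_from("203.0.113.0/24", "example.com")
v6 = select_from("2001:DB8:3::/48", "example.com")
print(v4, v6)
```

The same function serves both address families because ipaddress.ip_network returns an IPv4 or IPv6 network depending on the prefix it is given.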

Topaz programs are executed on the hot path

Topaz’s control plane validates the list of programs and distributes them to our global nameserver instances. As we’ve seen, the list of programs resides in a single, version-controlled YAML file. When an operator changes this file (i.e., adds a program, removes a program, or modifies an existing program), Topaz’s control plane does the following things in order:

  • First, it validates the programs, making sure there are no syntax errors. 

  • Second, it “finalizes” each program’s configuration by evaluating every configuration binding and storing the result. (For instance, to finalize the purple program, it evaluates fetch_datacenters, storing the resulting list. This way our authoritative nameservers never need to retrieve external data.) 

  • Third, it verifies the finalized programs, which we will explain below. 

  • Finally, it distributes the finalized programs across our network.
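The “finalize” step is the least obvious of the four, so here is a minimal Python sketch of the idea: evaluate each configuration binding once in the control plane and ship only the resulting values. Representing unevaluated bindings as zero-argument callables, and the names finalize and fetch_purple_datacenters, are assumptions made for this illustration.

```python
from typing import Any, Dict


def finalize(config: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate every binding once; callables stand in for expressions
    such as (fetch_datacenters "purple")."""
    return {name: (value() if callable(value) else value)
            for name, value in config.items()}


calls = []


def fetch_purple_datacenters():
    """Hypothetical stand-in for the internal HTTP lookup."""
    calls.append("fetch")
    return ["dc-a", "dc-b"]


raw_config = {"purple_datacenters": fetch_purple_datacenters, "percentage": 10}
finalized = finalize(raw_config)

# The finalized config holds plain data, so reading it triggers no lookups.
print(finalized["purple_datacenters"], len(calls))
```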

Topaz’s control plane distributes the programs to all servers globally by writing the list of programs to Quicksilver, our edge key-value store. The Topaz service on each server detects changes in Quicksilver and updates its program list.

When our nameserver service receives a DNS query, it augments the query with additional metadata (e.g., tags) and then forwards the query to the Topaz service (both services run on every Cloudflare server) via Inter-Process Communication (IPC). Topaz, upon receiving a DNS query from the nameserver, walks through its program list, executing each program’s match function (using the topaz-lang interpreter) with the DNS query in scope (with values prefixed with query_). It walks the list until a match function returns true. It then executes that program’s response function, and returns the resulting IP addresses and TTL to our nameserver. The nameserver packages these addresses and TTL in valid DNS format, and then returns them to the resolver. 

Topaz programs are formally verified

Before programs are distributed to our global network, they are formally verified. Each program is passed through our formal verification tool which throws an error if a program has a bug, or if two programs (e.g., the orange_and_true and orange programs) conflict with one another.

The Topaz formal verifier (model-checker) checks three properties.

First, it checks that each program is satisfiable — that there exists some DNS query that causes each program’s match function to return true. This property is useful for detecting internally-inconsistent programs that will simply never match. For instance, if a program’s match expression was (and true false), there exists no query that will cause this to evaluate to true, so the verifier throws an error.

Second, it checks that each program is reachable — that there exists some DNS query that causes each program’s match function to return true given all preceding programs. This property is useful for detecting “dead” programs that are completely overshadowed by higher-priority programs. For instance, recall the ordering of the orange and orange_and_true programs:

- name: orange
  config: …
  match: (= query_domain_tag1 "orange")  
  response: …
- name: orange_and_true
  config: …
  match: (and (= query_domain_tag1 "orange") query_domain_tag2)
  response: …

The verifier would throw an error because the orange_and_true program is unreachable. For all DNS queries for which query_domain_tag1 is "orange", regardless of query_domain_tag2, the orange program will always match, which means the orange_and_true program will never match. To resolve this error, we’d need to swap these two programs so that orange_and_true comes before orange.
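To make satisfiability and reachability concrete, the sketch below checks both by brute force over a tiny, made-up query space. This is plain enumeration in Python, not the SMT-based approach the verifier uses; the query fields and the finite set of values are assumptions chosen so that every query can be listed.

```python
from itertools import product

# A tiny, hypothetical query space: every combination of two tag values.
QUERIES = [{"tag1": t1, "tag2": t2}
           for t1, t2 in product(["orange", "green"], [True, False])]


def orange(q):
    return q["tag1"] == "orange"


def orange_and_true(q):
    return q["tag1"] == "orange" and q["tag2"]


def satisfiable(match):
    """True if at least one query makes the match function return true."""
    return any(match(q) for q in QUERIES)


def reachable(programs, index):
    """True if some query matches programs[index] and none of the programs before it."""
    target = programs[index]
    earlier = programs[:index]
    return any(target(q) and not any(p(q) for p in earlier) for q in QUERIES)


print(satisfiable(orange_and_true))
print(reachable([orange, orange_and_true], 1))   # orange listed first
print(reachable([orange_and_true, orange], 1))   # orange_and_true listed first
```

Enumeration only works here because the toy query space has four elements; the real query space is far too large to list, which is why the verifier relies on a solver.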

Finally, and most importantly, the verifier checks for program conflicts: queries that cause any two programs to both match. If such a query exists, it throws an error (and prints the relevant query), and the operators are forced to resolve the conflict by changing their programs. However, it only checks whether specific programs conflict — those that are explicitly marked exclusive. Operators mark their program as exclusive if they want to be sure that no other exclusive program could match on the same queries.

To see what conflict detection looks like, consider the corrected ordering of the orange_and_true and orange programs, but note that the two programs have now been marked exclusive:

- name: orange_and_true
  exclusive: true
  config: ...
  match: (and (= query_domain_tag1 "orange") query_domain_tag2)
  response: ...
- name: orange
  exclusive: true
  config: ...
  match: (= query_domain_tag1 "orange") 
  response: ...

After marking these two programs exclusive, the verifier will throw an error. Not only will it say that these two programs can contradict one another, but it will provide a sample query as proof:

Checking: no exclusive programs match the same queries: check FAILED!
Intersecting programs found:
programs "orange_and_true" and "orange" both match any query...
  to any domain...
    with tag1: "orange"
    with tag2: true

The teams behind the orange and orange_and_true programs respectively must resolve this conflict before these programs are deployed, and can use the above query to help them do so. To resolve the conflict, the teams have a few options. The simplest option is to remove the exclusive setting from one program, and acknowledge that it is simply not possible for these programs to be exclusive. In that case, the order of the two programs matters (one must have higher priority). This is fine! Topaz allows developers to write certain programs that absolutely cannot overlap with other programs (using exclusive), but sometimes that is just not possible. And when it’s not, at least program priority is explicit.
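Conflict detection with a witness query can be illustrated the same way. The Python sketch below enumerates a small hypothetical query space and returns the first query on which two exclusive programs both match; the real verifier obtains its sample query from a solver rather than by enumeration, and the names used here are assumptions.

```python
from itertools import combinations, product

# A tiny, hypothetical query space: every combination of two tag values.
QUERIES = [{"tag1": t1, "tag2": t2}
           for t1, t2 in product(["orange", "green"], [True, False])]

# (name, match function) pairs for the programs marked exclusive.
EXCLUSIVE = [
    ("orange_and_true", lambda q: q["tag1"] == "orange" and q["tag2"]),
    ("orange", lambda q: q["tag1"] == "orange"),
]


def find_conflict(programs, queries):
    """Return (name_a, name_b, query) for the first overlap found, else None."""
    for (name_a, match_a), (name_b, match_b) in combinations(programs, 2):
        for q in queries:
            if match_a(q) and match_b(q):
                return name_a, name_b, q
    return None


print(find_conflict(EXCLUSIVE, QUERIES))
```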

Note: in practice, we place all exclusive programs at the top of the program file. This makes it easier to reason about interactions between exclusive and non-exclusive programs.

In short, verification is powerful not only because it catches bugs (e.g., unsatisfiable or unreachable programs), but also because it highlights the consequences of program changes. It helps operators understand the impact of their changes by providing immediate feedback. If two programs conflict, operators are forced to resolve the conflict before deployment, rather than after an incident.

Bonus: verification-powered diffs. One of the newest features we’ve added to the verifier is one we call semantic diffs. It’s in early stages, but the key insight is that operators often just want to understand the impact of changes, even if these changes are deemed safe. To help operators, the verifier compares the old and new versions of the program file. Specifically, it looks for any query that matched program X in the old version, but matches a different program Y in the new version (or vice versa). For instance, if we changed orange_and_true thus:

- name: orange_and_true
  config: …
  match: (and (= query_domain_tag1 "orange") (not query_domain_tag2))
  response: …

Our verifier would emit:

Generating a report to help you understand your changes...
NOTE: the queries below (if any) are just examples. Other such queries may exist.

* program "orange_and_true" now MATCHES any query...
  to any domain...
    with tag1: "orange"
    with tag2: false

While not exhaustive, this information helps operators understand whether their changes are doing what they intend or not, before deployment. We look forward to expanding our verifier’s diff capabilities going forward.
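The idea behind a semantic diff can be sketched by comparing which program wins for each query under the old and new program lists. The Python below is an enumeration-based illustration over a hypothetical four-query space; it is not the verifier’s algorithm, and it assumes the program file contains orange_and_true followed by orange, so it reports every change in the winning program rather than a single example.

```python
from itertools import product

# A tiny, hypothetical query space: every combination of two tag values.
QUERIES = [{"tag1": t1, "tag2": t2}
           for t1, t2 in product(["orange", "green"], [True, False])]


def first_match(programs, q):
    """Name of the first program whose match function accepts q, else None."""
    for name, match in programs:
        if match(q):
            return name
    return None


def semantic_diff(old, new, queries):
    """List (query, old_winner, new_winner) wherever the winning program changed."""
    changes = []
    for q in queries:
        before, after = first_match(old, q), first_match(new, q)
        if before != after:
            changes.append((q, before, after))
    return changes


orange = ("orange", lambda q: q["tag1"] == "orange")
old = [("orange_and_true", lambda q: q["tag1"] == "orange" and q["tag2"]), orange]
new = [("orange_and_true", lambda q: q["tag1"] == "orange" and not q["tag2"]), orange]

for change in semantic_diff(old, new, QUERIES):
    print(change)
```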

How Topaz’s verifier works, and its tradeoffs

How does the verifier work? At a high level, the verifier checks that, for all possible DNS queries, the three properties outlined above are satisfied. A Satisfiability Modulo Theories (SMT) solver, which we explain below, makes this seemingly impossible operation feasible. (It doesn’t literally loop over all DNS queries, but the result is equivalent to doing so: it provides exhaustive proof.)

We implemented our formal verifier in Rosette, a solver-enhanced domain-specific language written in the Racket programming language. Rosette makes writing a verifier more of an engineering exercise, rather than a formal logic test: if you can express the interpreter for your language in Racket/Rosette, you get verification “for free”, in some sense. We wrote a topaz-lang interpreter in Racket, then crafted our three properties using the Rosette DSL.

How does Rosette work? Rosette translates our desired properties into formulae in first-order logic. At a high level, these formulae are like equations from algebra class in school, with “unknowns” or variables. For instance, when checking whether the orange program is reachable (with the orange_and_true program ordered before it), Rosette produces the formula ((NOT orange_and_true.match) AND orange.match). The “unknowns” here are the DNS query parameters that these match functions operate over, e.g., query_domain_tag1. To solve this formula, Rosette interfaces with an SMT solver (like Z3), which is specifically designed to solve these types of formulae by efficiently finding values to assign to the DNS query parameters that make the formulae true. Once the SMT solver finds satisfying values, Rosette translates them into a Racket data structure: in our case, a sample DNS query. In this example, once it finds a satisfying DNS query, it would report that the orange program is indeed reachable.
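Written out, the three checks reduce to existence questions over a DNS query q. The notation below is an assumption introduced for illustration and is not taken from the paper: m_i is the match function of the i-th program in file order, and E is the set of programs marked exclusive.

```latex
\begin{align*}
\text{satisfiable}(i) &\iff \exists q.\; m_i(q) \\
\text{reachable}(i)   &\iff \exists q.\; m_i(q) \land \bigwedge_{j < i} \lnot m_j(q) \\
\text{conflict}(i, j) &\iff \exists q.\; m_i(q) \land m_j(q), \qquad i \ne j,\ i, j \in E
\end{align*}
```

The first two must hold for every program, while the third must fail for every pair of exclusive programs. With orange_and_true ordered first and orange second, the reachability condition for orange is exactly the formula quoted above: (NOT orange_and_true.match) AND orange.match.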

However, verification is not free. The primary cost is maintenance. The model checker’s interpreter (Racket) must be kept in lockstep with the main interpreter (Go). If they fall out of sync, the verifier loses the ability to accurately detect bugs. Furthermore, functions added to topaz-lang must be compatible with formal verification.

Also, not all functions are easily verifiable, which means we must restrict the kinds of functions that program authors can write. Rosette can only verify functions that operate over integers and bit-vectors. This means we only permit functions whose operations can be converted into operations over integers and bit-vectors. While this seems restrictive, it actually gets us pretty far. The main challenge is strings: Topaz does not support programs that, for example, manipulate or work with substrings of the queried domain name. However, it does support simple operations on closed-set strings. For instance, it supports checking if two domain names are equal, because we can convert all strings to a small set of values representable using integers (which are easily verifiable).
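The closed-set string trick can be illustrated with a small interning table: each distinct string gets an integer ID, so equality checks on strings become equality checks on integers. This Python sketch shows the general idea only; the function name and the encoding are assumptions, not the verifier’s actual representation.

```python
def intern_strings(values):
    """Assign each distinct string a small integer ID (a closed set is assumed)."""
    table = {}
    for value in values:
        table.setdefault(value, len(table))
    return table


domains = ["a.example", "b.example", "a.example"]
ids = intern_strings(domains)

# Equality on strings is preserved as equality on integers.
encoded = [ids[d] for d in domains]
print(encoded[0] == encoded[2], encoded[0] == encoded[1])
```

The IDs say nothing about the characters inside each string, which is consistent with the restriction described above: equality survives the encoding, substring operations do not.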

Fortunately, thanks to our design of Topaz programs, the verifier need not be compatible with all Topaz program code. The verifier only ever examines Topaz match functions, so only the functions specified in match functions need to be verification-compatible. We encountered other challenges when working to make our model accurate, like modeling randomness — if you are interested in the details, we encourage you to read the paper.

Another potential cost is verification speed. We find that the verifier can ensure our existing seven programs satisfy all three properties within about six seconds, which is acceptable because verification happens only at build time. We verify programs centrally, before programs are deployed, and only when programs change. 

We also ran microbenchmarks to determine how fast the verifier can check more programs — we found that, for instance, it would take the verifier about 300 seconds to verify 50 programs. While 300 seconds is still acceptable, we are looking into verifier optimizations that will reduce the time further.

Bringing formal verification from research to production

Topaz’s verifier began as a research project, and has since been deployed to production. It formally verifies all changes made to the authoritative DNS behavior specified in Topaz.

For more in-depth information on Topaz, see both our research paper published at SIGCOMM 2024 and the recording of the talk.

We thank our former intern, Tim Alberdingk-Thijm, for his invaluable work on Topaz’s verifier.

AI Industry is Trying to Subvert the Definition of “Open Source AI”

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/ai-industry-is-trying-to-subvert-the-definition-of-open-source-ai.html

The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and mechanisms. It allows for development to be done in secret. Since for a neural network, the training data is the source code—it’s how the model gets programmed—the definition makes no sense.

And it’s confusing; most “open source” AI models—like LLAMA—are open source in name only. But the OSI seems to have been co-opted by industry players that want both corporate secrecy and the “open source” label. (Here’s one rebuttal to the definition.)

This is worth fighting for. We need a public AI option, and open source—real open source—is a necessary component of that.

But while open source should mean open source, there are some partially open models that need some sort of definition. There is a big research field of privacy-preserving, federated methods of ML model training and I think that is a good thing. And OSI has a point here:

Why do you allow the exclusion of some training data?

Because we want Open Source AI to exist also in fields where data cannot be legally shared, for example medical AI. Laws that permit training on data often limit the resharing of that same data to protect copyright or other interests. Privacy rules also give a person the rightful ability to control their most sensitive information – like decisions about their health. Similarly, much of the world’s Indigenous knowledge is protected through mechanisms that are not compatible with later-developed frameworks for rights exclusivity and sharing.

How about we call this “open weights” and not open source?

8 changes in education that can be introduced even without the long-awaited reform

Post Syndicated from original https://www.toest.bg/8-promeni-v-obrazovenieto-koito-mogat-da-se-vuvedat-i-bez-chakanata-reforma/

The much-needed reform of Bulgaria’s education system has been under discussion for decades. The sporadic measures that do get taken are very often purely administrative and do not touch the one who suffers most from the system’s inadequacy: the student. Teachers and principals, however, stubbornly fail to use the freedom afforded by the relatively broad framework within which they can carry out their duties. Anything that is not explicitly “prescribed” is carefully avoided like something poisonous, like an imposed elective class.

Block classes

Two consecutive periods of mathematics. No ordinance forbids this. The only requirement is that there be a break after each class period.

Forty-minute periods are wholly insufficient if we have to fit into one of them both the teaching of a new lesson and the consolidation of the new material. Staying longer on one topic (which the overloaded curricula make impossible anyway) is a guarantee of greater success. Surprisingly, this is practiced in many private schools, which demonstrate clearly how fruitful it is, yet municipal and state schools stubbornly refuse to change the ingrained formula of 7 different class periods a day.

Article 2 of Ordinance 10 of 19.06.2014 states:

Weekly class schedules must provide conditions for the best possible mastery of the curriculum content with the least degree of fatigue, while protecting students’ health.

And Article 3, paragraph 2 of the same ordinance says:

The subjects in the daily schedule are arranged in a sequence that ensures optimal working capacity.

The best mastery and better working capacity would be a touch more achievable if children did not have to switch between radically different fields of knowledge in less than a clock hour.

We must admit that, minimal though they are, here and there we do find connections between the individual subjects within a school year. Isn’t it time someone dared to introduce, say, humanities days and natural science days? Just let no one get the idea of bringing the teachers of these subjects together as well; that would exceed our wildest dreams. But no need to worry: surely some ordinance makes it so difficult that nobody has even thought of it.

Classroom layout

More and more schools are managing to modernize their classrooms and house students in new, lighter, and more comfortable learning spaces. Rarely, however, is this done with thought for whether the new furniture will be useful for group work or mobile enough to be rearranged easily, and even more rarely is the choice guided by whether it will contribute to the learning process. The already mentioned deplorable learning environment can in fact, without effort and without violating the instructions in the countless ordinances, be used to improve the working atmosphere, to diversify the kinds of activities, and to delight the students.

Reality shows, however, that teachers still prefer to stand in front of the class rather than at its center. And rearranging usually happens only when there are discipline problems.

Teachers should not be afraid to be bold and rearrange the furniture in the room. That way they too will be able to see it with fresh eyes.

Carrying textbooks

The textbook is only a guide that contains the framework. Exactly how we teach a lesson is a matter of will and ability. Almost all teachers insist that every child bring their own textbook, even though this in no way contributes to learning the material better.

There are many options here, and none of them is forbidden:

  • arranging a set of textbooks that stays in the classroom for the given subject;
  • using one textbook per desk, which is a serious relief considering how heavy the children’s backpacks are;
  • allowing (at least from time to time) the use of an electronic textbook during class;
  • teaching a lesson without a textbook: it is entirely possible for the textbook to go unused sometimes.

Homework from the last century

Learning “Dear fatherland, how beautiful you are!” (“Отечество любезно, как хубаво си ти!”) by heart in sixth grade is no act of heroism, nor does it mean you have understood the poem. Not to mention grading of the “up to the third stanza for a grade of 4” kind, which looks a bit like “550 grams of Ivan Vazov for 4 leva.” The children can be walked through the text, and it can be connected to the present day, something that is increasingly necessary. And if the poem has impressed them that much, they may well want to learn it on their own.

Writing out words three lines each as homework in any foreign language does not necessarily lead to learning them. Free apps such as Quizlet, where the words can be entered into virtual cards, will surely prompt students to grow fonder of both the teacher and the foreign language. It will cost the teacher about 10 minutes per lesson, but the cards can then be reused every year. At the same time, the children will have mastered a great and proven learning technique created in Germany in the last century: Karteikarten.

More dialogue, not just numbers

A grade is a number, and it is the same for everyone who receives it. The recipients, however, are different. Numbers should not speak in place of teachers. It would be good for teachers to try to understand what else lies behind a student’s particular performance. The child may not have studied because they are tired (why?), because they are overloaded (why?), because they deliberately did not study (why?). What should be diagnosed is not the momentary state but what led to it.

Useful skills in the homeroom period

A wonderful period in which a real bond can be built between the children in the class and the teacher. And it is a period that depends only on you, the teacher. Interesting guests can be invited, discussions can be held, and it can be shown that, besides being a homeroom teacher, the teacher is simply a person like everyone else. The children can even be taken for a walk. None of this is forbidden.

The school year has 36 weeks, that is, 36 periods available. Even the absurd allocation of mandatory topics, which are almost identical to those of previous years (here I list the topics for 7th grade), can break out of the mold of boring monologues.

  • Patriotic education and building national self-confidence – 4 hours. Homeroom teachers have 4 hours to tell and show what they love about Bulgaria and why, given that they are free people and have chosen to be here. A guest can be invited to class: someone who has returned to Bulgaria after years abroad.
  • Tolerance and intercultural dialogue – 2 hours. The guest from the previous item can talk about living in a foreign cultural environment. A publisher or a translator can be invited; they are among the best cultural ambassadors.
  • Road safety and traffic – 5 hours. The children and the homeroom teacher stand together at an intersection and look for right and wrong behavior. There will be plenty of variety.
  • Protection of the population in disasters, accidents, and catastrophes; first aid – 5 hours. A great time for a first real first-aid course. There are wonderful organizations that run trainings in an extremely interesting and accessible way.
  • Violence prevention, dealing with anger and aggression; peaceful conflict resolution – 2 hours. The school psychologist or a mediator can be invited to talk about their work. Anger-management techniques can be studied for the teacher to pass on to the children.
  • Terrorism prevention and behavior under terrorist threat; cyber protection – 2 hours. Options: watching a film, or visiting the police with the children and talking with the officers.
  • Preventing and counteracting corruption – 1 hour. It is easy to show in real life what corruption is, even without politicizing the conversation, because, unfortunately, corruption is everywhere. The children can be prompted to think about what they would do if: a) they could change the laws; b) they could give a bribe; c) they were caught bribing; and so on.
  • E-government and media literacy – 1 hour. Options for the class: how the electronic signature or the e-health app (е-здраве) works.
  • Career guidance – 1 hour. More time can be devoted here, because there are certainly many leftover hours. It can take the form of visits to various production facilities or offices, so that the children can follow the processes in practice and observe, if only briefly, real professional life.

Lessons outside the classroom

Speaking of walks, there is no class that cannot be held outdoors. Or with guests. The “lessons” can be invited into the classroom, or the class can be taken to the living lesson. The teacher has every right to do this, and no one should stop them.

Attitude

If teachers treat their students with respect, they will receive the same from the children. We must not underestimate or humiliate them. Let each of us try to remember what we were like at their age. And sometimes it is not a bad idea even to tell them about it. Yes, teachers can talk about themselves when they were in the students’ place, win them over to their causes, and show them that they too are human and not perfect. No one is. Especially while learning.

Punishments rarely lead to good results. They usually lead to fear and hatred. Teachers shouting in class is still a widespread phenomenon, and the principals who are aware of such members of their teams most often excuse them as highly emotional or simply as having “that kind of voice.”

The regular inspections, conducted internally at the school or by outside experts, often lead teachers to harness all their imagination, use the classroom multimedia equipment they have never touched, deliver a class that is interesting and useful for the students, be in a good mood, and not raise their voice.

In the next class period, however, when there are no inspectors, everything goes back to the way it was.

In this way students receive, by vivid example, the “most important” lessons. One is that everything is for appearances, for show. The other is that there is no need to do anything beyond the mandatory, nothing on one’s own initiative. And this raises a new generation of unenterprising and uninterested future teachers, employees, and members of society, most of whom will not know that it is possible to go beyond the strictly mandatory.

Whatever we say, however wonderful the programs, however countless the possibilities not forbidden by ordinances and laws, first place in the toolkit belongs to the teachers themselves.

When they set the textbook aside for a moment and show commitment, when they try to step even a centimeter beyond the mandatory minimum and introduce a new technique, when they simply smile or forgive someone’s mistake without lecturing, then they have already won over their class and are on the right track. And the children will be the first to show them so.

Welcome to reality

Post Syndicated from Йоанна Елми original https://www.toest.bg/dobre-doshli-v-realnostta/

Donald Trump is the 47th president of the United States. Despite the unwritten rule, many journalists and analysts allowed themselves to spend the past few weeks predicting who would win, and the prediction usually ended with a victory for the Democratic candidate, Kamala Harris. We can only tell ourselves: welcome to reality. And try to answer the question that everyone has probably been asking since the night of November 5:

How is this possible?

Trump’s victory is not just a victory: he becomes the first Republican presidential candidate to win the popular vote in 20 years. In addition, a so-called red wave is taking shape: both the House of Representatives and the Senate will be under the control of the Republican Party. Trump made gains almost everywhere in the US, in traditionally Democratic enclaves as well as in rural areas. Trump also received more support than Republican candidates at the local level; that is, more people backed him as the Republican presidential candidate than backed the same party’s candidates for senator and representative.

Trump made gains among all demographic groups except people over 65 and educated white women. Exit polls point to many likely reasons for this beyond the traditional ones: that when the economic situation in the country worsens, voters usually punish the party in power, and that in times of crisis, war, and uncertainty people always prefer authoritarian leaders.

One of the reasons is strong dissatisfaction with “the direction in which the US is heading.” In a CNN exit poll, 43% of respondents said they were dissatisfied and 29% said they were angry. Only 19% were satisfied, and a laughable 7% were “enthusiastic.” In this context, choosing Trump is not simply support for Republican ideology or economic doctrine (as we said, Trump performs better than the party’s local candidates). Choosing Trump is something more: a certain obscene gesture that the American voter is making at the entire system.

The Russian roulette of modern democracy

We have written enough on the subject that there is no need to repeat it: American democracy is in crisis, and the cause is not Donald Trump. It is a crisis of trust in institutions, of faith in the American ideal and in the possibility of realizing the “American dream,” as well as a confrontation with its historical contradictions. This crisis is playing out in the context of a global crisis and a rise of authoritarianism driven by the need for strong, charismatic leaders, a tale as old as history. Sometimes crises are lucky enough to meet leaders like Roosevelt. Other times… Other times, not.

The end of the 20th century and the beginning of the 21st can be seen as a slow disintegration of the identity with which the US has existed in the world since World War II. The humiliations, mistakes, and hypocrisy of the wars in Vietnam, Afghanistan, Iraq, and Syria; the financial crisis and the rescue of speculators at society’s expense, a pattern later repeated with the opioid crisis, for which there are people responsible but no consequences; the collapse of the USSR and the disappearance of the ideological enemy against whom America defines itself as the “leader of the free world” and the good guy; the growth of tabloid media empires like that of Roger Ailes and Rupert Murdoch (who did not just change the information flow of the new era but combined it with polished propaganda techniques from the past), together with the rise of technology and the communications revolution: all of this was rotting and eroding the soil of (neoliberal) democracy long before Donald Trump appeared on the scene.

At moments like these, when it becomes clear that the whole system is spinning its wheels, it is only a matter of time which of the “traditional” parties disappears first. There was similar political turbulence in late 19th-century Europe, before the emergence of the so-called mass parties from which modern left-wing and right-wing movements arose. Industrialization, along with the revolution in communications and transport, gave rise to new ideologies and to the need for new political forces to represent workers and business interests: left-wing ones (such as Labour in Great Britain, whose name literally comes from the word for work and toil) and right-wing ones, respectively.

In the era of technological revolution we are witnessing similar processes: in France the left-wing parties collapsed first; then, in 2017, Macron’s centrist movement drew voters from both left and right; and in the most recent elections the right erased the boundary between the center and the far right. What will fill the vacuum remains to be seen. The first signs of this process in the US date to around 2010 in the conservative party, where the beginnings of reactionary movements appeared. In 2024 the party is ideologically very different and much closer to this politics of reaction than it was ten years or so ago, precisely because of the processes described.

Now it is the Democrats’ turn. In almost every key area Harris failed to improve on Biden’s results and even lost voters, unlike Trump. Such large-scale trends cannot be reduced to problems with the candidates or the campaign, because Harris’s campaign was run at a very high level and carefully avoided more controversial tactics, such as relying on identity politics.

The problem lies elsewhere, probably in the Democrats’ messages and their calls for a return to respect and moderation toward a status quo that no longer works for the middle class and the average voter, who does not see the successful economy, the fair institutions, and the wonderful fruits of democracy, no matter how much politicians, experts, and the media try to convince him of them. This is partly because in the US politicians, experts, and journalists increasingly live in isolated bubbles or belong to economic and cultural elites, which pours fuel on the fire of toxic populism.

Trump’s victory is also a Pyrrhic one, because he is unlikely to be the savior voters are hoping for. Trump’s ideologues are trying to turn him into something more than a protest vote or the idol of a particular demographic, but his policy reforms did not become reality during his first term and are unlikely to bring peace and prosperity during the second. Economists point out that the deficit and the budget would swell in different ways under Democrats and Republicans, but they will certainly not shrink, nor is a return to the economy of the past possible in the world of the present.

In foreign policy Trump is shaping up to be a disaster at a time of emerging global conflicts. The question is which party will be fundamentally reformed first, and how: whether the Democrats will be brave enough to truly confront the deficits in the system before the Republicans bend it further in their own favor, and how long the agony of the ideologies of the past, and of attempts to apply them to the future, will last. Nor is it clear how a relative leveling can be achieved in a society of ever-growing inequalities without severe crises and cataclysms.

What is America?

America’s internal breakdown has its external equivalents, above all for the world order, which since 1945 has relied on the US as hegemon and leader of the free world. The US triumphed in World War II and helped rebuild Europe. In doing so it wrote the heroic story that is still told with pathos today and has become a central symbol for our Western societies, carrying with it a particular set of values and clichés.

This pathos, however, prevents both the US and Europe from questioning the idea of absolute heroism and goodness in a country that, for example, sent its unfree Black (formally) citizens to the front lines without desegregation, and then denied them the benefits and privileges of veterans who had come home as heroes. Also forgotten is the fact that before the Japanese attacked Pearl Harbor, Americans were deeply divided over US participation in the global conflict. The country’s traditional isolationism shows in the polls of the time: as of January 1940, 88% of Americans believed that America should not intervene in the European conflict or stand against Hitler at all. In June 1940 only 35% thought Americans should risk their lives to help the British. Nazism had many followers in the US at the time (and still does).

On the eve of World War II, the US had an immigration quota law in force, imposed before the Great Depression and supplemented after it. In 1939 the quota for immigrants from Germany was 27,370 people. More than 300,000 German citizens, most of them of Jewish origin, had applied to emigrate a year earlier. Just over 20,000 were approved. In addition, the US refused visas to emigrants who might become a burden on the welfare system. This proved a problem for Jews whose property had been confiscated by the Nazis, which often left them with no financial resources at all. When Frances Perkins, secretary of labor in Roosevelt’s cabinet, proposed that the president issue an order to ease the entry of refugees persecuted on racial or religious grounds, the cabinet objected, so as not to strain relations between the US and Germany or provoke the anger of unemployed Americans.

We could recount all this and much more, not to paint things in black and white, but precisely to escape the black-and-white myths of the Cold War, which are dying an ugly death before our eyes. Because America has been a hero. But it has also been a villain, the same villain we now find impossible to believe in.

The US is built on contradictions that have never gone away. On the one hand we have the beauty of democracy, opportunity, freedom, equality. On the other, the cruelty that makes that beauty possible: colonialism, genocide, slavery, occupation, and war. Until this contradiction is understood and resolved, it will keep returning, and the country, as well as the world, will be haunted by the ghost of the original sin that created this country and is still part of it,

wrote the Vietnamese American writer Viet Thanh Nguyen on his Facebook profile. For the US, as for the world, it is high time to part with the myth of American exceptionalism. In the real world, myths, flawless heroes, and the linear march of the plot toward inevitable progress and a happy ending do not exist. In the real world the battle for humanity is always current, generation after generation, and no one is immune either to others’ cruelty or to their own.

And now?

Democrat Sarah McBride was elected and is the first transgender member of the lower chamber. Andy Kim, meanwhile, is the first Korean American elected to the Senate. In a referendum (in the US, voters can decide individual questions proposed at the state level during elections), the red state of Missouri rejected its total abortion ban and will legalize the right to terminate a pregnancy. The other Missouri measure, which also passed, raises the minimum wage to 15 dollars an hour. Both are Democratic measures and policies, and the fact that they are passing in traditionally red states is certainly a good sign.

Arizona, Nevada, Colorado, and Montana will also protect the right to abortion after similar initiatives, as will New York and Maryland, where such measures were largely expected. In Florida, the so-called Amendment 4 had the same aim and won majority support, 57% in favor of legalization, but fell short of the 60% threshold required for passage. Abortion rights are emerging as the second most pressing issue for most Americans after the economy, and this was reflected in the messaging. Much of the Trump campaign’s effort went into convincing women that he is not their enemy, despite the repugnant rhetoric about women that the future president and his subordinates permit themselves. Trump himself rejected a federal abortion ban, something the evangelical and religious wing of the Republican Party will find hard to swallow.

Why do words no longer matter? Probably because most people do not believe they are backed by action. Over politicians who promise and do nothing (the Democrats, for example, had decades to protect the right to abortion, since it was clear that Roe v. Wade was not enough as a ruling), voters preferred those who break every norm and do not respect tradition but at least “tell it like it is” (an argument made by Trump voters). The choice of Trump is not a sign that Americans are immoral and disgusting people (as when Hillary Clinton called Trump’s voters deplorables). It is a signal of revolt, revolution, and change, as anyone knows who can stand listening to the revolutionary and patriotic pathos of the alternative conservative media.

For journalists and media outlets that do not yet regard every Republican voter as a deplorable, it is clear that in real life most Republicans are not racists, sexists, or xenophobes, nor do they believe that their candidates support such things; quite the opposite. If one digs into the argumentation (however full of holes it is at times and however much it contradicts actual deeds and words), one will hear that Republicans claim to be doing the same as Democrats: looking after the little guy regardless of sex, religion, or creed, protecting freedom of speech, and saving their nation from the Democrats, who are fascists and communists.

No time for tears

The good news is also the bad news: this cognitive dissonance and tribal thinking mean that the world of the past is passing away. The historical exception of the good victor of World War II, of the welfare state of the 1960s, and of America’s prosperity compared with a ruined Europe and world is coming to an end. America may be great again, but never again in that way, if only because that is impossible and the world is different.

The important question is how the past will go, and what and how much it will sweep away on its way out. Will Trump provoke a third world war, even though he was elected on empty promises to end it? We hope not. Will the most frightening prophecies come true, and will the US turn into a neo-fascist autocracy? Hardly. But there is no need: capital already decides politics in a country where the Supreme Court long ago ruled that money is to be treated as free speech, so that politics passes through unelected ultra-rich individuals and corporations, which the law regards as persons with rights or, better yet, as donors of luxury vacations and motorhomes (as in the case of one of the Supreme Court justices). We have written before that on all these issues Republicans and Democrats think alike. At best, they will soon realize it.

One of Donald Trump’s unofficial promises is to give Elon Musk control over the state apparatus, which the richest man in the world (who otherwise presents himself as an anti-elite outsider) could use for his own ends. Only people who find relief in arguing about the past, because reality is unbearable, still believe in the ideas of less or more government. There will be government, and plenty of it; the question is whom it will serve and how, as well as who will be subjugated and to what degree. This is as dystopian as it is probable, and just as much a formality: with or without Trump, this is the reality in which we are slowly being boiled, because institutions are stalling in the US, in the EU, and beyond (only days ago a new journalistic initiative was launched that aims to expose corruption and the lack of transparency in the EU, where financial interests also increasingly dictate policy).

Regardless of the result on November 5, half of a country, and of a world, would have woken up feeling that the world was ending. And that is not the product of one politician, one ideology, or one election, but of our environment and our times, of processes much larger than ourselves. In the end, every dictator is a leviathan made of all those who support him. But the world is still here. For now. And what else is left for each of us but to fight, with dignity and, if possible, humanely, for what we believe in? Don’t listen to me; listen to Jon Stewart. And breathe. There is no time for tears.

Amazon OpenSearch Service announces Standard and Extended Support dates for Elasticsearch and OpenSearch versions

Post Syndicated from Arvind Mahesh original https://aws.amazon.com/blogs/big-data/amazon-opensearch-service-announces-standard-and-extended-support-dates-for-elasticsearch-and-opensearch-versions/

Amazon OpenSearch Service supports 19 versions of open source Elasticsearch and 11 versions of OpenSearch. Over the years, we have added several stability, resiliency, and security features to recent engine versions, helping customers derive better value from OpenSearch Service. As software versions grow older, we need to make sure that these versions continue to meet high security and compliance standards. Many of the legacy versions supported on OpenSearch Service, such as Elasticsearch versions 1.5 and 2.3, rely on third-party dependencies that are no longer actively supported. By moving to the latest engine versions, customers can derive maximum benefit from the new features, improved price-performance, and security improvements we make to OpenSearch.

Today, we’re announcing timelines for end of Standard Support and Extended Support for legacy Elasticsearch versions up to 6.7, Elasticsearch versions 7.1 through 7.8, OpenSearch versions from 1.0 through 1.2, and OpenSearch versions 2.3 through 2.9 available on Amazon OpenSearch Service. Versions that are under Standard Support receive regular bug fixes and security fixes, and versions in Extended Support receive critical security fixes and operating system patches for an additional flat fee per normalized instance hour. With Extended Support, we want to make sure that our customers continue to receive critical security fixes for an adequate time, while they plan to upgrade to more recent engine versions. For more details on Extended Support please see the FAQs.

End of Standard Support and Extended Support for Elasticsearch versions

See Table 1 that follows for end of Standard Support and Extended Support dates for legacy Elasticsearch versions available on OpenSearch Service. We recommend that customers running Elasticsearch versions upgrade to the latest OpenSearch versions. All Elasticsearch versions will receive at least 12 months of Extended Support, and version 5.6 will receive 36 months of Extended Support. After Extended Support ends for a version, domains running the specific version will not receive bug fixes or security updates.

Software version | End of Standard Support | End of Extended Support
Elasticsearch versions 1.5 and 2.3 | 11/7/2025 | 11/7/2026
Elasticsearch versions 5.1 to 5.5 | 11/7/2025 | 11/7/2026
Elasticsearch version 5.6 | 11/7/2025 | 11/7/2028
Elasticsearch versions 6.0 to 6.7 | 11/7/2025 | 11/7/2026
Elasticsearch version 6.8 | Not announced | Not announced
Elasticsearch versions 7.1 to 7.8 | 11/7/2025 | 11/7/2026
Elasticsearch version 7.9 | Not announced | Not announced
Elasticsearch version 7.10 | Not announced | Not announced

End of Standard Support and Extended Support for OpenSearch versions

For OpenSearch versions running on Amazon OpenSearch Service, we will provide at least 12 months of Standard Support after the end of support date for the corresponding upstream open source OpenSearch version, or 12 months of Standard Support after the release of the next minor version on OpenSearch Service, whichever is longer. All OpenSearch versions will receive at least 12 months of Extended Support after the end of Standard Support date. For more details, check the open source OpenSearch maintenance policy.

See Table 2 that follows for end of Standard Support and Extended Support dates for various OpenSearch versions available on OpenSearch Service. For future updates on versions in Standard Support and Extended Support, follow supported versions.

Software version | End of Standard Support | End of Extended Support
OpenSearch versions 1.0 to 1.2 | 11/7/2025 | 11/7/2026
OpenSearch version 1.3 | Not announced | Not announced
OpenSearch versions 2.3 to 2.9 | 11/7/2025 | 11/7/2026
OpenSearch versions 2.11 and higher versions | Not announced | Not announced

Upgrading OpenSearch Service domains: We recommend that you update your domains to the latest available OpenSearch version to derive maximum value out of OpenSearch Service. Minor version upgrades on OpenSearch tend to be seamless because they don’t contain breaking changes, and we recommend moving to the latest minor version, or a version for which end of support has not yet been announced. For example, if you are on OpenSearch version 1.2, you can move to OpenSearch version 1.3, because it’s the last minor version of the 1.x series and because presently it continues to be supported by the open source community and AWS. If you want to choose an Elasticsearch version, and you are running an older 6.x or 7.x version, you can move to version 6.8, or 7.10.
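To check whether a given engine version falls in one of the announced ranges, the two tables can be encoded as data. The following is a minimal sketch, not an AWS API: the names `ANNOUNCED` and `support_dates` are hypothetical, and the dates are read as MM/DD/YYYY, which is an assumption about the table format. Versions are compared as tuples of integers so that 7.10 sorts after 7.8.

```python
from datetime import date

# End of Standard Support shared by every announced range in Tables 1 and 2,
# read as MM/DD/YYYY (an assumption about the date format).
STANDARD_END = date(2025, 11, 7)

# (engine, lowest version, highest version, end of Extended Support)
ANNOUNCED = [
    ("Elasticsearch", (1, 5), (1, 5), date(2026, 11, 7)),
    ("Elasticsearch", (2, 3), (2, 3), date(2026, 11, 7)),
    ("Elasticsearch", (5, 1), (5, 5), date(2026, 11, 7)),
    ("Elasticsearch", (5, 6), (5, 6), date(2028, 11, 7)),
    ("Elasticsearch", (6, 0), (6, 7), date(2026, 11, 7)),
    ("Elasticsearch", (7, 1), (7, 8), date(2026, 11, 7)),
    ("OpenSearch", (1, 0), (1, 2), date(2026, 11, 7)),
    ("OpenSearch", (2, 3), (2, 9), date(2026, 11, 7)),
]


def support_dates(engine, version):
    """Return (standard_end, extended_end), or None if no dates are announced."""
    # Parse "7.10" into (7, 10) so it compares correctly against (7, 8).
    parsed = tuple(int(part) for part in version.split("."))
    for name, low, high, extended_end in ANNOUNCED:
        if name == engine and low <= parsed <= high:
            return STANDARD_END, extended_end
    return None


for engine, version in [("Elasticsearch", "5.6"), ("Elasticsearch", "7.10"), ("OpenSearch", "1.2")]:
    print(engine, version, support_dates(engine, version))
```

A result of `None` means only that no end-of-support dates appear in the tables for that version, as is the case for Elasticsearch 6.8 and 7.10 and OpenSearch 1.3.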

There are various ways to upgrade your cluster to a newer version, and the steps vary depending on the version your domain is running and the version you want to upgrade to. See Upgrading OpenSearch Service domains for detailed instructions on upgrading your domain to a new version. You can also use the Migration Assistant for Amazon OpenSearch Service to upgrade to newer versions.

Calculating Extended Support charges: Domains running versions under Extended Support will be charged a flat additional fee per normalized instance hour (NIH). For example, the fee is $0.0065 per NIH in the US East (N. Virginia) AWS Region. See the pricing page for exact pricing by Region.

NIH is computed as the number of instance hours multiplied by a normalization factor for the instance size (for example, medium or large). For example, if you’re running an m7g.medium.search instance for 24 hours in the US East (N. Virginia) Region, which is priced at $0.068 per instance hour (on-demand), you will typically pay $1.632 ($0.068 × 24). If you’re running a version that is in Extended Support, you will pay an additional $0.0065 per NIH, which is computed as $0.0065 × 24 (number of instance hours) × 2 (size normalization factor, which is 2 for medium-sized instances), which comes to $0.312 for Extended Support for 24 hours. The total amount that you will pay for 24 hours is the sum of the standard instance usage cost and the Extended Support cost, which is $1.944 ($1.632 + $0.312, excluding storage cost). The following table shows the normalization factor for various instance sizes in OpenSearch Service.

Instance size Normalization Factor
nano 0.25
micro 0.5
small 1
medium 2
large 4
xlarge 8
2xlarge 16
4xlarge 32
8xlarge 64
9xlarge 72
10xlarge 80
12xlarge 96
16xlarge 128
18xlarge 144
24xlarge 192
32xlarge 256
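
The Extended Support arithmetic above can be sketched in a few lines. This is an illustration only, not an AWS pricing API: the function names are hypothetical, the $0.0065 fee is the example rate quoted for US East (N. Virginia), and `Decimal` is used so the dollar amounts stay exact.

```python
from decimal import Decimal

# Size normalization factors, copied from the table above.
NORMALIZATION_FACTOR = {
    "nano": Decimal("0.25"), "micro": Decimal("0.5"), "small": Decimal("1"),
    "medium": Decimal("2"), "large": Decimal("4"), "xlarge": Decimal("8"),
    "2xlarge": Decimal("16"), "4xlarge": Decimal("32"), "8xlarge": Decimal("64"),
    "9xlarge": Decimal("72"), "10xlarge": Decimal("80"), "12xlarge": Decimal("96"),
    "16xlarge": Decimal("128"), "18xlarge": Decimal("144"),
    "24xlarge": Decimal("192"), "32xlarge": Decimal("256"),
}

# Example fee quoted for US East (N. Virginia); other Regions may differ.
EXAMPLE_FEE_PER_NIH = Decimal("0.0065")


def extended_support_cost(size, instance_hours, fee_per_nih=EXAMPLE_FEE_PER_NIH):
    """Extended Support charge: fee per NIH x normalization factor x instance hours."""
    return fee_per_nih * NORMALIZATION_FACTOR[size] * instance_hours


def total_cost(size, instance_hours, hourly_price, fee_per_nih=EXAMPLE_FEE_PER_NIH):
    """Standard instance usage cost plus the Extended Support charge (storage excluded)."""
    usage = hourly_price * instance_hours
    return usage + extended_support_cost(size, instance_hours, fee_per_nih)


# The m7g.medium.search example from the text: 24 hours at $0.068 per instance hour.
print(extended_support_cost("medium", 24))
print(total_cost("medium", 24, Decimal("0.068")))
```

With the figures from the worked example, the two calls reproduce the $0.312 Extended Support charge and the $1.944 total.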

Summary

We add new capabilities across various vectors to the latest OpenSearch versions, which include new features, performance and resiliency improvements, and security improvements. We recommend that you update to recent OpenSearch versions to get the most benefit out of OpenSearch Service. For any questions on Standard and Extended Support options, see the FAQs. For further questions, contact AWS Support.


About the authors

Arvind Mahesh is a Senior Manager-Product at Amazon Web Services for Amazon OpenSearch Service. He has close to two decades of technology experience across a variety of domains such as Analytics, Search, Cloud, Network Security, and Telecom.

Kuldeep Yadav is a Senior Technical Program Manager at Amazon Web Services who is passionate about driving innovation and complex problem solving. He works closely with teams and customers to ensure operational excellence and achieve more with less. Outside of work, he enjoys trekking and all sports.

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Introducing Express brokers for Amazon MSK to deliver high throughput and faster scaling for your Kafka clusters

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/introducing-express-brokers-for-amazon-msk-to-deliver-high-throughput-and-faster-scaling-for-your-kafka-clusters/

Today, we’re announcing the general availability of Express brokers, a new broker type for Amazon Managed Streaming for Apache Kafka (Amazon MSK). It’s designed to deliver up to three times more throughput per broker, scale up to 20 times faster, and reduce recovery time by 90 percent compared to Standard brokers running Apache Kafka. Express brokers come preconfigured with Kafka best practices by default, support Kafka APIs, and provide the same low-latency performance that Amazon MSK customers expect, so they can continue using existing client applications without any changes.

Express brokers provide improved compute and storage elasticity for Kafka applications when using Amazon MSK provisioned clusters. Amazon MSK is a fully-managed AWS service that makes it easier for you to build and run highly available and scalable applications based on Apache Kafka.

Let’s dive deeper into some of the key features that Express brokers have and the benefits they provide:

  • Easier operations with hands-free storage management – Express brokers offer unlimited storage without preprovisioning, eliminating disk-related bottlenecks. Cluster sizing is simpler, requiring only ingress and egress throughput divided by recommended per-broker throughput. This removes the need for proactive disk capacity monitoring and scaling, simplifying cluster management and improving resilience by eliminating a potential failure source.
  • Fewer brokers with up to three times throughput per broker – Higher throughput per broker allows for smaller clusters for the same workload. Standard brokers’ throughput must account for client traffic and background operations, with m7g.16xl Standard brokers safely handling 154 MBps ingress. Express brokers use opinionated settings and resource isolation, enabling m7g.16xl size instances to safely manage up to 500 MBps ingress without compromising performance or availability during cluster events.
  • Higher utilization with 20 times faster scaling – Express brokers reduce data movement during scaling, making them up to 20 times faster than Standard brokers. This allows for quicker and more reliable cluster resizing. You can monitor each broker’s ingress throughput capacity and add brokers within minutes, eliminating the need for over-provisioning in anticipation of traffic spikes.
  • Higher resilience with 90 percent faster recovery – Express brokers are designed for mission-critical applications requiring high resilience. They come preconfigured with best-practice defaults, including 3-way replication (RF=3), which reduces failures due to misconfiguration. Express brokers also recover 90 percent faster from transient failures compared to standard Apache Kafka brokers. Express brokers’ rebalancing and recovery use minimal cluster resources, simplifying capacity planning. This eliminates the risk of increased resource utilization and the need for continuous monitoring when right-sizing clusters.

You have several options in Amazon MSK depending on your workload and preference:

Feature | MSK provisioned: Standard brokers | MSK provisioned: Express brokers | MSK Serverless
Configuration range | Most flexible | Flexible | Least flexible
Cluster rebalancing | Customer managed | Customer managed but up to 20x faster | MSK managed
Capacity management | Yes | Yes (compute only) | No
Storage management | Yes | No | No

Express brokers lower costs, provide higher resiliency, and lower operational overhead, making them the best choice for all Kafka workloads. If you prefer to use Kafka without managing any aspect of its capacity, its configuration, or how it scales, then you can choose Amazon MSK Serverless. This provides a fully abstracted Apache Kafka experience that eliminates the need for any infrastructure management, scales automatically, and charges you on a pay-per-use consumption model that doesn’t require you to optimize resource utilization.

Getting started with Express brokers in Amazon MSK
To get started with Express brokers, you can use the Sizing and Pricing worksheet that Amazon MSK provides. This worksheet helps you estimate the cluster size you’ll need to accommodate your workload and also gives you a rough estimate of the total monthly cost you’ll incur.

The throughput requirements of your workload are the primary factor in the size of your cluster. You should also consider other factors, such as partition and connection count to arrive at the size and number of brokers you’ll need for your cluster. For example, if your streaming application needs 30 MBps of data ingress (write) and 80 MBps data egress (read) capacity, you can use three express.m7g.large brokers to meet your throughput needs (assuming the partition count for your workload is within the maximum number of partitions that Amazon MSK recommends for an m7g.large instance).

The following table shows the recommended maximum ingress, egress, and partition counts per instance size for sustainable and safe operations. You can learn more about these recommendations in the Best practices section of the Amazon MSK Developer Guide.

Instance size Ingress (MBps) Egress (MBps)
express.m7g.large 15.6 31.2
express.m7g.4xlarge 124.9 249.8
express.m7g.16xlarge 500.0 1000.0
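
The throughput part of the sizing exercise can be sketched as follows. This is a rough illustration rather than the Sizing and Pricing worksheet: `brokers_needed` is a hypothetical helper, it ignores partition and connection counts, and rounding up to a multiple of three is an assumption based on spreading brokers evenly across three subnets.

```python
import math

# Recommended per-broker maximums in MBps, copied from the table above: (ingress, egress).
EXPRESS_LIMITS = {
    "express.m7g.large": (15.6, 31.2),
    "express.m7g.4xlarge": (124.9, 249.8),
    "express.m7g.16xlarge": (500.0, 1000.0),
}


def brokers_needed(instance_type, ingress_mbps, egress_mbps, multiple=3):
    """Smallest broker count (a multiple of `multiple`) that covers both throughput targets."""
    max_ingress, max_egress = EXPRESS_LIMITS[instance_type]
    for_ingress = math.ceil(ingress_mbps / max_ingress)
    for_egress = math.ceil(egress_mbps / max_egress)
    needed = max(for_ingress, for_egress, 1)
    # Assumption: keep the count a multiple of three so each subnet gets the same number of brokers.
    return math.ceil(needed / multiple) * multiple


# The example from the text: 30 MBps ingress and 80 MBps egress on express.m7g.large.
print(brokers_needed("express.m7g.large", 30, 80))
```

For the 30 MBps ingress and 80 MBps egress example, egress is the binding constraint (80 / 31.2 rounds up to 3), which matches the three express.m7g.large brokers suggested in the text.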

Once you have decided on the number and size of Express brokers you’ll need for your workload, go to the AWS Management Console or use the CreateCluster API to create an Amazon MSK provisioned cluster.

When you create a new cluster on the Amazon MSK console, in the Broker type option, choose Express brokers and then select the amount of compute capacity that you want to provision for the broker. Express brokers are available with Apache Kafka version 3.6.0 and Graviton-based instances. You don’t need to preprovision storage for Express brokers.

You can also customize some of these configurations to further fine-tune the performance of your clusters according to your own preferences. To learn more, visit Express broker configurations in the Amazon MSK developer guide.

To create an MSK cluster in the AWS Command Line Interface (AWS CLI), use the create-cluster command.

aws kafka create-cluster \
    --cluster-name "channy-express-cluster" \
    --kafka-version "3.6.0" \
    --number-of-broker-nodes 3 \
    --broker-node-group-info file://brokernodegroupinfo.json

A JSON file named brokernodegroupinfo.json specifies the three subnets over which you want Amazon MSK to distribute the broker nodes.

{
    "InstanceType": "express.m7g.large",
    "BrokerAZDistribution": "DEFAULT",
    "ClientSubnets": [
        "subnet-0123456789111abcd",
        "subnet-0123456789222abcd",
        "subnet-0123456789333abcd"
    ]
}
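
If you script cluster creation, you may prefer to generate this file rather than edit it by hand. The following is a minimal sketch under stated assumptions: the helper name write_broker_node_group_info is hypothetical, the subnet IDs are the placeholders from the example above, and the check for exactly three subnets simply mirrors this example rather than a documented rule.

```python
import json
import tempfile
from pathlib import Path


def write_broker_node_group_info(path, instance_type, subnets):
    """Write the broker node group settings that create-cluster reads via file://."""
    if len(subnets) != 3:
        # Assumption of this sketch: one subnet per Availability Zone, three in total.
        raise ValueError("this sketch expects exactly three subnets")
    info = {
        "InstanceType": instance_type,
        "BrokerAZDistribution": "DEFAULT",
        "ClientSubnets": list(subnets),
    }
    Path(path).write_text(json.dumps(info, indent=4) + "\n")
    return info


# Placeholder subnet IDs taken from the example above; replace them with your own.
out_dir = Path(tempfile.mkdtemp())
info = write_broker_node_group_info(
    out_dir / "brokernodegroupinfo.json",
    "express.m7g.large",
    ["subnet-0123456789111abcd", "subnet-0123456789222abcd", "subnet-0123456789333abcd"],
)
print(json.dumps(info, indent=4))
```

The sketch writes to a temporary directory; in practice you would write the file next to where you run the create-cluster command.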

Once the cluster is created, you can use the bootstrap connection string to connect your clients to the cluster endpoints.

With Express brokers, you can scale vertically (changing instance size) or horizontally (adding brokers). Vertical scaling doubles throughput without requiring partition reassignment. Horizontal scaling adds brokers in sets of three and allows you to create more partitions, but it requires partition reassignment for new brokers to serve traffic.

A major benefit of Express brokers is that you can add brokers and rebalance partitions within minutes. On the other hand, rebalancing partitions after adding Standard brokers can take several hours. The graph below shows the time it took to rebalance partitions after adding 3 Express brokers to a cluster and reassigning 2000 partitions to each of the new brokers.

As you can see, it took approximately 10 minutes to reassign these partitions and utilize the additional capacity of the new brokers. When we ran the same experiment on an equivalent cluster comprising Standard brokers, partition reassignment took over 24 hours.

To learn more about partition reassignment, visit Expanding your cluster in the Apache Kafka documentation.
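
As a rough illustration of the first step of that procedure, the Apache Kafka kafka-reassign-partitions.sh tool takes a small JSON document listing the topics to move. The sketch below builds that document; the helper name topics_to_move and the topic names are hypothetical.

```python
import json


def topics_to_move(topic_names):
    """Build the topics-to-move document read by kafka-reassign-partitions.sh --generate."""
    return {"topics": [{"topic": name} for name in topic_names], "version": 1}


# Hypothetical topic names, for illustration only.
doc = topics_to_move(["orders", "clickstream"])
print(json.dumps(doc))
```

You would save the output to a file and pass it to the tool with --topics-to-move-json-file, together with --bootstrap-server, --broker-list (the IDs of the brokers that should host the partitions), and --generate to produce a candidate reassignment plan.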

Things to know
Here are some things you should know about Express brokers:

  • Data migration – You can migrate the data in your existing Kafka or MSK cluster to a cluster composed of Express brokers using Amazon MSK Replicator, which copies both the data and the metadata of your cluster to a new cluster.
  • Monitoring – You can monitor your cluster composed of Express brokers at the cluster and broker level with Amazon CloudWatch metrics, and you can enable open monitoring with Prometheus to expose metrics using the JMX Exporter and the Node Exporter.
  • Security – Just like with other broker types, Amazon MSK integrates with AWS Key Management Service (AWS KMS) to offer transparent server-side encryption for the storage in Express brokers. When you create an MSK cluster with Express brokers, you can specify the AWS KMS key that you want Amazon MSK to use to encrypt your data at rest. If you don’t specify a KMS key, Amazon MSK creates an AWS managed key for you and uses it on your behalf.

Now available
The Express broker type is available today in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) Regions.

You pay an hourly rate for Apache Kafka broker instance usage (billed at one-second resolution) for Express brokers, with varying fees depending on the size of the broker instance and active brokers in your MSK clusters. You also pay a per-GB rate for data written to an Express broker (billed at per-byte resolution). To learn more, visit the Amazon MSK pricing page.
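
The two charges described above combine into a simple formula: broker-hours times the hourly rate, plus gigabytes written times the per-GB rate. The following sketch shows only the arithmetic. The function name is hypothetical, the rates in the example call are placeholders and not real AWS prices, and 730 hours is an assumed average month.

```python
def estimate_monthly_cost(brokers, hourly_rate, gb_written, per_gb_rate, hours=730):
    """Broker-hours plus data-written charges, the two components named above."""
    instance_cost = brokers * hours * hourly_rate
    ingest_cost = gb_written * per_gb_rate
    return instance_cost + ingest_cost


# Placeholder rates, NOT real AWS prices: look up current values on the pricing page.
cost = estimate_monthly_cost(brokers=3, hourly_rate=1.0, gb_written=1000, per_gb_rate=0.01)
print(round(cost, 2))
```

Any other charges that apply to your cluster are outside this sketch; the Amazon MSK pricing page is the authoritative source.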

Give Express brokers for Amazon MSK a try in the Amazon MSK console. For more information, visit the Amazon MSK Developer Guide and send feedback to AWS re:Post for Amazon MSK or through your usual AWS support contacts.

Channy

Write queries faster with Amazon Q generative SQL for Amazon Redshift

Post Syndicated from Raghu Kuppala original https://aws.amazon.com/blogs/big-data/write-queries-faster-with-amazon-q-generative-sql-for-amazon-redshift/

Amazon Redshift is a fully managed, AI-powered cloud data warehouse that delivers the best price-performance for your analytics workloads at any scale. Amazon Q generative SQL brings the capabilities of generative AI directly into the Amazon Redshift query editor. Amazon Q generative SQL for Amazon Redshift was launched in preview during AWS re:Invent 2023. With over 85,000 queries executed in preview, Amazon Redshift announced the general availability in September 2024.

Amazon Q generative SQL for Amazon Redshift uses generative AI to analyze user intent, query patterns, and schema metadata to identify common SQL query patterns directly within Amazon Redshift, accelerating the query authoring process for users and reducing the time required to derive actionable data insights. It provides a conversational interface where users can submit queries in natural language within the scope of their current data permissions. Generative SQL uses query history for better accuracy, and you can further improve accuracy through custom context, such as table descriptions, column descriptions, foreign key and primary key definitions, and sample queries. Custom context enhances the AI model’s understanding of your specific data model, business logic, and query patterns, allowing it to generate more relevant and accurate SQL recommendations. It enables you to get insights faster without extensive knowledge of your organization’s complex database schema and metadata.

Within this feature, user data is secure and private. Your data is not shared across accounts. Your queries, data, and database schemas are not used to train a generative AI foundation model (FM). Your input is used as contextual prompts to the FM to answer only your queries.

In this post, we show you how to enable the Amazon Q generative SQL feature in the Redshift query editor and use the feature to get tailored SQL commands based on your natural language queries. With Amazon Q, you can spend less time worrying about the nuances of SQL syntax and optimizations, allowing you to concentrate your efforts on extracting invaluable business insights from your data.

Solution overview

At a high level, the feature works as follows:

  1. To generate SQL code, write your request in plain English in the conversational interface in the Redshift query editor.
  2. The query editor sends the query context to the underlying Amazon Q generative SQL platform, which uses generative AI to generate SQL code recommendations based on your Redshift metadata.
  3. You receive the generated SQL code suggestions within the same chat interface.

The following diagram illustrates this workflow.

Your content processed by generative SQL is not stored or used by AWS for service improvement.

Amazon Q generative SQL uses a large language model (LLM) and Amazon Bedrock to generate the SQL query. AWS uses different techniques, such as prompt engineering and Retrieval Augmented Generation (RAG), to query the model based on your context:

  • The database you’re connected to
  • The schema you’re working on
  • Your query history
  • Optionally, the query history of other users connected to the same endpoint

Amazon Q generative SQL is conversational, and you can ask it to refine a previously generated query.

In the following sections, we demonstrate how to enable the generative SQL feature in the Redshift query editor and use it to generate SQL queries using natural language.

Prerequisites

To get started, you need an Amazon Redshift Serverless endpoint or an Amazon Redshift provisioned cluster. For this post, we use Redshift Serverless. Refer to Easy analytics and cost-optimization with Amazon Redshift Serverless to get started.

Enable the Amazon Q generative SQL feature in the Redshift query editor

If you’re using the feature for the first time, you need to enable the Amazon Q generative SQL feature in the Redshift query editor.

To enable the feature, complete the following steps:

  1. On the Amazon Redshift console, open the Redshift Serverless dashboard.
  2. Choose Query data.

You can also choose Query Editor V2 in the navigation pane of the Amazon Redshift console.

When you open the Redshift query editor, you will see the new icon for Amazon Q next to the database dropdown menu at the top of the query editor console.

If you choose the Amazon Q icon, you will see the message “Amazon Redshift query editor V2 now supports generative SQL functionality. Contact your administrator to activate this feature in Settings.” If you’re not the administrator, you need to work with the account administrator to enable this feature.

  3. If you’re the administrator, choose the hyperlink in the message, or go to the settings icon and choose Generative SQL settings.
  4. In the Generative SQL settings section, select Q generative SQL, which will turn on Amazon Q generative SQL for all users of the account.

Amazon Q generative SQL is personalized to your database: what it learns from your updates and conversations is also applied to the conversations of other users who connect to the same database with their own credentials. The Generative SQL settings also show instructions for granting the sys:monitor role to a user or role.

  5. Choose Save.

You will receive a confirmation that the Amazon Q generative SQL settings have been successfully updated.

Load notebooks with sample TPC-DS data

The Redshift query editor comes with sample data and SQL notebooks that you can load into a sample database and corresponding schema. For this post, we use TPC-DS, a decision support benchmark.

We start by loading the TPC-DS data into the Redshift database. When you load this data, the schema tpcds is updated with sample data. We also use the provided notebooks with the tpcds schema to run queries to build a query history.

Complete the following steps:

  1. Connect to your Redshift Serverless workgroup or Redshift provisioned cluster.
  2. Navigate to the sample_data_dev database to view the sample databases available for running the generative SQL feature.
  3. Hover over the tpcds schema and choose Open sample notebooks.
  4. In the Create sample database pop-up message, choose Create.

In a few seconds, you will see the notification that the database sample_data_dev is created successfully and tpcds sample data is loaded successfully. Two sample notebooks for the schema are also generated.

  5. Choose Run all on each notebook tab.

This will take a few minutes to run and will establish a query history for the tpcds data.

This step is not mandatory when you use the feature with your organization’s data warehouse.

Use Amazon Q to generate SQL queries from natural language

Now that the Amazon Q generative SQL feature is enabled and ready for use, open a new notebook and choose the Amazon Q icon to open a chat pane in the Redshift query editor.

Amazon Q generative SQL is personalized to your schema. It uses metadata from database schemas to improve the SQL query suggestions. Optionally, administrators can allow the use of the account’s query history to further improve the generated SQL. This can be enabled by running the following GRANT commands to provide access to your query history to other roles or users:

GRANT ROLE SYS:MONITOR to "IAMR:role-name";
GRANT ROLE SYS:MONITOR to "IAM:user-name";
GRANT ROLE SYS:MONITOR to "database-username";

This optional step allows users to make query monitoring history available to other users connected to the same database.
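
If you need to grant the role to several principals, you can generate the statements instead of typing them. The following is a small sketch: the helper name grant_sys_monitor and the principal names are hypothetical, while the IAMR: and IAM: prefixes follow the GRANT commands shown above.

```python
# Prefixes as they appear in the GRANT commands above.
PREFIXES = {"iam_role": "IAMR:", "iam_user": "IAM:", "database_user": ""}


def grant_sys_monitor(kind, name):
    """Render one GRANT statement in the form shown above."""
    if '"' in name:
        raise ValueError("name must not contain a double quote")
    return f'GRANT ROLE SYS:MONITOR TO "{PREFIXES[kind]}{name}";'


# Hypothetical principals, for illustration only.
for kind, name in [("iam_role", "analytics-role"), ("iam_user", "alice"), ("database_user", "user3")]:
    print(grant_sys_monitor(kind, name))
```

Review the generated statements before running them in the query editor, because each one exposes query history to the named principal.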

Let’s get started with some query examples.

  1. First, make sure you’re connected to sample_data_dev.
  2. Let’s ask the query “What are the top 10 stores in sales in 1998?”

This generates a SQL query. Amazon Q generative SQL is also personalized to your data domain. You will notice that it joins to the Store table to retrieve store_name.

  3. Choose Add to notebook under the query to add the generated SQL.

Our query runs successfully and shows that the store named able has the most sales.

  4. Amazon Q is personalized to your conversation. Suppose you want to know what the top selling item was for the store able. You can ask “What was the unique identifier of the top selling item for the store ‘able’?”

The results show the top selling item. However, the query didn’t filter on the year.

  5. Let’s ask Amazon Q to give us the top selling item for the store able in 1998. Instead of repeating the whole question, you can simply ask “Can you filter by the year 1998?”

Now we have the top selling item for the store able for 1998.

  6. To display the item description, you can ask “Can you modify the query to include its name and description?”

Amazon Q added the join to the item table and the query ran successfully.

Now that we have done some basic queries, let’s do some deeper analysis.

  7. Let’s ask Amazon Q “Can you give me aggregated store sales, for each county by quarter for all years?”

The answer is correct, but let’s ask a follow-up to include the state.

  8. Ask the follow-up question: “Can you include state?”

This answer looks good; if you want the data sorted, you can add an ORDER BY clause yourself or ask Amazon Q to add it.

So far, we have only been looking at store_sales data. The TPC-DS data contains data for other sales channels, including web_sales and catalog_sales.

  9. Let’s ask Amazon Q “Can you give me the total sales for 1998, from different sales channels, using a union of the sales data from different channels?”

Let’s dive deeper into some other capabilities of Amazon Q generative SQL.

  10. Let’s try logging in with a different user and see how Amazon Q generative SQL interacts with that user. We have created User3 and granted it the sys:monitor role.
  11. Logged in as User3, let’s ask the original question of “What are the top 10 stores in sales in 1998?”

Amazon Q generative SQL is able to use the query history and provide SQL recommendations for User3’s prompts because they have access to the system metadata provided through the role sys:monitor.
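
To make the walkthrough concrete, the sketch below shows the general shape of two of the queries discussed above: the top stores for 1998 and total sales per channel using a union. It is not the SQL that Amazon Q generated. It runs against a tiny in-memory SQLite database with made-up rows rather than Redshift, the table and column names follow the standard TPC-DS schema, and using the net paid amount as the sales measure is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A few made-up rows in a cut-down TPC-DS layout, for illustration only.
conn.executescript("""
CREATE TABLE store (s_store_sk INTEGER, s_store_name TEXT);
CREATE TABLE date_dim (d_date_sk INTEGER, d_year INTEGER);
CREATE TABLE store_sales (ss_sold_date_sk INTEGER, ss_store_sk INTEGER, ss_net_paid REAL);
CREATE TABLE web_sales (ws_sold_date_sk INTEGER, ws_net_paid REAL);
CREATE TABLE catalog_sales (cs_sold_date_sk INTEGER, cs_net_paid REAL);
INSERT INTO store VALUES (1, 'able'), (2, 'ought');
INSERT INTO date_dim VALUES (10, 1998), (20, 1999);
INSERT INTO store_sales VALUES (10, 1, 100.0), (10, 1, 50.0), (10, 2, 120.0), (20, 2, 900.0);
INSERT INTO web_sales VALUES (10, 40.0), (20, 5.0);
INSERT INTO catalog_sales VALUES (10, 60.0);
""")

# "What are the top 10 stores in sales in 1998?"
TOP_STORES_1998 = """
SELECT s.s_store_name, SUM(ss.ss_net_paid) AS total_sales
FROM store_sales ss
JOIN store s ON ss.ss_store_sk = s.s_store_sk
JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk
WHERE d.d_year = 1998
GROUP BY s.s_store_name
ORDER BY total_sales DESC
LIMIT 10
"""

# Total sales for 1998 per channel, using a union of the three sales tables.
SALES_BY_CHANNEL_1998 = """
SELECT channel, SUM(amount) AS total_sales
FROM (
    SELECT 'store' AS channel, ss_net_paid AS amount, ss_sold_date_sk AS date_sk FROM store_sales
    UNION ALL
    SELECT 'web', ws_net_paid, ws_sold_date_sk FROM web_sales
    UNION ALL
    SELECT 'catalog', cs_net_paid, cs_sold_date_sk FROM catalog_sales
) AS all_sales
JOIN date_dim d ON all_sales.date_sk = d.d_date_sk
WHERE d.d_year = 1998
GROUP BY channel
ORDER BY channel
"""

top_stores = conn.execute(TOP_STORES_1998).fetchall()
by_channel = conn.execute(SALES_BY_CHANNEL_1998).fetchall()
print(top_stores)
print(by_channel)
```

Comparing generated SQL against a hand-written query on a small dataset like this is one way to verify results before running a query on the full warehouse.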

Safety features

Amazon Q generative SQL has built-in safety features: it warns you if a generated SQL statement will modify data, and statements run only with the permissions of the current user. To test this, let’s ask Amazon Q to “delete data from web_sales table.”

Amazon Q gives a message “I detected that this query changes your database. Only run this SQL command if that is appropriate.”

Now, still logged in as User3, choose Run to try to delete the web_sales data.

As expected, User3 gets a permission denied error, because they don’t have the necessary privileges to delete data from the web_sales table.
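
If you review generated SQL in your own tooling, you can apply a similar guard before running a statement. The sketch below is a deliberately naive illustration and not how Amazon Q detects modifying statements: it looks only at the first keyword, so it would miss, for example, a statement that begins with a comment or a WITH clause. The function name and keyword list are assumptions.

```python
import re

# Statement keywords treated as data- or schema-changing in this sketch.
MODIFYING_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE", "TRUNCATE", "DROP", "ALTER", "CREATE"}


def modifies_database(sql):
    """Return True when the first keyword of the statement is a modifying one."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return bool(match) and match.group(1).upper() in MODIFYING_KEYWORDS


for statement in ["DELETE FROM web_sales;", "SELECT COUNT(*) FROM web_sales;"]:
    if modifies_database(statement):
        print(f"warning: this statement changes your database: {statement}")
    else:
        print(f"read-only: {statement}")
```

A check like this is only a convenience. As the example above shows, database permissions are what actually prevent an unauthorized change.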

Custom context

Custom context is a feature that allows you to provide domain-specific knowledge and preferences, giving you fine-grained control over the SQL generation process.

The custom context is defined in a JSON file, which can be uploaded by the query editor administrator or can be added directly in the Custom context section in Amazon Q generative SQL settings.

This JSON file contains information that helps Amazon Q generative SQL better understand the specific requirements and constraints of your domain, enabling it to generate more targeted and relevant SQL queries.

By providing a custom context, you can influence factors such as:

  • The terminology and vocabulary used in the generated SQL
  • The level of complexity and optimization of the SQL queries
  • The formatting and structure of the SQL statements
  • The data sources and tables that should be considered

The custom context feature empowers you to take a more active role in shaping the SQL generation process, leading to SQL queries that are better suited to your data and business requirements.

In this post, we use the BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) sample dataset, consisting of three tables. BIRD represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing.

You can load the following BIRD sample dataset into your Redshift data warehouse to experiment with using custom contexts.

For this post, we demonstrate with three custom contexts.

TablesToInclude

TablesToInclude specifies a set of tables that are considered for SQL generation. This field is crucial when you want to limit the scope of SQL queries to a defined subset of available tables. It can help optimize the generation process by reducing unnecessary table references.

Let’s ask Amazon Q “List the distinct translated title and the set code of all cards translated into Spanish.”

This SQL unnecessarily uses the public.cards table. The public.set_translations table contains sufficient data to answer the question.

We can add the following TablesToInclude custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless:Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "TablesToInclude": [
        "bird.public.set_translations"
      ]
    }
  ]
}
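
Before uploading a custom context file, it can help to check it for obvious mistakes. The following sketch parses the JSON above and flags TablesToInclude entries that are not written as database.schema.table. The three-part naming requirement is inferred from the example rather than taken from the documentation, and the helper name unqualified_tables is hypothetical.

```python
import json

custom_context = json.loads("""
{
  "resources": [
    {
      "ResourceId": "Serverless:Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "TablesToInclude": ["bird.public.set_translations"]
    }
  ]
}
""")


def unqualified_tables(context):
    """Return TablesToInclude entries that are not written as database.schema.table."""
    bad = []
    for resource in context["resources"]:
        for table in resource.get("TablesToInclude", []):
            parts = table.split(".")
            # Assumption: a valid entry has exactly three non-empty parts.
            if len(parts) != 3 or not all(parts):
                bad.append(table)
    return bad


print(unqualified_tables(custom_context))
```

An empty list means every entry is fully qualified. A malformed file also fails earlier, at json.loads, which catches missing commas and braces.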

After adding the custom context, the unwanted joins are eliminated and the correct SQL is generated.

ColumnAnnotations

ColumnAnnotations allows you to provide metadata or annotations specific to individual columns in your data tables. These annotations can offer valuable insights into the definitions and characteristics of the columns, which can be beneficial in guiding the SQL generation process.

Let’s ask Amazon Q to “Show me the unconverted mana cost and name for all the cards created by Rob Alexander.”

The generated SQL points to the column convertedmanacost, which doesn’t give a value for unconverted mana cost. The manacost column gives the unconverted mana cost.

Let’s add this using ColumnAnnotations in the custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless:Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "ColumnAnnotations": {
        "bird.public.cards": {
          "manaCost": "manaCost is the unconverted mana"
        }
      }
    }
  ]
}

After the custom context is added, the correct SQL gets generated.

CuratedQueries

CuratedQueries provides a set of predefined question and answer pairs. In this set, the questions are written in natural language and the corresponding answers are the SQL queries that should be generated to address those questions.

These examples serve as a valuable reference point for Amazon Q generative SQL, helping it understand the types of queries it is expected to generate. You can guide Amazon Q generative SQL with the desired format, structure, and content of the SQL queries it should produce.

Let’s ask Amazon Q “List down the name of artists for cards in Chinese Simplified.”

The generated SQL joins on multiverseid. Although that column exists, it is not the correct join key.

Let’s add the following using CuratedQueries in the custom context JSON:

{
  "resources": [
    {
      "ResourceId": "Serverless:Serverless-workgroup-name",
      "ResourceType": "REDSHIFT_WAREHOUSE",
      "CuratedQueries": [
        {
          "Question": "List down the name of artists for cards in Spanish.",
          "Answer": "SELECT artist FROM public.cards c JOIN public.foreign_data f ON c.uuid = f.uuid WHERE f.language = 'Spanish';"
        }
      ]
    }
  ]
}

After the custom context is added, the correct SQL gets generated.
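
The three custom contexts shown in this post use the same outer structure, so you can assemble them programmatically. The sketch below builds one document from the pieces used above. Combining all three fields in a single resource entry is an assumption here, the helper name build_custom_context is hypothetical, and the table list is chosen only so that it covers the tables used by the curated query.

```python
import json


def build_custom_context(resource_id, tables=None, column_annotations=None, curated_queries=None):
    """Assemble one custom context document, including only the fields that were supplied."""
    resource = {"ResourceId": resource_id, "ResourceType": "REDSHIFT_WAREHOUSE"}
    if tables:
        resource["TablesToInclude"] = list(tables)
    if column_annotations:
        resource["ColumnAnnotations"] = column_annotations
    if curated_queries:
        resource["CuratedQueries"] = [
            {"Question": question, "Answer": answer} for question, answer in curated_queries
        ]
    return {"resources": [resource]}


context = build_custom_context(
    "Serverless:Serverless-workgroup-name",
    tables=["bird.public.cards", "bird.public.foreign_data"],
    column_annotations={"bird.public.cards": {"manaCost": "manaCost is the unconverted mana"}},
    curated_queries=[(
        "List down the name of artists for cards in Spanish.",
        "SELECT artist FROM public.cards c JOIN public.foreign_data f "
        "ON c.uuid = f.uuid WHERE f.language = 'Spanish';",
    )],
)
print(json.dumps(context, indent=2))
```

Keeping the context in code makes it easier to review changes to annotations and curated queries before an administrator uploads the resulting JSON.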

Additional features

In this section, we discuss the supporting features available with the Amazon Q generative SQL feature for the Redshift query editor.

Provide feedback

Amazon Q generative SQL allows you to provide feedback on the SQL queries it generates, helping improve the quality and relevance of the SQL over time. This feedback mechanism is accessible through the Amazon Q generative SQL interface, where you can indicate whether the generated SQL was helpful or not.

If the generated SQL is not helpful, you can categorize your feedback into the following areas:

  • Incorrect Tables/Columns – This indicates that the SQL references the wrong tables or columns, or is missing essential tables or columns
  • Incorrect Predicates/Literals/Group By – This category covers issues with the SQL’s filter conditions, literal values, or grouping logic
  • Incorrect SQL Structure – This feedback suggests that the overall structure or syntax of the generated SQL is not correct
  • Other – This option allows you to provide feedback that doesn’t fit into the preceding categories

In addition to selecting the appropriate feedback category, you can also provide free text comments to elaborate on the specific issues or inaccuracies you found in the generated SQL. This additional information can be valuable for Amazon Q to better understand the problems and make improvements.

By actively providing this feedback, you play a crucial role in refining the generation capabilities of Amazon Q generative SQL. The feedback you provide helps the service learn from its mistakes, leading to more accurate and relevant SQL queries that better meet your needs over time.

This feedback loop is an important part of Amazon Q generative SQL’s continuous improvement, because it allows the service to adapt and evolve based on your specific requirements and use cases.

Regenerate SQL

The Regenerate SQL option will prompt Amazon Q to generate a new SQL query based on the same natural language prompt, using its learning and improvement capabilities to provide a potentially better-suited response.

Refresh database

By choosing Refresh database, you can instruct Amazon Q generative SQL to re-fetch and update the metadata information about the connected database.

This metadata includes:

  • Schema definitions – The structure and organization of your database schemas
  • Table definitions – The names, columns, and other properties of the tables in your database
  • Column definitions – The data types, names, and other characteristics of the columns within your database tables

Tips and techniques

To get more accurate SQL recommendations from Amazon Q generative SQL, keep in mind the following best practices:

  • Be as specific as possible. Instead of asking for total store sales, ask for total sales across all sales channels if that is what you need.
  • Add your schema to the path. For example:
    set search_path to tpcds;

  • Iterate when you have complex requests and verify the results. For example, ask which county has the most sales in 2000 and follow up with which item had the most sales.
  • Ask follow-up questions to make queries more specific.
  • If an incomplete response is generated, instead of rephrasing the entire request, provide specific instructions to Amazon Q as a continuation to the prior question.

Clean up

To avoid incurring future charges, delete the Redshift Serverless workgroup or provisioned cluster you created as part of this post.

Conclusion

Amazon Q generative SQL for Amazon Redshift simplifies query authoring and increases productivity by allowing you to express queries in natural language and receive SQL code recommendations. This post demonstrated how the feature can accelerate data analysis by reducing the time required to write SQL queries. Because it converts natural language requests into SQL, you can boost productivity without requiring an in-depth understanding of your organization’s database structures. Importantly, the robust security measures of Amazon Redshift remain fully enforced, and you can further improve the quality of the generated SQL over time by enabling query history sharing across users.

Get started on your Amazon Q generative SQL journey with Amazon Redshift today by implementing the solution in this post or by referring to Interacting with Amazon Q generative SQL. For pricing information, refer to Amazon Q generative SQL pricing. Also, please try other Redshift generative AI features such as Amazon Redshift Integration with Amazon Bedrock and Amazon Redshift Serverless AI-driven scaling and optimization.


About the authors

Raghu Kuppala is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

Sushmita Barthakur is a Senior Data Solutions Architect at Amazon Web Services (AWS), helping enterprise customers architect their data workloads on AWS. With a strong background in data analytics, she has extensive experience helping customers architect and build enterprise data lakes, ETL workloads, data warehouses, and data analytics solutions, both on premises and in the cloud. Sushmita is based out of Tampa, FL and enjoys traveling, reading, and playing tennis.

Xiao Qin is a senior applied scientist with the Learned Systems Group (LSG) at Amazon Web Services (AWS). He studies and applies machine learning techniques to solve data management problems. He is one of the developers who built the Amazon Q generative SQL capability.

Erol Murtezaoglu, a Technical Product Manager at AWS, is an inquisitive and enthusiastic thinker with a drive for self-improvement and learning. He has a strong and proven technical background in software development and architecture, balanced with a drive to deliver commercially successful products. Erol highly values the process of understanding customer needs and problems, in order to deliver solutions that exceed expectations.

Before retiring, Phil Bates was a Senior Analytics Specialist Solutions Architect at AWS with over 25 years of data warehouse experience.