[$] Home Assistant: ten years of privacy-focused home automation

Post Syndicated from jake original https://lwn.net/Articles/947843/

Many home-automation devices come with their own mobile app or cloud
service. However, using multiple apps or services is
inconvenient, so it’s (purposely) tempting to only buy devices from the same
vendor, but this can lead to lock-in. One project that lets
users manage home-automation devices from various vendors without lock-in
is Home Assistant. Over its
ten-year existence, it has developed into a user-friendly home-automation
platform that caters to both technically inclined and less tech-savvy
people.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/948688/

Security updates have been issued by Debian (ceph and dbus), Fedora (cachelib, fb303, fbthrift, fizz, folly, matrix-synapse, mcrouter, mvfst, nats-server, nodejs18, proxygen, wangle, watchman, and wdt), Mageia (libcue), Oracle (18, grafana, kernel, nodejs, nodejs:16, nodejs:18, php, php:8.0, and tomcat), Red Hat (python27:2.7, python3, python39:3.9, python39-devel:3.9, toolbox, varnish, and varnish:6), SUSE (fwupdate, gcc13, icu73_2, netty, netty-tcnative, and xen), and Ubuntu (aom, ffmpeg, libvpx, libxpm, linux-aws, linux-gcp-5.4, php7.0, php7.2, ring, and sofia-sip).

Cache Rules go GA: precision control over every part of your cache

Post Syndicated from Alex Krivit original http://blog.cloudflare.com/cache-rules-go-ga/


Cache Rules go GA: precision control over every part of your cache

One year ago we introduced Cache Rules, a new way to customize cache settings on Cloudflare. Cache Rules provide greater flexibility for how users cache content, offering precise controls, a user-friendly API, and seamless Terraform integrations. Since it was released in late September 2022, over 100,000 websites have used Cache Rules to fine-tune their cache settings.

Today, we’re thrilled to announce that Cache Rules, along with several other Rules products, are generally available (GA). But that’s not all — we’re also introducing new configuration options for Cache Rules that provide even more options to customize how you cache on Cloudflare. These include functionality to define what resources are eligible for Cache Reserve, what timeout values should be respected when receiving data from your origin server, which custom ports we should use when we cache content, and whether we should bypass Cloudflare’s cache in the absence of a cache-control header.

Cache Rules give users full control and the ability to tailor their content delivery strategy for almost any use case, without needing to write code. As Cache Rules go GA, we are incredibly excited to see how fast customers can achieve their perfect cache strategy.

History of Customizing Cache on Cloudflare

The journey of cache customization on Cloudflare began more than a decade ago, right at the beginning of the company. From the outset, one of the most frequent requests from our customers involved simplifying their configurations. Customers wanted to easily implement precise cache policies, apply robust security measures, manipulate headers, set up redirects, and more for any page on their website. Using Cloudflare to set these controls was especially crucial for customers utilizing origin servers that only provided convoluted configuration options to add headers or policies to responses, which could later be applied downstream by CDNs.

In response to this demand, we introduced Page Rules, a product that has since witnessed remarkable growth in both its popularity and functionality. Page Rules became the preferred choice for customers seeking granular control over how Cloudflare caches their content. Currently, there are over 5 million active cache-related Page Rules, assisting websites in tailoring their content delivery strategies.

However, behind the scenes, Page Rules encountered a scalability issue.

Whenever a Page Rule is encountered by Cloudflare we must transform all rule conditions for a customer into a single regex pattern. This pattern is then applied to requests for the website to achieve the desired cache configuration. When thinking about how all the regexes from all customers are then compared against tens of millions of requests per second, spanning across more than 300 data centers worldwide, it’s easy to see that the computational demands for applying Page Rules can be immense. This pressure is directly tied to the number of rules we could offer our users. For example, Page Rules would only allow for 125 rules to be deployed on a given website.

To address this challenge, we rebuilt all the Page Rule functionality on the new Rulesets Engine. Not only do ruleset engine-based products give users more rules to play with, they also offer greater flexibility on when these rules should run. Part of the magic of the Rulesets engine is that rather than combine all of a page’s rules into a single regular expression, rule logic can be evaluated on a conditional basis. For example, if subdomain A and B have different caching policies, a request from subdomain A can be evaluated using regex logic specific to A (while omitting any logic that applies to B). This yields meaningful benefits to performance, and reduces the computational demands of applying Page Rules across Cloudflare’s network.

Over the past year, Cache Rules, along with Origin Rules, Configuration Rules, and Single Redirect Rules, have been in beta. Thanks to the invaluable support of our early adopters, we have successfully fine-tuned our product, reaching a stage where it is ready to transition from beta to GA. These products can now accomplish everything that Page Rules could and more. This also marks the beginning of the EOL process for Page Rules. In the coming months we will announce timelines and information regarding how customers will replace their Page Rules with specific Rules products. We will automate this as much as possible and provide simple steps to ensure a smooth transition away from Page Rules for everyone.

How to use Cache Rules and What’s New

Those that have used Cache Rules know that they are intuitive and work similarly to our other ruleset engine products. User-defined criteria like URLs or request headers are evaluated, and if matching a specified value, the Cloudflare caching configuration is obeyed. Each Cache Rule depends on fields, operators, and values. For all the different options available, you should see our Cache Rules documentation.

Below are two examples of how to deploy different strategies to customize your cache. These examples only show the tip-of-the-iceberg of what’s possible with Cache Rules, so we encourage you to try them out and let us know what you think.

Example: Cached content is updated at a regular cadence

As an example, let’s say that Acme Corp wants to update their caching strategy. They want to customize their cache to take advantage of certain request headers and use the presence of those request headers to be the criteria that decides when to apply different cache rules. The first thing they’d need to decide is what information should be used to trigger the specific rule. This is defined in the expression.

Once the triggering criteria is defined Acme Corp should next determine how they want to customize their cache.

Content changing quickly

The most common cache strategy is to update the Edge Cache TTL. If Acme Corp thinks a particular piece of content on their website might change quickly, they can alter the time Cloudflare should consider a resource eligible to be served from cache to be shorter. This way Cloudflare would go back to the origin more frequently to revalidate and update the content. The Edge Cache TTL section is also where Acme Corp can define a resource’s TTL based on the status code Cloudflare gets back from their origin, and what Cloudflare should cache if there aren’t any cache-control instructions sent from Acme’s origin server.

Content changing slowly

On the other hand, if Acme Corp had a lot of content that did not change very frequently (like a favicon or logo) and they preferred to serve that from Cloudflare’s cache instead of their origin, they can define which content should be eligible for Cache Reserve with a new Cache Rule. Cache Reserve reduces egress fees by storing assets persistently in Cloudflare’s cache for an extended period of time.

Traditionally when a user would enable Cache Reserve, their entire zone would be eligible to be written to Cache Reserve. For customers that care about saving origin egress fees on all resources on their website, this is still the best path forward. But for customers that want to have additional control over precisely what assets should be part of their Cache Reserve or even what size of assets should be eligible, the Cache Reserve Eligibility Rule provides additional knobs so that customers can precisely increase their cache hits and reduce origin egress in a customized manner. Note that this rule requires a Cache Reserve subscription.

Example: Origin is slow

Let’s consider a hypothetical example. Recently, Acme Corp has been seeing an increase in errors in their Cloudflare logs. These errors are related to a new report that Acme is providing its users based on Acme’s proprietary data. This report requires that their origin access several databases, perform some calculations and generate the report based on these calculations. The origin generating this report needs to wait to respond until all of this background work is completed. Acme’s report is a success, generating an influx of traffic from visitors wanting to see it. But their origin is struggling to keep up. A lot of the errors they are seeing are 524s which correlate to Cloudflare not seeing an origin response before a timeout occurred.

Acme has plans to improve this by scaling their origin infrastructure but it’s taking a long time to deploy. In the meantime, they can turn to Cache Rules to configure a timeout to be longer. Historically the timeout value between Cloudflare and two successive origin reads was 100 seconds, which meant that if an origin didn’t successfully send a response for a period lasting longer than 100 seconds, it could lead to a 524 error. By using a Cache Rule to extend this timeout, Acme Corp can rely more heavily on Cloudflare’s cache.

The above cache strategies focus on how often a resource is changed on an origin, and the origin’s performance. But there are numerous other rules that allow for other strategies, like custom cache keys which allow for customers to determine how their cache should be defined on Cloudflare, respecting strong ETags which help customers determine when Cloudflare should revalidate particular cached assets, and custom ports which allow for customers to define non-standard ports that Cloudflare should use when making caching decisions about content.

The full list of Cache Rules can be found here.

Try Cache Rules today!

We will continue to build and release additional rules that provide powerful, easy to enable control for anyone using Cloudflare’s cache. If you have feature requests for additional Cache Rules, please let us know in the Cloudflare Community.

Go to the dashboard and try Cache Rules out today!

EPA Won’t Force Water Utilities to Audit Their Cybersecurity

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/10/epa-wont-force-water-utilities-to-audit-their-cybersecurity.html

The industry pushed back:

Despite the EPA’s willingness to provide training and technical support to help states and public water system organizations implement cybersecurity surveys, the move garnered opposition from both GOP state attorneys and trade groups.

Republican state attorneys that were against the new proposed policies said that the call for new inspections could overwhelm state regulators. The attorney generals of Arkansas, Iowa and Missouri all sued the EPA—claiming the agency had no authority to set these requirements. This led to the EPA’s proposal being temporarily blocked back in June.

So now we have a piece of our critical infrastructure with substandard cybersecurity. This seems like a really bad outcome.

Welcome, new partners: Growing the global impact of Code Club and CoderDojo

Post Syndicated from Ellie Proffitt original https://www.raspberrypi.org/blog/global-clubs-partners-code-club-coderdojo-new-partners-2023/

Increasing access to computing education is a global challenge, and at the Raspberry Pi Foundation we take a global approach in addressing it. One way we do this is to partner with organisations around the world and support them to introduce Code Clubs and CoderDojos in their local or national communities.

Students in a Code Club run by CSEd Botswana.

Code Club and CoderDojo are the two global networks of free, volunteer-led coding clubs for young people that we support. They are a great fit for a lot of organisations that share our vision and values and work with young people from backgrounds that are currently under-represented in computing. Right now, our Global Clubs Partner network involves more than 50 organisations in over 40 different countries around the world. Seven new partners have joined us since August.

New members in the Global Clubs Partner network

We send a warm welcome to our seven new partners. Here is some of what they are working on:

  • CSEd Botswana is training 25 teachers in rural areas to run Code Clubs in their schools
  • Hacedores in Mexico is working towards establishing CoderDojos in their 80 makerspaces, and Code Clubs in the local schools of their community members.
  • Code Club Luxembourg is already running several clubs and also hosts a number of workshops each year to encourage children to carry on their coding journey by joining a Code Club or CoderDojo.
  • Light Into Europe works with the Deaf community in Romania. They plan to open up coding to children with hearing impairments through accessible Code Clubs, supported by interpreters and adults who are also deaf.
  • KIT Hub in Burundi have plans to establish CoderDojos to support children from underserved areas, including a sizable community of Congolese young people living in refugee camps in Burundi.
  • Orientations Training Centre in Sudan will be setting up clubs in Khartoum and Darfur, and they are planning a special passion for supporting young people to submit entries to the Coolest Projects online showcase in 2024.
  • Savanna Developer Network will establish CoderDojos in northern Ghana to narrow the income and infrastructure gap between the north and the south by ensuring that children in the north aren’t left behind in computing education. 

We are really excited that these organisations have chosen to join the Global Clubs Partner network.

Benefits of partnering with us

When they join our Global Clubs Partner network, organisations work with us to grow the Code Club and CoderDojo communities around the world. Our Global Clubs Partners share our mission to enable young people to realise their full potential through the power of computing and digital technologies, and they commit to working towards this mission with our support.

For many partners and the educators and volunteers they work with, running Code Clubs and CoderDojos is an opportunity to learn to code alongside the young people. We give partners tailored support for their work through our free, high-quality resources, including online training, community events, and easy-to-follow coding projects.

Our new partners are as glad as we are to have joined our network.

Abdelmoneim Mohammed of Orientations Training Centre in Sudan is excited by the impact Code Club will have on his young coders, telling us:  

I expect this can help to make our citizens a global citizen, [by] learning from a well-established and developed educational system.

For Ethel Tshukudu of CSEd Botswana, it is the community focus and available support network that is important. She tells us:

The strong sense of community and the availability of mentorship opportunities are particularly appealing, as they ensure that CSEdBotswana can consistently access the support needed to enhance our coding clubs and create a more significant impact. 

Our partner from KIT Hub in Burundi, Ferdinand Alimasi, values how establishing clubs promotes collective learning and engagement in the community. He says:

Education and preparation of [the] future workforce require collective work and responsibilities, so these clubs will bring the change in communities by offering opportunities to learn for kids and teens, as well as opportunities for everyone to be involved in building a better future for all.

What we learn from our partnerships

Our partners work in lots of different circumstances all around the world. A key learning for us is that there is no ‘one size fits all’ approach to computing education. We support our partners to adapt and deliver our resources in a way that they know will best engage their learners. This highlights how important it is to work in a culturally sensitive way, and to prioritise providing opportunities for learners to use digital technology to make things that matter to them. That looks very different depending on where you are in the world, and who you are working with.

A child at a laptop in a classroom in rural Kenya.

Through working with our partners, we also see just how much world events can impact the already unequal access young people have to learning new digital skills. Climate crisis events such as floods and wildfires, and political crises such as war, conflict, and changes in government have affected many of our global partners this year. The resulting closures of schools and other educational venues, electricity blackouts, and funding challenges cause further educational disadvantage to the children in the affected areas. Our partners play a key role in providing additional educational opportunities for young people when it is safe to do so.

Three teenage girls at a laptop
Young tech creators at a Code Club in Brazil.

The experiences and perspectives we’ve gained through our partnerships with global organisations are extremely important to us and our mission. They help to inform the work we do to make computing education truly accessible for all learners and educators around the globe.  

Could your organisation become a Global Clubs Partner?

You can find out more about how your organisation could join our Global Clubs Partner network on the CoderDojo and Code Club websites, or contact us directly with your questions or ideas about a partnership.

The post Welcome, new partners: Growing the global impact of Code Club and CoderDojo appeared first on Raspberry Pi Foundation.

2023 Linux Foundation TAB election call for nominees

Post Syndicated from corbet original https://lwn.net/Articles/948589/

The 2033 election for members of the Linux Foundation Technical Advisory
Board will be held during the upcoming Linux
Plumbers Conference
. The call
for nominees
has been posted.

The TAB exists to provide advice from the kernel community to the
Linux Foundation; it also serves to facilitate interactions both
within the community and with outside entities. Over the last
year, the TAB has overseen the organization of the Linux Plumbers
Conference, released a kernel contribution maturity model for
organizations, advised on code-of-conduct issues, and more.

Nominations should be sent in by November 13.

2013 Linux Foundation TAB election call for nominees

Post Syndicated from corbet original https://lwn.net/Articles/948589/

The 2013 election for members of the Linux Foundation Technical Advisory
Board will be held during the upcoming Linux
Plumbers Conference
. The call
for nominees
has been posted.

The TAB exists to provide advice from the kernel community to the
Linux Foundation; it also serves to facilitate interactions both
within the community and with outside entities. Over the last
year, the TAB has overseen the organization of the Linux Plumbers
Conference, released a kernel contribution maturity model for
organizations, advised on code-of-conduct issues, and more.

Nominations should be sent in by November 13.

Какво чака Украйна тази зима

Post Syndicated from Григор original http://www.gatchev.info/blog/?p=2606

Накратко – още от същото. С подобен резултат.

Русия отново ще бомбардира гражданската ѝ инфраструктура – особено електроснабдяването, водоснабдяването и снабдяването с горива. С надеждата студът и мракът да пречупят волята на украинците.

Миналата година тя направи същото. Доста успешно – ключови трафопостове и помпени станции бяха взривявани, често многократно. Но всеки път украинските ремонтни екипи работеха като бесни, в тъмнината и студа, рискувайки живота си при нова руска бомбардировка. И възстановяваха повреденото, въпреки недостига на части и материали. А засегнатото население се справяше с много взаимопомощ и търпение, стискаше зъби и намразваше руския нацизъм още повече.

Оттогава нещата се промениха. Украйна се снабди с огромен брой най-разнокалибрени генератори, способни да захранват болници, училища, детски градини, домове и ключови обекти, ако токът спре. (За което и България помогна, за моя гордост.) Натрупа запаси от резервни трансформатори и водни помпи и от резервни части за тях. Обучи още специалисти по ремонт на поразени цивилни инсталации. Вече има и как да покрие нуждите си при руска бомбардировка, и как да се възстанови след нея.

А и пряката защита напредва. Киев бе поставен под защитата на две батареи „Пейтриът“, които свалят на практика 100% от изстреляните по него руски ракети и ирански дронове. (Весело е да се чете руската пропаганда, съгласно която тези батареи са унищожени поне по десетина пъти.) Подобна защита се обсъжда и за други големи украински градове. Е, за военните обекти надали ще стигне скоро, но Украйна знае – цивилните са по-важни. Демокрациите имат такива ценности.

Така че тази година успехите на Русия в тормоза на цивилното украинско население ще са по-малки. А омразата му към нея ще продължи да се натрупва. Путин го знае и цели точно това – след него Русия и Украйна да са врагове завинаги.

Защо ли? Путин обича да повтаря, че на него не му е нужен свят, в който я няма Русия. („Няма я“ в смисъл, че не тъпче този свят с железен ботуш.) Но истината е друга.

Че на него не му е нужна Русия, в която го няма Путин. В смисъл, начело на нея.

Run Spark SQL on Amazon Athena Spark

Post Syndicated from Pathik Shah original https://aws.amazon.com/blogs/big-data/run-spark-sql-on-amazon-athena-spark/

At AWS re:Invent 2022, Amazon Athena launched support for Apache Spark. With this launch, Amazon Athena supports two open-source query engines: Apache Spark and Trino. Athena Spark allows you to build Apache Spark applications using a simplified notebook experience on the Athena console or through Athena APIs. Athena Spark notebooks support PySpark and notebook magics to allow you to work with Spark SQL. For interactive applications, Athena Spark allows you to spend less time waiting and be more productive, with application startup time in under a second. And because Athena is serverless and fully managed, you can run your workloads without worrying about the underlying infrastructure.

Modern applications store massive amounts of data on Amazon Simple Storage Service (Amazon S3) data lakes, providing cost-effective and highly durable storage, and allowing you to run analytics and machine learning (ML) from your data lake to generate insights on your data. Before you run these workloads, most customers run SQL queries to interactively extract, filter, join, and aggregate data into a shape that can be used for decision-making, model training, or inference. Running SQL on data lakes is fast, and Athena provides an optimized, Trino- and Presto-compatible API that includes a powerful optimizer. In addition, organizations across multiple industries such as financial services, healthcare, and retail are adopting Apache Spark, a popular open-source, distributed processing system that is optimized for fast analytics and advanced transformations against data of any size. With support in Athena for Apache Spark, you can use both Spark SQL and PySpark in a single notebook to generate application insights or build models. Start with Spark SQL to extract, filter, and project attributes that you want to work with. Then to perform more complex data analysis such as regression tests and time series forecasting, you can use Apache Spark with Python, which allows you to take advantage of a rich ecosystem of libraries, including data visualization in Matplot, Seaborn, and Plotly.

In this first post of a three-part series, we show you how to get started using Spark SQL in Athena notebooks. We demonstrate querying databases and tables in the Amazon S3 and the AWS Glue Data Catalog using Spark SQL in Athena. We cover some common and advanced SQL commands used in Spark SQL, and show you how to use Python to extend your functionality with user-defined functions (UDFs) as well as to visualize queried data. In the next post, we’ll show you how to use Athena Spark with open-source transactional table formats. In the third post, we’ll cover analyzing data sources other than Amazon S3 using Athena Spark.

Prerequisites

To get started, you will need the following:

Provide Athena Spark access to your data through an IAM role

As you proceed through this walkthrough, we create new databases and tables. By default, Athena Spark doesn’t have permission to do this. To provide this access, you can add the following inline policy to the AWS Identity and Access Management (IAM) role attached to the workgroup, providing the region and your account number. For more information, refer to the section To embed an inline policy for a user or role (console) in Adding IAM identity permissions (console).

{
  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "ReadfromPublicS3",
          "Effect": "Allow",
          "Action": [
              "s3:GetObject",
              "s3:ListBucket"
          ],
          "Resource": [
              "arn:aws:s3:::athena-examples-us-east-1/*",
              "arn:aws:s3:::athena-examples-us-east-1"
          ]
      },
      {
            "Sid": "GlueReadDatabases",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabases"
            ],
            "Resource": "arn:aws:glue:<region>:<account-id>:*"
        },
        {
            "Sid": "GlueReadDatabase",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<account-id>:catalog",
                "arn:aws:glue:<region>:<account-id>:database/sparkblogdb",
                "arn:aws:glue:<region>:<account-id>:table/sparkblogdb/*",
                "arn:aws:glue:<region>:<account-id>:database/default"
            ]
        },
        {
            "Sid": "GlueCreateDatabase",
            "Effect": "Allow",
            "Action": [
                "glue:CreateDatabase"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<account-id>:catalog",
                "arn:aws:glue:<region>:<account-id>:database/sparkblogdb"
            ]
        },
        {
            "Sid": "GlueDeleteDatabase",
            "Effect": "Allow",
            "Action": "glue:DeleteDatabase",
            "Resource": [
                "arn:aws:glue:<region>:<account-id>:catalog",
                "arn:aws:glue:<region>:<account-id>:database/sparkblogdb",
                "arn:aws:glue:<region>:<account-id>:table/sparkblogdb/*"            ]
        },
        {
            "Sid": "GlueCreateDeleteTablePartitions",
            "Effect": "Allow",
            "Action": [
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:BatchCreatePartition",
                "glue:CreatePartition",
                "glue:DeletePartition",
                "glue:BatchDeletePartition",
                "glue:UpdatePartition",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition"
            ],
            "Resource": [
                "arn:aws:glue:<region>:<account-id>:catalog",
                "arn:aws:glue:<region>:<account-id>:database/sparkblogdb",
                "arn:aws:glue:<region>:<account-id>:table/sparkblogdb/*"
            ]
        }
  ]
}

Run SQL queries directly in notebook without using Python

When using Athena Spark notebooks, we can run SQL queries directly without having to use PySpark. We do this by using cell magics, which are special headers in a notebook that change the cells’ behavior. For SQL, we can add the %%sql magic, which will interpret the entire cell contents as a SQL statement to be run on Athena Spark.

Now that we have our workgroup and notebook created, let’s start exploring the NOAA Global Surface Summary of Day dataset, which provides environmental measures from various locations all over the earth. The datasets used in this post are public datasets hosted in the following Amazon S3 locations:

  • Parquet data for year 2020s3://athena-examples-us-east-1/athenasparkblog/noaa-gsod-pds/parquet/2020/
  • Parquet data for year 2021 s3://athena-examples-us-east-1/athenasparksqlblog/noaa_pq/year=2021/
  • Parquet data from year 2022s3://athena-examples-us-east-1/athenasparksqlblog/noaa_pq/year=2022/

To use this data, we need an AWS Glue Data Catalog database that acts as the metastore for Athena, allowing us to create external tables that point to the location of datasets in Amazon S3. First, we create a database in the Data Catalog using Athena and Spark.

Create a database

Run following SQL in your notebook using %%sql magic:

%%sql 
CREATE DATABASE sparkblogdb

You get the following output:
Output of CREATE DATABASE SQL

Create a table

Now that we have created a database in the Data Catalog, we can create a partitioned table that points to our dataset stored in Amazon S3:

%%sql
CREATE EXTERNAL TABLE sparkblogdb.noaa_pq(
  station string, 
  date string, 
  latitude string, 
  longitude string, 
  elevation string, 
  name string, 
  temp string, 
  temp_attributes string, 
  dewp string, 
  dewp_attributes string, 
  slp string, 
  slp_attributes string, 
  stp string, 
  stp_attributes string, 
  visib string, 
  visib_attributes string, 
  wdsp string, 
  wdsp_attributes string, 
  mxspd string, 
  gust string, 
  max string, 
  max_attributes string, 
  min string, 
  min_attributes string, 
  prcp string, 
  prcp_attributes string, 
  sndp string, 
  frshtt string)
  PARTITIONED BY (year string)
STORED AS PARQUET
LOCATION 's3://athena-examples-us-east-1/athenasparksqlblog/noaa_pq/'

This dataset is partitioned by year, meaning that we store data files for each year separately, which simplifies management and improves performance because we can target the specific S3 locations in a query. The Data Catalog knows about the table, and now we’ll let it work out how many partitions we have automatically by using the MSCK utility:

%%sql
MSCK REPAIR TABLE sparkblogdb.noaa_pq

When the preceding statement is complete, you can run the following command to list the yearly partitions that were found in the table:

%%sql
SHOW PARTITIONS sparkblogdb.noaa_pq

Output of SHOW PARTITIONS SQL

Now that we have the table created and partitions added, let’s run a query to find the minimum recorded temperature for the 'SEATTLE TACOMA AIRPORT, WA US' location:

%%sql
select year, min(MIN) as minimum_temperature 
from sparkblogdb.noaa_pq 
where name = 'SEATTLE TACOMA AIRPORT, WA US' 
group by 1

You get the following output:

The image shows output of previous SQL statement.

Query a cross-account Data Catalog from Athena Spark

Athena supports accessing cross-account AWS Glue Data Catalogs, which enables you to use Spark SQL in Athena Spark to query a Data Catalog in an authorized AWS account.

The cross-account Data Catalog access pattern is often used in a data mesh architecture, when a data producer wants to share a catalog and data with consumer accounts. The consumer accounts can then perform data analysis and explorations on the shared data. This is a simplified model where we don’t need to use AWS Lake Formation data sharing. The following diagram gives an overview of how the setup works between one producer and one consumer account, which can be extended to multiple producer and consumer accounts.

The image gives an overview of how the setup works between one producer and one consumer account, which can be extended to multiple producer and consumer accounts.

You need to set up the right access policies on the Data Catalog of the producer account to enable cross-account access. Specifically, you must make sure the consumer account’s IAM role used to run Spark calculations on Athena has access to the cross-account Data Catalog and data in Amazon S3. For setup instructions, refer to Configuring cross-account AWS Glue access in Athena for Spark.

There are two ways the consumer account can access the cross-account Data Catalog from Athena Spark, depending on whether you are querying from one producer account or multiple.

Query a single producer table

If you are just querying data from a single producer’s AWS account, you can tell Athena Spark to only use that account’s catalog to resolve database objects. When using this option, you don’t have to modify the SQL because you’re configuring the AWS account ID at session level. To enable this method, edit the session and set the property "spark.hadoop.hive.metastore.glue.catalogid": "999999999999" using the following steps:

  1. In the notebook editor, on the Session menu, choose Edit session.
    Image shows wherre to click to edit session
  2. Choose Edit in JSON.
  3. Add the following property and choose Save:
    {"spark.hadoop.hive.metastore.glue.catalogid": "999999999999"}The image shows where to put JSON config property to query single producerThis will start a new session with the updated parameters.
  4. Run the following SQL statement in Spark to query tables from the producer account’s catalog:
    %%sql
    SELECT * 
    FROM <central-catalog-db>.<table> 
    LIMIT 10

Query multiple producer tables

Alternatively, you can add the producer AWS account ID in each database name, which is helpful if you’re going to query Data Catalogs from different owners. To enable this method, set the property {"spark.hadoop.aws.glue.catalog.separator": "/"} when invoking or editing the session (using the same steps as the previous section). Then, you add the AWS account ID for the source Data Catalog as part of the database name:

%%sql
SELECT * 
FROM `<producer-account1-id>/database1`.table1 t1 
join `<producer-account2-id>/database2`.table2 t2 
ON t1.id = t2.id
limit 10

If the S3 bucket belonging to the producer AWS account is configured with Requester Pays enabled, the consumer is charged instead of the bucket owner for requests and downloads. In this case, you can add the following property when invoking or editing an Athena Spark session to read data from these buckets:

{"spark.hadoop.fs.s3.useRequesterPaysHeader": "true"}

Infer the schema of your data in Amazon S3 and join with tables crawled in the Data Catalog

Rather than only being able to go through the Data Catalog to understand the table structure, Spark can infer schema and read data directly from storage. This feature allows data analysts and data scientists to perform a quick exploration of the data without needing to create a database or table, but which can also be used with other existing tables stored in the Data Catalog in the same or across different accounts. To do this, we use a Spark temp view, which is an in-memory data structure that stores the schema of data stored in a data frame.

Using the NOAA dataset partition for 2020, we create a temporary view by reading S3 data into a data frame:

year_20_pq = spark.read.parquet(f"s3://athena-examples-us-east-1/athenasparkblog/noaa-gsod-pds/parquet/2020/")
year_20_pq.createOrReplaceTempView("y20view")

Now you can query the y20view using Spark SQL as if it were a Data Catalog database:

%%sql
select count(*) 
from y20view

Output of previous SQL query showing count value

You can query data from both temporary views and Data Catalog tables in the same query in Spark. For example, now that we have a table containing data for years 2021 and 2022, and a temporary view with 2020’s data, we can find the dates in each year when the maximum temperature was recorded for 'SEATTLE TACOMA AIRPORT, WA US'.

To do this, we can use the window function and UNION:

%%sql
SELECT date,
       max as maximum_temperature
FROM (
        SELECT date,
            max,
            RANK() OVER (
                PARTITION BY year
                ORDER BY max DESC
            ) rnk
        FROM sparkblogdb.noaa_pq
        WHERE name = 'SEATTLE TACOMA AIRPORT, WA US'
          AND year IN ('2021', '2022')
        UNION ALL
        SELECT date,
            max,
            RANK() OVER (
                ORDER BY max DESC
            ) rnk
        FROM y20view
        WHERE name = 'SEATTLE TACOMA AIRPORT, WA US'
    ) t
WHERE rnk = 1
ORDER by 1

You get the following output:

Output of previous SQL

Extend your SQL with a UDF in Spark SQL

You can extend your SQL functionality by registering and using a custom user-defined function in Athena Spark. These UDFs are used in addition to the common predefined functions available in Spark SQL, and once created, can be reused many times within a given session.

In this section, we walk through a straightforward UDF that converts a numeric month value into the full month name. You have the option to write the UDF in either Java or Python.

Java-based UDF

The Java code for the UDF can be found in the GitHub repository. For this post, we have uploaded a prebuilt JAR of the UDF to s3://athena-examples-us-east-1/athenasparksqlblog/udf/month_number_to_name.jar.

To register the UDF, we use Spark SQL to create a temporary function:

%%sql
CREATE OR REPLACE TEMPORARY FUNCTION 
month_number_to_name as 'com.example.MonthNumbertoNameUDF'
using jar "s3a://athena-examples-us-east-1/athenasparksqlblog/udf/month_number_to_name.jar";

Now that the UDF is registered, we can call it in a query to find the minimum recorded temperature for each month of 2022:

%%sql
select month_number_to_name(month(to_date(date,'yyyy-MM-dd'))) as month_yr_21,
min(min) as min_temp
from sparkblogdb.noaa_pq 
where NAME == 'SEATTLE TACOMA AIRPORT, WA US' 
group by 1 
order by 2

You get the following output:

Output of SQL using UDF

Python-based UDF

Now let’s see how to add a Python UDF to the existing Spark session. The Python code for the UDF can be found in the GitHub repository. For this post, the code has been uploaded to s3://athena-examples-us-east-1/athenasparksqlblog/udf/month_number_to_name.py.

Python UDFs can’t be registered in Spark SQL, so instead we use a small bit of PySpark code to add the Python file, import the function, and then register it as a UDF:

sc.addPyFile('s3://athena-examples-us-east-1/athenasparksqlblog/udf/month_number_to_name.py')

from month_number_to_name import month_number_to_name
spark.udf.register("month_number_to_name_py",month_number_to_name)

Now that the Python-based UDF is registered, we can use the same query from earlier to find the minimum recorded temperature for each month of 2022. The fact that it’s Python rather than Java doesn’t matter now:

%%sql
select month_number_to_name_py(month(to_date(date,'yyyy-MM-dd'))) as month_yr_21,
min(min) as min_temp
from sparkblogdb.noaa_pq 
where NAME == 'SEATTLE TACOMA AIRPORT, WA US' 
group by 1 
order by 2

The output should be similar to that in the preceding section.

Plot visuals from the SQL queries

It’s straightforward to use Spark SQL, including across AWS accounts for data exploration, and not complicated to extend Athena Spark with UDFs. Now let’s see how we can go beyond SQL using Python to visualize data within the same Spark session to look for patterns in the data. We use the table and temporary views created previously to generate a pie chart that shows percentage of readings taken in each year for the station 'SEATTLE TACOMA AIRPORT, WA US'.

Let’s start by creating a Spark data frame from a SQL query and converting it to a pandas data frame:

#we will use spark.sql instead of %%sql magic to enclose the query string
#this will allow us to read the results of the query into a dataframe to use with our plot command
sqlDF = spark.sql("select year, count(*) as cnt from sparkblogdb.noaa_pq where name = 'SEATTLE TACOMA AIRPORT, WA US' group by 1 \
                  union all \
                  select 2020 as year, count(*) as cnt from y20view where name = 'SEATTLE TACOMA AIRPORT, WA US'")

#convert to pandas data frame
seatac_year_counts=sqlDF.toPandas()

Next, the following code uses the pandas data frame and Matplot library to plot a pie chart:

import matplotlib.pyplot as plt

# clear the state of the visualization figure
plt.clf()

# create a pie chart with values from the 'cnt' field, and yearly labels
plt.pie(seatac_year_counts.cnt, labels=seatac_year_counts.year, autopct='%1.1f%%')
%matplot plt

The following figure shows our output.

Output of code showing pie chart

Clean up

To clean up the resources created for this post, complete the following steps:

  1. Run the following SQL statements in the notebook’s cell to delete the database and tables from the Data Catalog:
    %%sql
    DROP TABLE sparkblogdb.noaa_pq
    
    %%sql
    DROP DATABASE sparkblogdb

  2. Delete the workgroup created for this post. This will also delete saved notebooks that are part of the workgroup.
  3. Delete the S3 bucket that you created as part of the workgroup.

Conclusion

Athena Spark makes it easier than ever to query databases and tables in the AWS Glue Data Catalog directly through Spark SQL in Athena, and to query data directly from Amazon S3 without needing a metastore for quick data exploration. It also makes it straightforward to use common and advanced SQL commands used in Spark SQL, including registering UDFs for custom functionality. Additionally, Athena Spark makes it effortless to use Python in a fast start notebook environment to visualize and analyze data queried via Spark SQL.

Overall, Spark SQL unlocks the ability to go beyond standard SQL in Athena, providing advanced users more flexibility and power through both SQL and Python in a single integrated notebook, and providing fast, complex analysis of data in Amazon S3 without infrastructure setup. To learn more about Athena Spark, refer to Amazon Athena for Apache Spark.


About the Authors

Pathik Shah is a Sr. Analytics Architect on Amazon Athena. He joined AWS in 2015 and has been focusing in the big data analytics space since then, helping customers build scalable and robust solutions using AWS analytics services.

Raj Devnath is a Product Manager at AWS on Amazon Athena. He is passionate about building products customers love and helping customers extract value from their data. His background is in delivering solutions for multiple end markets, such as finance, retail, smart buildings, home automation, and data communication systems.

Updated Essential Eight guidance for Australian customers

Post Syndicated from James Kingsmill original https://aws.amazon.com/blogs/security/updated-essential-eight-guidance-for-australian-customers/

Amazon Web Services (AWS) is excited to announce the release of AWS Prescriptive Guidance on Reaching Essential Eight Maturity on AWS. We designed this guidance to help customers streamline and accelerate their security compliance obligations under the Essential Eight framework of the Australian Cyber Security Centre (ACSC).

What is the Essential Eight?

The Essential Eight is a security framework that the ACSC designed to help organizations protect themselves against various cyber threats. The Essential Eight covers the following eight strategies:

  • Application control
  • Patch applications
  • Configure Microsoft Office macro settings
  • User application hardening
  • Restrict administrative privileges
  • Patch operating systems
  • Multi-factor authentication
  • Regular backups

The Department of Home Affairs’ Protective Security Policy Framework (PSPF) mandates that Australian Non-Corporate Commonwealth Entities (NCCEs) reach Essential Eight maturity. The Essential Eight is also one of the compliance frameworks available to owners of critical infrastructure (CI) assets under the Critical Infrastructure Risk Management Program (CIRMP) requirements of the Security of Critical Infrastructure (SOCI) Act.

In the Essential Eight Explained, the ACSC acknowledges some translation is required when applying the principles of the Essential Eight to cloud-based environments:

“The Essential Eight has been designed to protect Microsoft Windows-based internet-connected networks. While the principles behind the Essential Eight may be applied to cloud services and enterprise mobility, or other operating systems, it was not primarily designed for such purposes and alternative mitigation strategies may be more appropriate to mitigate unique cyber threats to these environments.”

The newly released guidance walks customers step-by-step through the process of reaching Essential Eight maturity in a cloud native way, making best use of the security, performance, innovation, elasticity, scalability, and resiliency benefits of the AWS Cloud. It includes a compliance matrix that maps Essential Eight strategies and controls to specific guidance and AWS resources.

It also features an example of a customer with different workloads—a serverless data lake, a containerized webservice, and an Amazon Elastic Compute Cloud (Amazon EC2) workload running commercial-off-the-shelf (COTS) software.

For more information, see Reaching Essential Eight Maturity on AWS on the AWS Prescriptive Guidance page. You can also reach out to your account team or engage AWS Professional Services, our global team of experts that can help customers realize their desired security and business outcomes on AWS.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

James Kingsmill

James Kingsmill

James is a Senior Solutions Architect on the Australian public sector team. As a member of the enterprise federal team, he has a longstanding interest in helping public sector customers achieve their transformation, automation, and security goals.

Manuwai Korber

Manuwai Korber

Manuwai is a Solutions Architect based in Sydney who specializes in the field of machine learning. He is dedicated to helping Australian public sector organizations build reliable systems that improve the experience of citizens.

How to prevent SMS Pumping when using Amazon Pinpoint or SNS

Post Syndicated from Akshada Umesh Lalaye original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-prevent-sms-pumping-when-using-amazon-pinpoint-or-sns/

SMS fraud is, unfortunately, a common issue that all senders of SMS encounter as they adopt SMS as a communication channel. This post defines the most common types of fraud and provides concrete guidance on how to mitigate or eliminate each of them.

Introduction to SMS Pumping:

SMS Pumping, also known as an SMS Flood attack, or Artificially Inflated Traffic (AIT), occurs when fraudsters exploit a phone number input field to acquire a one-time passcode (OTP), an app download link, or any other content via SMS. In cases where these input forms lack sufficient security measures, attackers can artificially increase the volume of SMS traffic, thereby exploiting vulnerabilities in your application. The perpetrators dispatch SMS messages to a selection of numbers under the jurisdiction of a particular mobile network operator (MNO), ultimately receiving a portion of the resulting revenue. It is essential to understand how to detect these attacks and prevent them.

Common Evidence of SMS Pumping:

  • Dramatic Decrease in Conversion Rates: A common SMS use case is for identity verification through the use of One Time Passwords (OTP) but this could also be seen in other types of use cases where a clear and consistent conversion rate is seen. A drop in a normally stable conversion rate may be caused by an increase in volume that will never convert and can indicate an issue that requires investigation. Setting up an alert for anomalies in conversion rates is always a good practice.
  • SMS Requests or Deliveries from Unknown Countries: If your application normally sends SMS to a defined set of countries and you begin to receive requests for a different country, then then this should be investigated.
  • Spike in Outgoing Messages: A significant and sudden increase in outgoing messages could indicate an issue that requires investigation.
  • Spike in Messages Sent to a Block of Adjacent Numbers: Fraudsters often deploy bots and programmatically loop through numbers in a sequence. You will probably notice an increase in messages to a group of nearby numbers frequently for example, +11111111110, +11111111111

How to Identify and Prevent SMS Pumping Attacks:

Now that we understand the common signs of SMS pumping, lets discuss how to use AWS Services to identify, confirm the fraud and how to place measures in place to prevent it in the first place.

Identify:

Delivery Statistics (UTC)

Delivery Statistics (UTC)

If you are using Amazon Pinpoint, you can use transactional messaging under analytics section to understand the SMS patterns

Transactional Messaging Charts

Transactional Messaging Charts

  • Spikes in Messages Sent to a Block of Adjacent Numbers: If you are using SNS you can use CloudWatch logs to analyse the destination numbers.

You can use CloudWatch Insights query on below log groups

sns/<region>/<Accountnumber>/DirectPublishToPhoneNumber
sns/<region>/<Accountnumber>/DirectPublishToPhoneNumber/failure

The below query will print all the logs that have the destination number like +11111111111
fields @timestamp, @message, @logStream, @log
| filter delivery.destination like '+11111111111'
| limit 20

If you are using Amazon Pinpoint, you can enable event stream to analyse destination numbers.

If you have deployed Digital User Engagement Events Database Solution You can use the below sample Amazon Athena query which displays entries that have the destination number like +11111111111

SELECT * FROM "due_eventdb"."sms_success" where destination_phone_number like '%11111111111%'
SELECT * FROM "due_eventdb"."sms_failure" where destination_phone_number like '%11111111111%'

How to Prevent SMS Pumping: 

      • Example: If you expect only users from India to sign up in your application, you can include rules such as “\+91[0-9]{10}”, which allows only Indian numbers as input.
      • Note: SNS and Pinpoint APIs are not natively integrated with WAF. However, you can connect your application to an Amazon API Gateway with which you can integrate with WAF.
      • How to Create a Regex Pattern Set with WAF – The below Regex Pattern set will allow sending messages to Australia (+61) and India (+91) destination phone numbers
          1. Sign in to the AWS Management Console and navigate to AWS WAF console
          2. In the navigation pane, choose Regex pattern sets and then Create regex pattern set.
          3. Enter a name and description for the regex pattern set. You’ll use these to identify it when you want to use the set. For example, Allowed_SMS_Countries
          4. Select the Region where you want to store the regex pattern set
          5. In the Regular expressions text box, enter one regex pattern per line
          6. Review the settings for the regex pattern set, and choose Create regex pattern set
Regex pattern set details

Regex pattern set details

      • Create a Web ACL with above Regex Pattern Set
          1. Sign in to the AWS Management Console and navigate to AWS WAF console
          2. In the navigation pane, choose Web ACLs and then Create web ACL
          3. Enter a Name, Description and CloudWatch metric name for Web ACL details
          4. Select Resource type as Regional resources
          5. Click Next

            Web ACL details

            Web ACL details

          6. Click on Add Rules > Add my own rules and rule groups
          7. Enter Rule name and select Regular rule

            Web ACL Rule Builder

            Web ACL Rule Builder

          8. Select Inspect > Body, Content type as JSON, JSON match scope as Values, Content to inspect as Full JSON content
          9. Select Match type as Matches pattern from regex pattern set and select the Regex pattern set as “Allowed_SMS_Countries” created above
          10. Select Action as Allow
          11. Click Add Rule  

            Web ACL Rule builder statement

            Web ACL Rule builder statement

          12. Select Block for Default web ACL action for requests that don’t match any rules

            Web ACL Rules

            Web ACL Rules

          13. Set rule priority and Click Next

            Web ACL Rule priority

            Web ACL Rule priority

          14. Configure metrics and Click Next

            Web ACL metrics

            Web ACL metrics

          15. Review and Click Create web ACL

For more information, please refer to WebACL

  • Rate Limit Requests
    • AWS WAF provides an option to rate limit per originating IP. You can define the maximum number of requests allowed in a five-minute period that satisfy the criteria you provide, before limiting the requests using the rule action setting
  • CAPTCHA
    • Implement CAPTCHA in your application request process to protect your application against common bot traffic
  • Turn off “Shared Routes”
  • Exponential Delay Verification Retries
    • Implement a delay between multiple messages to the same phone number. This doesn’t completely eliminate but will help slow down the attack
  • Set CloudWatch Alarm
  • Validate Phone Numbers – You can use the Pinpoint Phone number validate API to check the values for CountryCodeIso2, CountryCodeNumeric, and PhoneType prior to sending SMS and then only send SMS to countries that match your criteria
    Sample API Response:

{
"NumberValidateResponse": {
"Carrier": "ExampleCorp Mobile",
"City": "Seattle",
"CleansedPhoneNumberE164": "+12065550142",
"CleansedPhoneNumberNational": "2065550142",
"Country": "United States",
"CountryCodeIso2": "US",
"CountryCodeNumeric": "1",
"OriginalPhoneNumber": "+12065550142",
"PhoneType": "MOBILE",
"PhoneTypeCode": 0,
"Timezone": "America/Los_Angeles",
"ZipCode": "98101"
}
}

Conclusion:

This post covers the basics of SMS pumping attacks, the different mechanisms that can be used to detect them, and some potential ways to solve for or mitigate them using services and features like Pinpoint Validate API and WAF.

Further Reading:
Review the documentation of WAF with API gateway
here
Review the documentation of Phone number validate
here
Review the Web Access Control lists
here

 

Resources:
Amazon Pinpoint –
https://aws.amazon.com/pinpoint/
Amazon API Gateway –
https://aws.amazon.com/api-gateway/
Amazon Athena –
https://aws.amazon.com/athena/

Hello World #22 out now: Teaching and AI

Post Syndicated from Meg Wang original https://www.raspberrypi.org/blog/hello-world-22-ai-education/

Recent developments in artificial intelligence are changing how the world sees computing and challenging computing educators to rethink their approach to teaching. In the brand-new issue of Hello World, out today for free, we tackle some big questions about AI and computing education. We also get practical with resources for your classroom.

Cover of Hello World issue 22.

Teaching and AI

In their articles for issue 22, educators explore a range of topics related to teaching and AI, including what is AI literacy and how do we teach it; gender bias in AI and what we can do about it; how to speak to young children about AI; and why anthropomorphism hinders learners’ understanding of AI.

Our feature articles also include a research digest on AI ethics for children, and of course hands-on examples of AI lessons for your learners.

A snapshot of AI education

Hello World issue 22 is a comprehensive snapshot of the current landscape of AI education. Ben Garside, Learning Manager for our Experience AI programme and guest editor of this issue, says:

“When I was teaching in the classroom, I used to enjoy getting to grips with new technological advances and finding ways in which I could bring them into school and excite the students I taught. Occasionally, during the busiest of times, I’d also look longingly at other subjects and be jealous that their curriculum appeared to be more static than ours (probably a huge misconception on my behalf).”

It’s inspiring for me to see how the education community is reacting to the opportunities that AI can provide.

Ben Garside

“It’s inspiring for me to see how the education community is reacting to the opportunities that AI can provide. Of course, there are elements of AI where we need to tread carefully and be very cautious in our approach, but what you’ll see in this magazine is educators who are thinking creatively in this space.”

Download Hello World issue 22 for free

AI is a topic we’ve addressed before in Hello World, and we’ll keep covering this rapidly evolving area in future. We hope this issue gives you plenty of ideas to take away and build upon.

Also in issue 22:

  • Vocational training for young people
  • Making the most of online educator training
  • News about BBC micro:bit
  • An insight into the WiPSCE 2023 conference for teachers and educators
  • And much, much more

You can download your free PDF issue now, or purchase a print copy from our store. UK-based subscribers for a free print edition can expect their copies to arrive in the mail this week.

Send us a message or tag us on social media to let us know which articles have made you think and, most importantly, which will help you with your teaching.

The post Hello World #22 out now: Teaching and AI appeared first on Raspberry Pi Foundation.

The collective thoughts of the interwebz