[$] Restricted DMA

Post Syndicated from original https://lwn.net/Articles/841916/rss

A key component of system hardening is restricting access to memory; this
extends to preventing the kernel itself from accessing or modifying much of
the memory in the system most of the time. Memory that cannot be accessed
cannot be read or changed by an attacker. On many systems, though, these
restrictions do not apply to peripheral devices, which can happily use
direct memory access (DMA) on most or all of the available memory. The
recently posted restricted
DMA patch set
aims to reduce exposure to buggy or malicious device
activity by tightening up control over the memory that DMA operations are
allowed to access.

New – AWS Transfer Family support for Amazon Elastic File System

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-aws-transfer-family-support-for-amazon-elastic-file-system/

AWS Transfer Family provides fully managed Secure File Transfer Protocol (SFTP), File Transfer Protocol over TLS (FTPS), and File Transfer Protocol (FTP) support for Amazon Simple Storage Service (S3), enabling you to seamlessly migrate your file transfer workflows to AWS.

Today I am happy to announce AWS Transfer Family now also supports file transfers to Amazon Elastic File System (EFS) file systems as well as Amazon S3. This feature enables you to easily and securely provide your business partners access to files stored in Amazon EFS file systems. With this launch, you now have the option to store the transferred files in a fully managed file system and reduce your operational burden, while preserving your existing workflows that use SFTP, FTPS, or FTP protocols.

Amazon EFS file systems are accessible within your Amazon Virtual Private Cloud (VPC) and VPC connected environments. With this launch, you can securely enable third parties such as your vendors, partners, or customers to access your files over the supported protocols at scale globally, without needing to manage any infrastructure. When you select Amazon EFS as the data store for your AWS Transfer Family server, the transferred files are readily available to your business-critical applications running on Amazon Elastic Compute Cloud (EC2), as well as to containerized and serverless applications run using AWS services such as Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), AWS Fargate, and AWS Lambda.

Using Amazon EFS – Getting Started
To get started with your existing Amazon EFS file system, make sure the POSIX identities you assign to your SFTP/FTPS/FTP users are owners of the files and directories you want to provide access to. You provide access to that Amazon EFS file system through an AWS Identity and Access Management (IAM) role. The role also needs a trust relationship that allows AWS Transfer Family to assume it, so that the service can access your file system and serve your users’ file transfer requests.

You will also need to make sure you have created a mount target for your file system. In the example below, the home directory is owned by userid 1234 and groupid 5678.

$ mkdir home/myname
$ chown 1234:5678 home/myname
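
If you prefer to script the mount target prerequisite, it can also be created through the API. The following is a minimal boto3 sketch; the file system, subnet, and security group IDs are placeholders, and the security group must allow inbound NFS (TCP 2049):

import boto3

efs = boto3.client("efs")

# Create a mount target in your VPC so the file system is reachable
# before you use it with AWS Transfer Family.
response = efs.create_mount_target(
    FileSystemId="fs-23456789",                # placeholder file system ID
    SubnetId="subnet-0123456789abcdef0",       # placeholder subnet in your VPC
    SecurityGroups=["sg-0123456789abcdef0"],   # placeholder; allow NFS on TCP 2049
)
print(response["MountTargetId"])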

When you create a server in the AWS Transfer Family console, select Amazon EFS as your storage service in Step 4, Choose a domain.
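
The same choice can be made programmatically; the Domain parameter selects Amazon EFS or Amazon S3 as the storage service. Here is a minimal boto3 sketch (the logging role ARN is a placeholder):

import boto3

transfer = boto3.client("transfer")

# Create an SFTP-enabled, service-managed server that stores files in Amazon EFS.
server = transfer.create_server(
    Domain="EFS",                            # the "Choose a domain" step: EFS instead of S3
    Protocols=["SFTP"],
    IdentityProviderType="SERVICE_MANAGED",
    EndpointType="PUBLIC",
    LoggingRole="arn:aws:iam::123456789012:role/transfer-logging-role",  # placeholder
)
print(server["ServerId"])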

When the server is enabled and in an online state, you can add users to your server. On the Servers page, select the check box of the server that you want to add a user to and choose Add user.

In the User configuration section, you can specify the username, UID (e.g., 1234), GID (e.g., 5678), IAM role, and the Amazon EFS file system as the user’s home directory. You can optionally specify a directory within the file system to serve as the user’s landing directory. This example uses the service-managed identity type with SSH keys. If you want to use password authentication, you can use the custom identity provider option with AWS Secrets Manager.

Amazon EFS uses POSIX IDs, which consist of an operating system user ID, group ID, and secondary group IDs, to control access to a file system. When setting up your user, you can specify the username, the user’s POSIX configuration, and an IAM role to access the EFS file system. To learn more about configuring ownership of sub-directories in EFS, visit the documentation.
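
For reference, the same user setup can be done programmatically. The sketch below is a minimal boto3 example; the server ID, role ARN, and SSH key are placeholders, and the POSIX IDs match the 1234:5678 example used earlier:

import boto3

transfer = boto3.client("transfer")

# Add a service-managed user whose home directory lives on the EFS file system.
transfer.create_user(
    ServerId="s-0123456789abcdef0",            # placeholder server ID
    UserName="myname",
    Role="arn:aws:iam::123456789012:role/transfer-efs-access-role",   # placeholder IAM role
    HomeDirectory="/fs-23456789/home/myname",  # file system ID plus the user's directory
    PosixProfile={"Uid": 1234, "Gid": 5678, "SecondaryGids": []},
    SshPublicKeyBody="ssh-rsa AAAA...",        # placeholder public key
)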

Once the users have been configured, you can transfer files using the AWS Transfer Family service by specifying the transfer operation in a client. When your users authenticate successfully with their file transfer client, they are placed directly in the specified home directory, or in the root of the specified EFS file system.

$ sftp [email protected]

sftp> cd /fs-23456789/home/myname
sftp> ls -l
-rw-r--r-- 1 3486 1234 5678 Jan 04 14:59 my-file.txt
sftp> put my-newfile.txt
sftp> ls -l
-rw-r--r-- 1 3486 1234 5678 Jan 04 14:59 my-file.txt
-rw-r--r-- 1 1002 1234 5678 Jan 04 15:22 my-newfile.txt

Most SFTP/FTPS/FTP commands are supported with the new Amazon EFS integration. You can refer to a list of available commands for FTP and FTPS clients in the documentation.

Command  | Amazon S3                           | Amazon EFS
cd       | Supported                           | Supported
ls/dir   | Supported                           | Supported
pwd      | Supported                           | Supported
put      | Supported                           | Supported
get      | Supported                           | Supported (including resolving symlinks)
rename   | Supported (only file)               | Supported (file or folder)
chown    | Not supported                       | Supported (root only)
chmod    | Not supported                       | Supported (root only)
chgrp    | Not supported                       | Supported (root or owner only)
ln -s    | Not supported                       | Not supported
mkdir    | Supported                           | Supported
rm       | Supported                           | Supported
rmdir    | Supported (non-empty folders only)  | Supported
chmtime  | Not supported                       | Supported

You can use Amazon CloudWatch to track your users’ activity for file creation, update, delete, and read operations, as well as metrics for data uploaded and downloaded using your server. To learn more about how to enable CloudWatch logging, visit the documentation.

Available Now
AWS Transfer Family support for Amazon EFS file systems is available in all AWS Regions where AWS Transfer Family is available. There are no additional AWS Transfer Family charges for using Amazon EFS as the storage backend. With Amazon EFS storage, you pay only for what you use. There is no need to provision storage in advance and there are no minimum commitments or up-front fees.

To learn more, take a look at the FAQs and the documentation. Please send feedback to the AWS forum for AWS Transfer Family or through your usual AWS support contacts.

Learn all the details about AWS Transfer Family to access Amazon EFS file systems and get started today.

Channy;

Ransom DDoS attacks target a Fortune Global 500 company

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/ransom-ddos-attacks-target-a-fortune-global-500-company/


In late 2020, a major Fortune Global 500 company was targeted by a Ransom DDoS (RDDoS) attack by a group claiming to be the Lazarus Group. Cloudflare quickly onboarded them to the Magic Transit service and protected them against the lingering threat. This extortion attempt was part of wider ransom campaigns that have been unfolding throughout the year, targeting thousands of organizations around the world. Extortionists are threatening organizations with crippling DDoS attacks if they do not pay a ransom.

Throughout 2020, Cloudflare onboarded and protected many organizations with Magic Transit, Cloudflare’s DDoS protection service for critical network infrastructure, the WAF service for HTTP applications, and the Spectrum service for TCP/UDP based applications — ensuring their business’s availability and continuity.

Unwinding the attack timeline

I spoke with Daniel (a pseudonym) and his team, who work on the Incident Response and Forensics team at the aforementioned company. I wanted to learn about their experience and share it with our readers so they could learn how to better prepare for such an event. The company has requested to remain anonymous, so some details have been omitted. In this blog post, I will refer to them as X.

Initially, the attacker sent ransom emails to a handful of X’s publicly listed email aliases such as press@, shareholder@, and hostmaster@. We’ve heard from other customers that, in some cases, non-technical employees received the email and dismissed it as spam, which delayed the incident response team’s reaction by hours. Luckily for X, however, a network engineer who was on the hostmaster@ email alias saw it and immediately forwarded it to Daniel’s incident response team.

In the ransom email, the attackers demanded 20 bitcoin and gave them a week to pay up, or else a second larger attack would strike, and the ransom would increase to 30 bitcoin. Daniel says that they had a contingency plan ready for this situation and that they did not intend to pay. Paying the ransom funds illegitimate activities, motivates the attackers, and does not guarantee that they won’t attack anyway.

…Please perform a google search of “Lazarus Group” to have a look at some of our previous work. Also, perform a search for “NZX” or “New Zealand Stock Exchange” in the news. You don’t want to be like them, do you?…

The current fee is 20 Bitcoin (BTC). It’s a small price to pay for what will happen if your whole network goes down. Is it worth it? You decide!…

If you decide not to pay, we will start the attack on the indicated date and uphold it until you do. We will completely destroy your reputation and make sure your services will remain offline until you pay…

–An excerpt of the ransom note

The contingency plan

Upon receiving the email from the network engineer, Daniel called him and they started combing through the network data — they noticed a significant increase in traffic towards one of their global data centers. This attacker was not playing around, firing gigabits per second towards a single server. The attack, despite just being a proof of intention, saturated the Internet uplink to that specific data center, causing a denial of service event and generating a series of failure events.

This first “teaser” attack came on a work day, towards the end of business hours as people were already wrapping up their day. At the time, X was not protected by Cloudflare and relied on an on-demand DDoS protection service. Daniel activated the contingency plan which relied on the on-demand scrubbing center service.

Daniel contacted their DDoS protection service. It took them over 30 minutes to activate the service and redirect X’s traffic to the scrubbing center. Activating the on-demand service caused networking failures and resulted in multiple incidents for X on various services — even ones that were not under attack. Daniel says hindsight is 2020, and he realized that an always-on service would have been much more effective than an on-demand, reactive control that takes time to activate after the impact is already felt. The networking failures amplified the one-hour attack, resulting in incidents that lasted much longer than expected.

Onboarding to Cloudflare

Following the initial attack, Daniel’s team reached out to Cloudflare and wanted to onboard to our automated always-on DDoS protection service, Magic Transit. The goal was to onboard to it before the second attack would strike. Cloudflare explained the pre-onboarding steps, provided details on the process, and helped onboard X’s network in a process Daniel defined as “quite painless and very professional. The speed and responsiveness were impressive. One of the key differentiation is the attack and traffic analytics that we see that our incumbent provider couldn’t provide us. We’re seeing attacks we never knew about being mitigated automatically.”

The attackers promised a second, huge attack, which never happened. Perhaps it was just an empty threat, or perhaps the attackers detected that X was now protected by Cloudflare, which deterred them, and they decided to move on to their next target.

Recommendations for organizations

I asked Daniel if he has any recommendations for businesses so they can learn from his experience and be better prepared, should they be targeted by ransom attacks:

1. Utilize an automated always-on DDoS protection service

Do not rely on reactive on-demand SOC-based DDoS Protection services that require humans to analyze attack traffic. It just takes too long. Don’t be tempted to use an on-demand service: “you get all of the pain and none of the benefits”. Instead, onboard to a cloud service that has sufficient network capacity and automated DDoS mitigation systems.

2. Work with your vendor to build and understand your threat model

Work together with your DDoS protection vendor to tailor mitigation strategies to your workload. Every network is different, and each poses unique challenges when integrating with DDoS mitigation systems.

3. Create a contingency plan and educate your employees

Be prepared. Have plans ready and train your teams on them. Educate all of your employees, even the non-techies, on what to do if they receive a ransom email. They should report it immediately to your Security Incident Response team.

Cloudflare customers need not worry, as they are protected. Enterprise customers can reach out to their account team if they are being extorted, in order to review and optimize their security posture as needed. Customers on all other plans can reach out to our support teams to learn more about how to optimize their Cloudflare security configuration.

Not a Cloudflare customer yet? Speak to an expert or sign up.

How COVID-19 Reinforced the Need for Mobile Device Management

Post Syndicated from Justin Turcotte original https://blog.rapid7.com/2021/01/07/how-covid-19-reinforced-the-need-for-mobile-device-management/


How many of you got that call at the beginning of the pandemic to make your company’s workforce 100% capable for remote work? How many of you had no idea how to make that happen, seemingly (and sometimes literally) overnight? How many of you were already prepared for such an event?

Remote workforces and mobile device management (MDM) are more important than ever in 2020’s pandemic reality. Unmanaged remote endpoints are one of the biggest risks to an organization’s cybersecurity posture today.

Don’t think of remote endpoints solely from the isolated ransomware/malware infection standpoint. Instead, think of them from a MITRE ATT&CK matrix perspective. Ask yourself these questions:

  • Can attackers gain access to the endpoint?
  • Can attackers establish persistence?
  • Can attackers perform data collection and exfiltration?
  • What could an attacker achieve by compromising an unmanaged remote endpoint?
  • What can that endpoint bring back to the enterprise network with it when it returns to the office?

While working with Rapid7 customers over the past several months of quarantine and lockdown, it’s evident to me that many companies were caught completely off guard when facing the reality of being unable to work from their corporate offices.

Many customers have no way to manage their endpoints remotely unless those endpoints are connected to the company VPN, or, in many cases, are unable to manage them at all. Often, these VPN connections are unreliable, or the company had not planned for the network overhead required for a thousand employees connecting to the company VPN at the same time.

Companies have spent large amounts of money over the past several months rolling out more robust VPN solutions and mobile devices (like laptops and tablets) for users to be able to perform their jobs remotely. And security has seemingly taken a backseat to these larger efforts to keep workforces employed and productive.

Here are a few solutions we’ve seen many of our customers using for remote productivity and connectivity:

  • VPN: Company-controlled VPN service installed and configured on remote endpoints for users to connect as necessary.
  • Always-On VPN: A VPN connection that is “always on,” whenever the endpoint is connected to the internet. This configuration is more secure, as users are forced to connect to the company network in order to perform any work that requires the internet or network resources. This can help ensure users are not surfing dangerous websites or using other unapproved services such as personal email or file-sharing sites to perform official work.
  • Bring-Your-Own-Device (BYOD): BYOD scenarios include installing a company-controlled VPN client and configuration on an employee-owned device. This configuration is less than desirable because the company does not own the device and therefore cannot control the remote endpoint in any meaningful capacity.
  • Loose Controls: Some customers have even relaxed security measures that were in place prior to the pandemic. Because of the speed with which companies were forced to loosen security measures—such as removing multi-factor authentication requirements and disabling password rotation requirements—some companies have been left at great risk of compromise.

Cloud-based remote management and security solutions are the key to meeting the remote work requirements imposed by federal and local governments.

There are still a large number of companies that seem to be cloud-averse when it comes to anything to do with endpoints or security, but this new reality makes it necessary to start adopting cloud-based solutions to manage your enterprise network.


Cloud managed services to consider for a completely remote or mostly remote workforce

Antivirus

A cloud-based antivirus solution that does not require connectivity to the enterprise network in order to receive signature or software updates is crucial in this new dynamic. Users are taking their systems to their home wireless networks, which have notoriously weak security.

Anti-malware and endpoint detection and response (EDR)

Having an EDR and anti-malware solution that is able to report to a cloud-based management console is also important to prevent malware infections and alert on suspicious or anomalous activity.

Vulnerability management

Having a cloud-based vulnerability management solution in place that can report back to a centrally managed system is important for assessing the overall level of risk that an organization has in regard to remote endpoints.

Asset management

An effective asset management solution is crucial for an effective vulnerability management program. You cannot patch or secure what you do not know you have on your network. Asset management systems also help with remote support and resource planning.

Patch and software deployment

Are you able to patch or update software on remote endpoints easily and effectively? Is your current patch management solution able to reach remote endpoints reliably? Having a cloud-based patch and software deployment solution is key to ensuring your endpoints are kept up-to-date with the latest patches and version updates.

Data loss prevention

Are you able to see company data flowing across the enterprise network? Can you monitor the types of data flowing through VPN connections, personal emails, or cloud-based file sharing solutions?

These areas are just a few of the most important MDM or RMM solutions needed in today’s pandemic toolbox for the IT security professional.

Future considerations

If you already have some of these areas covered, can your tools integrate with one another to provide a single pane of glass administration console that enables your IT and security teams to perform day-to-day tasks?

Where can you consolidate tools into one platform? Can your patching solution act as your asset management solution as well? Can your endpoint detection and response system be a remote SIEM solution or a User Behavior Analytics system?

COVID-19 has altered the attack landscape forever. Work from home is likely not going anywhere and will only become more necessary as lockdowns continue. Some organizations have even opted to close offices and work remotely on a permanent basis after discovering how well their teams were able to work from home. The need for mobile device management and the ability to detect and remediate vulnerabilities on remote endpoints is now a necessity rather than a convenience.


Aranet — a wireless IoT sensor platform

Post Syndicated from Toms Reksna original https://blog.zabbix.com/aranet-a-wireless-iot-sensor-platform/12953/

Aranet — wireless IoT sensor platform. Wherever you need to measure anything – temperature, air quality, light, or any other physical parameter – Aranet’s main mission is to deliver these measurements simply, easily, and above all – wirelessly. Aranet is manufactured by SAF Tehnika — a company with over 20 years of experience in the telecom industry, microwave radio, and test & measurement equipment manufacturing, and a certified partner of Zabbix.

Contents

I. Aranet wireless sensor network (1:41)
II. Aranet in retail (5:53)
III. Indoor air quality and COVID19 (8:20)
IV. Partnership with Aranet (12:11)
V. Questions & Answers (13:52)

Aranet wireless sensor network

Aranet is a wireless sensor network consisting of the Aranet PRO base station and sensors transmitting data to it over the 868 MHz frequency in Europe and the 920 MHz frequency in the United States. This frequency allows for a very large distance between the sensor and the base station — up to 3 kilometers line of sight and a couple of hundred meters indoors.

Sensors are intended to measure different environmental parameters. You can connect up to a hundred sensors per base station. Sensors can be configured to send the data over different intervals — once every minute, two minutes, five, or 10 minutes. Sensors are very power efficient — with a regular AA battery, they will last up to 10 years.

Aranet ecosystem

Aranet technology is based on the LoRa physical layer. We have built our proprietary LPWAN protocol with XXTEA encryption on top of LoRa to make the radio parameters better and to increase the battery life.

Aranet technology

The brain of the system is the Aranet PRO base station – the radio receiver with a built-in web server housing SensorHUB software and internal memory for local data storage. It is made with ease of use in mind – you can connect directly to the base station with your PC, laptop, or phone over Ethernet or Wi-Fi, open up your web browser and access the free SensorHUB software. You don’t even need to install anything.

Aranet PRO base station offers a lot of features such as graphing, exporting data, etc. In addition, its internal memory allows for storing 10 years of readings even if the Internet goes down.

The sensors send their data to the base station. Several such base stations can be aggregated into the Aranet Cloud solution, which collects data from all of them and allows you to access the data from anywhere.

Aranet architecture

With over 20 years of experience in radio manufacturing, we believe that we’ve created one of the best-in-class systems in terms of wireless connectivity with our base stations and in-house cloud. However, we are looking for a strategic partnership where the Aranet system can become a part of a larger system. This brought us to the partnership with Zabbix so that we can integrate our cloud solution with the Zabbix monitoring system.

Aranet philosophy

Aranet Example Use Cases

Aranet for retail

Rimi

Aranet has been actively used in retail, for instance by Rimi — a chain of Latvian supermarkets — where 6,500 sensors have been installed in 125 stores. Aranet is planning to expand to other Baltic states.

Aranet equipment is primarily used for:

  • Monitoring of freezer temperatures. Earlier, the temperature had to be checked manually — somebody had to walk around with a legal pad to make sure that the freezers were working properly and to report to the relevant government agencies. Aranet allowed this process to be automated.
  • Alarms in case of malfunction. In the case of a malfunction, an alarm can be sent to avoid product spoilage.
  • Predictive maintenance, including machine learning algorithms that locate anomalies in the defrost cycle temperature data, helping to prevent breakages.

Aranet in retail

Benefits

  • Even the largest supermarkets (8,800 m² / 94,000 ft²) can be covered with a single base station.
  • Manual data collection can be avoided.
  • Freezer temperature operating costs can be optimized (20% energy costs reduction).
  • Product spoilage can be avoided.
  • Litigation/fines for slip and fall accidents can be avoided.

Aranet for indoor air quality and COVID19 safety

Due to COVID19, many governments and health agencies have changed their guidelines, including the Centers for Disease Control and Prevention in the United States, and they now state that COVID19 can be transmitted through aerosols. Aerosols are small droplets that are released when we cough, sneeze, or talk. Because these droplets are so small (about five microns), they linger in the air for nine minutes or more. That means you don't even have to be in contact with an infected person to catch the disease.

This requires proper ventilation practices, which can reduce the time aerosol particles stay in the air by a factor of ten.

Aranet4 PRO – a wireless COVID19 safety network

One way to estimate whether ventilation is sufficient is to measure CO2. The amount of CO2 (air exhaled by other people) in a room is a measure of the risk of contagion. The recommended air circulation per person is 60 m³/h, which corresponds to a CO2 concentration of approximately 800 ppm — almost twice the outdoor value.
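
As a rough sanity check of that figure, a simple steady-state estimate can be computed: the indoor concentration is the outdoor baseline plus the exhaled CO2 divided by the ventilation rate. The exhaled rate and outdoor baseline below are assumed typical values, not Aranet figures:

# Rough steady-state CO2 estimate: indoor ppm = outdoor ppm + generation / ventilation.
outdoor_ppm = 400                # assumed typical outdoor CO2 concentration
co2_exhaled_l_per_h = 18         # assumed CO2 exhaled by an adult at rest, litres per hour
ventilation_m3_per_h = 60        # recommended fresh air per person, cubic metres per hour

indoor_ppm = outdoor_ppm + (co2_exhaled_l_per_h / (ventilation_m3_per_h * 1000)) * 1_000_000
print(round(indoor_ppm))         # ~700 ppm, the same ballpark as the ~800 ppm guideline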

Aranet wireless CO2 sensor

Aranet offers a wireless CO2 sensor that also measures temperature, relative humidity, and air pressure. It comes with a useful Bluetooth application, which allows you to easily get the latest readings. But the most important thing is that this sensor can generate alerts: whenever the value exceeds a critical level, you get a visual indication (green, yellow, or red) as well as an audible alert prompting you to manually increase the ventilation, for instance by opening windows.

Lately, these sensors have been gaining popularity, especially in schools, universities, and offices as they offer:

  • Simple plug-and-play setup with the Aranet base station.
  • Up-to-date information available locally on each sensor, as well as centrally on the base station, so that you can see which spaces need additional ventilation.
  • Free software – graphs, reports, centralized alarms.
  • Control of airborne COVID19 spread in schools, offices, and other indoor facilities.

Partnership with Aranet

Aranet wireless network can be implemented in many other industries:

  • Horticulture,
  • Livestock,
  • Building Management,
  • Warehousing,
  • Data Centres,
  • Pharma,
  • Medical,
  • Retail.

So, Aranet is looking for integration and distribution partners who are interested in wireless monitoring. Details of the partnership are available on aranet.com or can be requested from [email protected].

Aranet’s core value is the wisdom of Lord Kelvin: “you can only improve what you can measure”. So, we strive to deliver these measurements in the easiest and most straightforward way possible, so that you can improve whatever you wish.

Questions & Answers

Question. Is there some way or some benefit to integrating Aranet with Zabbix?

Answer. Aranet has many and diverse applications, as does Zabbix. So, adding physical parameters on top of the network parameters already being monitored would help. For data centers or retail stores, in addition to alerts that something is wrong with the network, alarms that something physical is happening would be useful. It might be useful to be alerted, for instance, if it's too hot.

Question. Is it possible to switch your sensors to LoRaWAN so that we can use existing networks?

Answer. We have decided to have our proprietary network based on the LoRa physical layer with proprietary communication software. This decision was made for several reasons:

  • Ease of use — the main thing that our customers actually value. The Aranet system can be easily set up in a couple of minutes — you just place the sensors and they start working. With LoRaWAN you have the base station from one provider and sensors from another, so it takes time to make the system work. Aranet works out of the box.
  • Improved battery life thanks to our protocol.
  • Improved security, as with Aranet you control the whole ecosystem from the base station to the sensors. In addition, with Aranet you won’t face dependencies, password management, or communication issues.
  • A private network.

Question. Are there any electrical sensors — volts, amps, power, or anything like that?

Answer. We can monitor voltage, but mostly through third-party integrations. We have pulse output sensors, which you can connect to electricity meters, for instance. So, this can be monitored.

 

Extracting Personal Information from Large Language Models Like GPT-2

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/01/extracting-personal-information-from-large-language-models-like-gpt-2.html

Researchers have been able to find all sorts of personal information within GPT-2. This information was part of the training data, and can be extracted with the right sorts of queries.

Paper: “Extracting Training Data from Large Language Models.”

Abstract: It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model.

We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model’s training data. These extracted examples include (public) personally identifiable information (names, phone numbers, and email addresses), IRC conversations, code, and 128-bit UUIDs. Our attack is possible even though each of the above sequences are included in just one document in the training data.

We comprehensively evaluate our extraction attack to understand the factors that contribute to its success. For example, we find that larger models are more vulnerable than smaller models. We conclude by drawing lessons and discussing possible safeguards for training large language models.

From a blog post:

We generated a total of 600,000 samples by querying GPT-2 with three different sampling strategies. Each sample contains 256 tokens, or roughly 200 words on average. Among these samples, we selected 1,800 samples with abnormally high likelihood for manual inspection. Out of the 1,800 samples, we found 604 that contain text which is reproduced verbatim from the training set.
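
To make the idea concrete, here is a much-simplified sketch of the same procedure using the Hugging Face transformers library: sample from GPT-2, then rank the samples by perplexity, since text the model finds unusually likely is a candidate for memorized training data. This is an illustration of the concept, not the authors' code; the real attack uses several sampling strategies and more sophisticated membership filters.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Perplexity under GPT-2 itself; low values mean the model is "too sure" about the text.
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Unconditional sampling from the model (the paper generates 600,000 samples).
start = torch.tensor([[tokenizer.bos_token_id]])
samples = []
for _ in range(10):
    out = model.generate(start, do_sample=True, top_k=40, max_length=64,
                         pad_token_id=tokenizer.eos_token_id)
    samples.append(tokenizer.decode(out[0], skip_special_tokens=True))

# The lowest-perplexity samples are the candidates for manual inspection.
for text in sorted(samples, key=perplexity)[:3]:
    print(round(perplexity(text), 1), text[:80].replace("\n", " "))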

The rest of the blog post discusses the types of data they found.

Code your own Pipe Mania puzzler | Wireframe #46

Post Syndicated from Ryan Lambie original https://www.raspberrypi.org/blog/code-your-own-pipe-mania-puzzler-wireframe-46/

Create a network of pipes before the water starts to flow in our re-creation of a classic puzzler. Jordi Santonja shows you how.

Pipe Mania’s design is so effective, it’s appeared in various guises elsewhere – even as a minigame in BioShock.

Pipe Mania, also called Pipe Dream in the US, is a puzzle game developed by The Assembly Line in 1989 for Amiga, Atari ST, and PC, and later ported to other platforms, including arcades. The player must place randomly generated sections of pipe onto a grid. When a counter reaches zero, water starts to flow and must travel as far as possible through the connected pipes.

Let’s look at how to recreate Pipe Dream in Python and Pygame Zero. The variable start is decremented at each frame. It begins with a value of 60*30, so it reaches zero after 30 seconds if our monitor runs at 60 frames per second. In that time, the player can place tiles on the grid to build a path. Every time the user clicks on the grid, the last tile from nextTiles is placed on the play area and a new random tile appears at the top of the next tiles. randint(2,8) computes a random value between 2 and 8.

Our Pipe Mania homage. Build a pipeline before the water escapes, and see if you can beat your own score.

grid and nextTiles are lists of tile values, from 0 to 8, and are copied to the screen in the draw function with the screen.blit operation. grid is a two-dimensional list, with sizes gridWidth=10 and gridHeight=7. Every pipe piece is placed in grid with a mouse click. This is managed with the Pygame Zero hooks on_mouse_move and on_mouse_down, where the variable pos contains the mouse position in the window. panelPosition defines the position of the top-left corner of the grid in the window. To get the grid cell, panelPosition is subtracted from pos, and the result is divided by tileSize with the integer division //. tileMouse stores the resulting cell element, but it is set to (-1,-1) when the mouse lies outside the grid.
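
Here is a minimal sketch of that logic in Pygame Zero style (run with pgzrun), using the same names as the article; the tile size and panel position are assumed values, and the real game does considerably more (drawing the tiles, the queue display, and so on):

from random import randint

gridWidth, gridHeight = 10, 7
tileSize = 50                       # assumed tile size in pixels
panelPosition = (100, 60)           # assumed top-left corner of the grid on screen
grid = [[0] * gridHeight for _ in range(gridWidth)]
nextTiles = [randint(2, 8) for _ in range(4)]
tileMouse = (-1, -1)
start = 60 * 30                     # 30 seconds at 60 frames per second

def update():
    global start
    start -= 1                      # when this reaches zero, the water starts to flow

def on_mouse_move(pos):
    global tileMouse
    cx = (pos[0] - panelPosition[0]) // tileSize
    cy = (pos[1] - panelPosition[1]) // tileSize
    if 0 <= cx < gridWidth and 0 <= cy < gridHeight:
        tileMouse = (cx, cy)
    else:
        tileMouse = (-1, -1)        # mouse is outside the grid

def on_mouse_down(pos):
    on_mouse_move(pos)
    if tileMouse != (-1, -1):
        grid[tileMouse[0]][tileMouse[1]] = nextTiles.pop()   # place the last queued tile
        nextTiles.insert(0, randint(2, 8))                   # a new random tile joins the queue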

The images folder contains the PNGs with the tile images, two for every tile: the graphical image and the path image. The tiles list contains the name of every tile, and adding to it _block or _path obtains the name of the file. The values stored in nextTiles and grid are the indexes of the elements in tiles.

Here’s Jordi’s code for a Pipemania-style puzzler. To get it working on your system, you’ll need to install Pygame Zero. And to download the full code and assets, head here.

The image waterPath isn’t shown to the user, but it stores the paths that the water is going to follow. The first point of the water path is located in the starting tile and is stored in currentPoint. When the water starts flowing, update calls the function CheckNextPointDeleteCurrent. That function finds the next point in the water path, erases it, and adds a new point to the waterFlow list. waterFlow is shown to the user in the draw function.

pointsToCheck contains a list of relative positions, or offsets, that define a step of two pixels from currentPoint in every direction to find the next point. Why two pixels? To be able to define the ‘cross’ tile, where two lines cross each other. In a ‘cross’ tile the water flow must follow a straight line, and stepping two pixels at a time ensures that the only point found is the next point in the same direction. When no next point is found, the game ends and the score is shown: the number of points in the water path. playState is set to 0, and no more updates are done.
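
The following is a simplified sketch of that water-flow step; for clarity it models waterPath as a set of lit points rather than an image, and the function mirrors what the article's CheckNextPointDeleteCurrent does:

# Two-pixel offsets in each direction, as described above; the two-pixel step is
# what lets a straight flow pass through the 'cross' tile without turning.
pointsToCheck = [(2, 0), (-2, 0), (0, 2), (0, -2)]

def check_next_point_delete_current(water_path, current_point, water_flow):
    # water_path: set of (x, y) points still to traverse (a stand-in for the path image)
    # current_point: the tip of the flow; water_flow: the list of points drawn on screen
    x, y = current_point
    for dx, dy in pointsToCheck:
        nxt = (x + dx, y + dy)
        if nxt in water_path:
            water_path.discard(nxt)     # erase it so the flow never doubles back
            water_flow.append(nxt)
            return nxt                  # this becomes the new current point
    return None                         # no next point: the game ends and the score is shown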

Get your copy of Wireframe issue 46

You can read more features like this one in Wireframe issue 46, available directly from Raspberry Pi Press — we deliver worldwide.


And if you’d like a handy digital version of the magazine, you can also download issue 46 for free in PDF format.

The post Code your own Pipe Mania puzzler | Wireframe #46 appeared first on Raspberry Pi.

[$] Bootstrappable builds

Post Syndicated from original https://lwn.net/Articles/841797/rss

The idea of Reproducible
Builds
—being able to recreate bit-for-bit identical binaries using the
same source code—has gained momentum over the last few years.
Reproducible builds provide some safeguards against bad actors
in the software supply chain. But building software depends on the tools
used to construct the binary, including compilers and build-automation tools, many of
which depend on pre-existing binaries. Minimizing the reliance on opaque
binaries for building our software ecosystem is the goal of the Bootstrappable Builds project.

[$] Some unlikely 2021 predictions

Post Syndicated from original https://lwn.net/Articles/840632/rss

Just because something is traditional does not imply that it is necessarily
a good idea. As a case in point, consider LWN’s tradition of starting the
year with some predictions for what is to come; some may be obvious while
others are implausible, but none of them are reliable. Nonetheless, we’ve
been doing this since 2002 so we can’t stop now.
Read on for our wild guesses as to what might transpire in 2021.

Content-Security-Policy Nonce with Spring Security

Post Syndicated from Bozho original https://techblog.bozho.net/content-security-policy-nonce-with-spring-security/

Content-Security-Policy is important for web security. Yet it’s not mainstream, its syntax is hard, it’s rather prohibitive, and tools rarely have flexible support for it.

While Spring Security does have a built-in Content Security Policy (CSP) configuration, it only allows you to specify the policy as a string, not to build it dynamically. And in some cases you need more than that.

In particular, CSP discourages the use of inline JavaScript, because it introduces vulnerabilities. If you really need it, you can use unsafe-inline, but that’s a bad approach, as it negates the whole point of CSP. The alternative presented on that page is to use a hash or a nonce.

I’ll explain how to use a nonce with Spring Security, if you are using .and().headers().contentSecurityPolicy(policy). The policy string is static, so you can’t generate a random nonce for each request, and having a static nonce is useless. So first, you define a CSP nonce filter:

import java.io.IOException;
import java.security.SecureRandom;
import java.util.Base64;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

import org.apache.commons.lang3.StringUtils;
import org.springframework.web.filter.GenericFilterBean;

public class CSPNonceFilter extends GenericFilterBean {
    private static final int NONCE_SIZE = 32; //recommended is at least 128 bits/16 bytes
    private static final String CSP_NONCE_ATTRIBUTE = "cspNonce";

    private SecureRandom secureRandom = new SecureRandom();

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        byte[] nonceArray = new byte[NONCE_SIZE];

        secureRandom.nextBytes(nonceArray);

        String nonce = Base64.getEncoder().encodeToString(nonceArray);
        request.setAttribute(CSP_NONCE_ATTRIBUTE, nonce);

        chain.doFilter(request, new CSPNonceResponseWrapper(response, nonce));
    }

    /**
     * Wrapper to fill the nonce value
     */
    public static class CSPNonceResponseWrapper extends HttpServletResponseWrapper {
        private String nonce;

        public CSPNonceResponseWrapper(HttpServletResponse response, String nonce) {
            super(response);
            this.nonce = nonce;
        }

        @Override
        public void setHeader(String name, String value) {
            if (name.equals("Content-Security-Policy") && StringUtils.isNotBlank(value)) {
                super.setHeader(name, value.replace("{nonce}", nonce));
            } else {
                super.setHeader(name, value);
            }
        }

        @Override
        public void addHeader(String name, String value) {
            if (name.equals("Content-Security-Policy") && StringUtils.isNotBlank(value)) {
                super.addHeader(name, value.replace("{nonce}", nonce));
            } else {
                super.addHeader(name, value);
            }
        }
    }
}

And then you configure it with spring security using: .addFilterBefore(new CSPNonceFilter(), HeaderWriterFilter.class).

The policy string should contain `nonce-{nonce}`, which gets replaced with a random nonce on each request.

The filter is set before the HeaderWriterFilter so that it can wrap the response and intercept all calls that set headers. This can’t be done by simply overriding the headers after they are set by the HeaderWriterFilter, using response.setHeader(..), because by that point the response is already committed and overriding does nothing.

Then in your pages where you for some reason need inline scripts, you can use:

<script nonce="{{ cspNonce }}">...</script>

(I’m using the Pebble template syntax, but you can use any template engine to output the request attribute “cspNonce”.)

Once again, inline JavaScript is rarely a good idea, but sometimes it’s necessary, at least temporarily (if you are adding a CSP to a legacy application, for example, and can’t rewrite everything).

We should have CSP everywhere, but building the policy should be aided by the frameworks we use, otherwise it’s rather tedious to write a proper policy that doesn’t break your application and is secure at the same time.

The post Content-Security-Policy Nonce with Spring Security appeared first on Bozho's tech blog.

Orchestrating an AWS Glue DataBrew job and Amazon Athena query with AWS Step Functions

Post Syndicated from Sakti Mishra original https://aws.amazon.com/blogs/big-data/orchestrating-an-aws-glue-databrew-job-and-amazon-athena-query-with-aws-step-functions/

As the industry grows with more data volume, big data analytics is becoming a common requirement in data analytics and machine learning (ML) use cases. Also, as we start building complex data engineering or data analytics pipelines, we look for a simpler orchestration mechanism with graphical user interface-based ETL (extract, transform, load) tools.

Recently, AWS announced the general availability of AWS Glue DataBrew, a new visual data preparation tool that helps you clean and normalize data without writing code. This reduces the time it takes to prepare data for analytics and ML by up to 80% compared to traditional approaches to data preparation.

Regarding orchestration or workflow management, AWS provides AWS Step Functions, a serverless function orchestrator that makes it easy to build a workflow by integrating different AWS services like AWS Lambda, Amazon Simple Notification Service (Amazon SNS), AWS Glue, and more. With its built-in operational controls, Step Functions manages sequencing, error handling, retry logic, and states, removing a significant operational burden from your team.

Today, we’re launching Step Functions support for DataBrew, which means you can now invoke DataBrew jobs in your Step Functions workflow to build an end-to-end ETL pipeline. Recently, Step Functions also started supporting Amazon Athena integration, which means that you can submit SQL queries to the Athena engine through a Step Functions state.

In this post, we walk through a solution where we integrate a DataBrew job for data preparation, invoke a series of Athena queries for data refresh, and integrate Amazon QuickSight for business reporting. The whole solution is orchestrated through Step Functions and is invoked through Amazon EventBridge.

Use case overview

For our use case, we use two public datasets. The first dataset is a sales pipeline dataset, which contains a list of over 20,000 sales opportunity records for a fictitious business. Each record has fields that specify the following:

  • A date, potentially when an opportunity was identified
  • The salesperson’s name
  • A market segment to which the opportunity belongs
  • Forecasted monthly revenue

The second dataset is an online marketing metrics dataset. This dataset contains records of marketing metrics, aggregated by day. The metrics describe user engagement across various channels, such as websites, mobile, and social media, plus other marketing metrics. The two datasets are unrelated, but for the purpose of this post, we assume that they’re related.

For our use case, these sales and marketing CSV files are maintained by your organization’s Marketing team, which uploads the updated full CSV file to Amazon Simple Storage Service (Amazon S3) every month. The aggregated output data is created through a series of data preparation steps, and the business team uses the output data to create business intelligence (BI) reports.

Architecture overview

To automate the complete process, we use the following architecture, which integrates Step Functions for orchestration, DataBrew for data preparation, Athena for data analysis with standard SQL, and QuickSight for business reporting. In addition, we use Amazon SNS for sending notifications to users, and EventBridge is integrated to schedule running the Step Functions workflow.


The workflow includes the following steps:

  • Step 1 – The Marketing team uploads the full CSV file to an S3 input bucket every month.
  • Step 2 – An EventBridge rule, scheduled to run every month, triggers the Step Functions state machine.
  • Steps 3 and 4 – We receive two separate datasets (sales data and marketing data), so Step Functions triggers two parallel DataBrew jobs, which create additional year, month, and day columns from the existing date field and use those three columns for partitioning. The jobs write the final output to our S3 output bucket.
  • Steps 5, 6, 7, and 8 – After the output data is written, we can create external tables on top of it with Athena CREATE TABLE statements and then load the partitions with MSCK REPAIR commands. After the AWS Glue Data Catalog tables are created for sales and marketing, we run an additional query through Athena, which merges these two tables by year and month to create another table with aggregated output.
  • Steps 9 and 10 – As the last step of the Step Functions workflow, we send a notification to end-users through Amazon SNS to notify them that the data refresh job completed successfully.
  • Steps 11, 12, 13 – After the aggregated table data is refreshed, business users can use QuickSight for BI reporting, which fetches data through Athena. Data analysts can also use Athena to analyze the complete refreshed dataset.

Prerequisites

Before beginning this tutorial, make sure you have the required permissions to create the resources required as part of the solution.

Additionally, create the S3 input and output buckets with the required subfolders to capture the sales and marketing data, and upload the input data into their respective folders.

Creating a DataBrew project

To create a DataBrew project for the marketing data, complete the following steps:

  1. On the DataBrew console, choose Projects.
  2. Choose Create a project.
  3. For Project name, enter a name (for this post, marketing-data-etl).
  4. For Select a dataset, select New dataset.


  1. For Enter your source from S3, enter the S3 path of the marketing input CSV.


  1. Under Permissions, for Role name, choose an AWS Identity and Access Management (IAM) role that allows DataBrew to read from your Amazon S3 input location.

You can choose a role if you already created one, or create a new one. Please read here for steps to create the IAM role.

  1. After the dataset is loaded, on the Functions menu, choose Date functions.
  2. Choose YEAR.


  1. Apply the year function on the date column to create a new column called year.

  1. Repeat these steps to create month and day columns.


For our use case, we created a few new columns that we plan to use for partitioning, but you can integrate additional transformations as needed.

  1. After you have finished applying all your transformations, choose Publish on the recipe.
  2. Provide a description of the recipe version and choose Publish.

Creating a DataBrew job

Now that our recipe is ready, we can create a job for it, which gets invoked through our Step Functions state machine.

  1. On the DataBrew console, choose Jobs.
  2. Choose Create a job.
  3. For Job name, enter a name (for example, marketing-data-etl).

Your recipe is already linked to the job.

  1. Under Job output settings, for File type, choose your final storage format (for this post, we choose PARQUET).
  2. For S3 location, enter your final S3 output bucket path.
  3. For Compression, choose the compression type you want to apply (for this post, we choose Snappy).
  4. Under Additional configurations, for Custom partition by column values, choose year, month, and day.
  5. For File output storage, select Replace output files for each job run.

We choose this option because our use case is to do a full refresh.


  1. Under Permissions, for Role name, choose your IAM role.
  2. Choose Create job.

We choose this because we don’t want to run it now; we plan to invoke it through Step Functions.


  1. When your marketing job is ready, repeat the same steps for your sales data, using the sales data output file location as needed.

Creating a Step Functions state machine

We’re now ready to create a Step Functions state machine for the complete flow.

  1. On the Step Functions console, choose Create state machine.
  2. For Define state machine, select Author with code snippets.
  3. For Type, choose Standard.


In the Definition section, Step Functions provides a list of service actions that you can use to automatically generate a code snippet for your state machine’s state. The following screenshot shows that we have options for Athena and DataBrew, among others.

  1. For Generate code snippet, choose AWS Glue DataBrew: Start a job run.


  1. For Job name, choose Select job name from a list and choose your DataBrew job.

The JSON snippet appears in the Preview pane.

  1. Select Wait for DataBrew job runs to complete.
  2. Choose Copy to clipboard.


  1. Integrate the code into the final state machine JSON code:
    {
       "Comment":"Monthly Refresh of Sales Marketing Data",
       "StartAt":"Refresh Sales Marketing Data",
       "States":{
          "Refresh Sales Marketing Data":{
             "Type":"Parallel",
             "Branches":[
                {
                   "StartAt":"Sales DataBrew ETL Job",
                   "States":{
                      "Sales DataBrew ETL Job":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::databrew:startJobRun.sync",
                         "Parameters":{
                            "Name":"sales-data"
                         },
                         "Next":"Drop Old Sales Table"
                      },
                      "Drop Old Sales Table":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
                         "Parameters":{
                            "QueryString":"DROP TABLE IF EXISTS sales_data_output",
                            "WorkGroup":"primary",
                            "ResultConfiguration":{
                               "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                            }
                         },
                         "Next":"Create Sales Table"
                      },
                      "Create Sales Table":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
                         "Parameters":{
                            "QueryString":"CREATE EXTERNAL TABLE `sales_data_output`(`date` string, `salesperson` string, `lead_name` string, `segment` string, `region` string, `target_close` string, `forecasted_monthly_revenue` int,   `opportunity_stage` string, `weighted_revenue` int, `closed_opportunity` boolean, `active_opportunity` boolean, `latest_status_entry` boolean) PARTITIONED BY (`year` string,`month` string, `day` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION  's3://<your-bucket-name>/sales-pipeline/transformed/sales/' TBLPROPERTIES ('classification'='parquet', 'compressionType'='none', 'typeOfData'='file')",
                            "WorkGroup":"primary",
                            "ResultConfiguration":{
                               "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                            }
                         },
                         "Next":"Load Sales Table Partitions"
                      },
                      "Load Sales Table Partitions":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
                         "Parameters":{
                            "QueryString":"MSCK REPAIR TABLE sales_data_output",
                            "WorkGroup":"primary",
                            "ResultConfiguration":{
                               "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                            }
                         },
                         "End":true
                      }
                   }
                },
                {
                   "StartAt":"Marketing DataBrew ETL Job",
                   "States":{
                      "Marketing DataBrew ETL Job":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::databrew:startJobRun.sync",
                         "Parameters":{
                            "Name":"marketing-data-etl"
                         },
                         "Next":"Drop Old Marketing Table"
                      },
                      "Drop Old Marketing Table":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
                         "Parameters":{
                            "QueryString":"DROP TABLE IF EXISTS marketing_data_output",
                            "WorkGroup":"primary",
                            "ResultConfiguration":{
                               "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                            }
                         },
                         "Next":"Create Marketing Table"
                      },
                      "Create Marketing Table":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
                         "Parameters":{
                            "QueryString":"CREATE EXTERNAL TABLE `marketing_data_output`(`date` string, `new_visitors_seo` int, `new_visitors_cpc` int, `new_visitors_social_media` int, `return_visitors` int, `twitter_mentions` int,   `twitter_follower_adds` int, `twitter_followers_cumulative` int, `mailing_list_adds_` int,   `mailing_list_cumulative` int, `website_pageviews` int, `website_visits` int, `website_unique_visits` int,   `mobile_uniques` int, `tablet_uniques` int, `desktop_uniques` int, `free_sign_up` int, `paid_conversion` int, `events` string) PARTITIONED BY (`year` string, `month` string, `day` string) ROW FORMAT SERDE   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION  's3://<your-bucket-name>/sales-pipeline/transformed/marketing/' TBLPROPERTIES ('classification'='parquet', 'compressionType'='none', 'typeOfData'='file')",
                            "WorkGroup":"primary",
                            "ResultConfiguration":{
                               "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                            }
                         },
                         "Next":"Load Marketing Table Partitions"
                      },
                      "Load Marketing Table Partitions":{
                         "Type":"Task",
                         "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
                         "Parameters":{
                            "QueryString":"MSCK REPAIR TABLE marketing_data_output",
                            "WorkGroup":"primary",
                            "ResultConfiguration":{
                               "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                            }
                         },
                         "End":true
                      }
                   }
                }
             ],
             "Next":"Drop Old Summerized Table"
          },
          "Drop Old Summerized Table":{
             "Type":"Task",
             "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
             "Parameters":{
                "QueryString":"DROP TABLE default.sales_marketing_revenue",
                "WorkGroup":"primary",
                "ResultConfiguration":{
                   "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                }
             },
             "Next":"Create Summerized Output"
          },
          "Create Summerized Output":{
             "Type":"Task",
             "Resource":"arn:aws:states:::athena:startQueryExecution.sync",
             "Parameters":{
                "QueryString":"CREATE TABLE default.sales_marketing_revenue AS SELECT * FROM (SELECT sales.year, sales.month, total_paid_conversion, total_weighted_revenue FROM (SELECT year, month, sum(paid_conversion) as total_paid_conversion FROM default.marketing_data_output group by year, month) sales INNER JOIN (SELECT year, month, sum(weighted_revenue) as total_weighted_revenue FROM default.sales_data_output group by year, month) marketing on sales.year=marketing.year AND sales.month=marketing.month) ORDER BY year DESC, month DESC",
                "WorkGroup":"primary",
                "ResultConfiguration":{
                   "OutputLocation":"s3://<your-aws-athena-query-results-bucket-path>/"
                }
             },
             "Next":"Notify Users"
          },
          "Notify Users":{
             "Type":"Task",
             "Resource":"arn:aws:states:::sns:publish",
             "Parameters":{
                "Message":{
                   "Input":"Monthly sales marketing data refreshed successfully!"
                },
                "TopicArn":"arn:aws:sns:us-east-1:<account-id>:<sns-topic-name>"
             },
             "End":true
          }
       }
    }

The following diagram is the visual representation of the state machine flow. With the Step Functions parallel task type, we created two parallel job runs for the sales and marketing data. When both flows are complete, they join to create an aggregated table in Athena and send an SNS notification to the end-users.


Creating an EventBridge scheduling rule

Now let’s integrate EventBridge to schedule the invocation of our Step Functions state machine on the first day of every month.

  1. On the EventBridge console, under Events, choose Rules.
  2. Choose Create a rule.
  3. For Name, enter a name (for example, trigger-step-function-rule).
  4. Under Define pattern, select Schedule.
  5. Select Cron expression.
  6. Enter 0 0 1 * ? * to specify that the job runs on the first day of every month at midnight.

  7. In the Select targets section, for Target, choose Step Functions state machine.
  8. For State machine, choose your state machine. (If you prefer to script this scheduling step, a boto3 sketch follows below.)

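If you prefer to script this scheduling step instead of using the console, the following boto3 sketch creates an equivalent rule and attaches the state machine as its target. The rule name, state machine ARN, and IAM role ARN are placeholders you would replace with your own values, and the role must allow events.amazonaws.com to call states:StartExecution.

import boto3

events = boto3.client("events")

# Scheduled rule: first day of every month at midnight UTC
events.put_rule(
    Name="trigger-step-function-rule",
    ScheduleExpression="cron(0 0 1 * ? *)",
    State="ENABLED",
)

# Attach the state machine as the rule target (ARNs are placeholders)
events.put_targets(
    Rule="trigger-step-function-rule",
    Targets=[
        {
            "Id": "monthly-sales-marketing-refresh",
            "Arn": "arn:aws:states:us-east-1:<account-id>:stateMachine:<state-machine-name>",
            "RoleArn": "arn:aws:iam::<account-id>:role/<events-invoke-step-functions-role>",
        }
    ],
)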

Now when the state machine is invoked, its run flow looks like the following screenshot, where blue represents the DataBrew jobs currently running.

When the job is complete, all the steps should be green.


You also receive the notification “Monthly sales marketing data refreshed successfully!”

Running an Athena query

Let’s validate the aggregated table output in Athena by running a simple SELECT query. The following screenshot shows the output.

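If you want to run the same validation outside the console, a minimal boto3 sketch is shown below. The query, workgroup, and output location are assumptions you would adjust for your environment; the table name matches the one created by the state machine above.

import boto3

athena = boto3.client("athena")

# Query the aggregated table created by the "Create Summarized Output" state
response = athena.start_query_execution(
    QueryString="SELECT * FROM default.sales_marketing_revenue ORDER BY year DESC, month DESC LIMIT 10",
    WorkGroup="primary",
    ResultConfiguration={
        "OutputLocation": "s3://<your-aws-athena-query-results-bucket-path>/"
    },
)

print(response["QueryExecutionId"])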

Creating reports in QuickSight

Now let’s do our final step of the architecture, which is creating BI reports through QuickSight by connecting to the Athena aggregated table.

  1. On the QuickSight console, choose Athena as your data source.


  2. Select the database and table name you have in Athena.


Now you can create a quick report to visualize your output, as shown in the following screenshot.


If QuickSight is using SPICE storage, you need to refresh the dataset in QuickSight after you receive the notification that the data refresh is complete. If the QuickSight report runs an Athena query for every request, you might see a "table not found" error while the data refresh is in progress. We recommend using SPICE storage to get better performance.
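One way to automate that SPICE refresh, for example from a Lambda function subscribed to the same SNS topic, is the QuickSight CreateIngestion API. The following is a minimal sketch; the dataset ID is hypothetical and the account ID is a placeholder.

import time

import boto3

quicksight = boto3.client("quicksight")

# Trigger a SPICE refresh (ingestion) for the dataset behind the report
quicksight.create_ingestion(
    AwsAccountId="<account-id>",
    DataSetId="<quicksight-dataset-id>",                 # hypothetical dataset ID
    IngestionId="refresh-{}".format(int(time.time())),   # must be unique per request
)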

Conclusion

This post explains how to integrate a DataBrew job and Athena queries with Step Functions to implement a simple ETL pipeline that refreshes aggregated sales and marketing data for BI reporting.

I hope this gives you a great starting point for using this solution with your datasets and applying business rules to build a complete serverless data analytics pipeline.


About the Author

Sakti Mishra

Sakti Mishra is a Senior Data Lab Solution Architect at AWS, where he helps customers architect data analytics solutions, which gives them an accelerated path towards modernization initiatives. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places.

GitHub Availability Report: December 2020

Post Syndicated from Keith Ballinger original https://github.blog/2021-01-06-github-availability-report-december-2020/

Introduction

In December, we experienced no incidents resulting in service downtime. This month’s GitHub Availability Report will provide a summary and follow-up details on how we addressed an incident mentioned in November’s report.

Follow-up to November 27 16:04 UTC (lasting one hour and one minute)

Upon further investigation around one of the incidents mentioned in November’s Availability Report, we discovered an edge case that triggered a large number of GitHub App token requests. This caused abnormal levels of replication lag within one of our MySQL clusters, specifically affecting the GitHub Actions service. This particular scenario resulted in amplified queries and increased the database lag, which impacted the database nodes that process GitHub App token requests.

When a GitHub Action is invoked, the Action is passed a GitHub App token to perform tasks on GitHub. In this case, the database lag resulted in the failure of some of those token requests because the database replicas did not have up to date information.

To help avoid this class of failure, we are updating the queries to prevent large quantities of token requests from overloading the database servers in the future.

In summary

Whether we’re introducing a system to manage flaky tests or improving our CI workflow, we’ve continued to invest in our engineering systems and overall reliability. To learn more about what we’re working on, visit GitHub’s engineering blog.

Field Notes: Comparing Algorithm Performance Using MLOps and the AWS Cloud Development Kit

Post Syndicated from Moataz Gaber original https://aws.amazon.com/blogs/architecture/field-notes-comparing-algorithm-performance-using-mlops-and-the-aws-cloud-development-kit/

Comparing machine learning algorithm performance is fundamental for machine learning practitioners and data scientists. The goal is to identify the most appropriate algorithm to implement for a known business problem.

Machine learning performance is often correlated with the usefulness of the deployed model. Improving the performance of the model typically results in more accurate predictions. Model accuracy is a key performance indicator (KPI) for businesses when evaluating production readiness, and it helps identify the appropriate algorithm early in model development. Organizations benefit from reduced project expenses, accelerated project timelines, and improved customer experience. Nevertheless, some organizations have not introduced a model comparison process into their workflow, which negatively impacts cost and productivity.

In this blog post, I describe how you can compare machine learning algorithms using Machine Learning Operations (MLOps). You will learn how to create an MLOps pipeline for comparing machine learning algorithms performance using AWS Step Functions, AWS Cloud Development Kit (CDK) and Amazon SageMaker.

First, I explain the use case that will be addressed through this post. Then, I explain the design considerations for the solution. Finally, I provide access to a GitHub repository which includes all the necessary steps for you to replicate the solution I have described, in your own AWS account.

Understanding the Use Case

Machine learning has many potential uses, and quite often the same use case can be addressed by several different algorithms. Take the Amazon SageMaker built-in algorithms as an example: a regression use case can be addressed with Linear Learner, XGBoost, or k-nearest neighbors (KNN); a classification use case can use XGBoost, KNN, Factorization Machines, or Linear Learner; and for anomaly detection there are Random Cut Forest and IP Insights.

In this post, I address a regression use case: identifying the age of an abalone, which can be calculated from the number of rings on its shell (age equals the number of rings plus 1.5). Usually the rings are counted by examining the shell under a microscope.

I use the abalone dataset in libsvm format, which contains nine fields: ‘Rings’, ‘Sex’, ‘Length’, ‘Diameter’, ‘Height’, ‘Whole Weight’, ‘Shucked Weight’, ‘Viscera Weight’, and ‘Shell Weight’.

The features from Sex through Shell Weight are physical measurements that can be taken with the right tools. By training the Linear Learner and XGBoost algorithms on these measurements, we can predict the age without the complexity of examining each abalone under a microscope.
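As a small illustration of the dataset and the age rule described above, the following sketch loads a local copy of the abalone file in libsvm format with scikit-learn and derives the age label. The file path is an assumption; this snippet is not part of the pipeline itself.

from sklearn.datasets import load_svmlight_file

# Load a local copy of the abalone dataset in libsvm format (path is a placeholder)
features, rings = load_svmlight_file("abalone.libsvm")

# The label in the file is the ring count; age is the ring count plus 1.5
ages = rings + 1.5

print(features.shape)   # (n_samples, 8) physical measurements, Sex through Shell Weight
print(ages[:5])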

Benefits of the AWS Cloud Development Kit (AWS CDK)

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to define your cloud application resources.

The AWS CDK uses jsii, an interface technology developed by AWS that allows code in any supported language to interact naturally with JavaScript classes. It is what enables the AWS CDK to deliver polyglot libraries from a single codebase.

This means you can define your cloud application resources in TypeScript, for example, and then, by compiling your source module with jsii, package it for one of the supported target languages (JavaScript, Python, Java, and .NET). If your developers or customers prefer any of those languages, you can easily package and export the code to their preferred choice.

Related projects extend the same model: cdktf provides constructs for defining Terraform configurations, and cdk8s provides constructs for defining Kubernetes configurations in TypeScript, Python, and Java. Using the CDK gives you a faster development process, easier cloud onboarding, and cloud resources that are easier to share.
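To make the idea concrete, here is a minimal sketch of a CDK app written in Python (the same constructs are available in the other supported languages through jsii). It defines a single S3 bucket, is only meant to show the shape of a CDK application rather than any part of this solution, and assumes CDK v1-style imports, which were current at the time of writing.

from aws_cdk import core
import aws_cdk.aws_s3 as s3


class DatasetStack(core.Stack):
    """A minimal stack that defines one S3 bucket for an input dataset."""

    def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # The bucket the pipeline would read its dataset from (name left to CloudFormation)
        s3.Bucket(self, "InputDatasetBucket", versioned=True)


app = core.App()
DatasetStack(app, "dataset-stack")
app.synth()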

Prerequisites

Overview of solution

This architecture serves as an example of how you can build a MLOps pipeline that orchestrates the comparison of results between the predictions of two algorithms.

The solution uses a completely serverless environment, so you don’t have to worry about managing the infrastructure. It also deletes resources that are no longer needed after the prediction results are collected, so you don’t incur any additional costs.

Figure 1: Solution Architecture

Walkthrough

In the preceding diagram, the serverless MLOps pipeline is deployed as an AWS Step Functions workflow. The architecture contains the following steps:

  1. The dataset is uploaded to the Amazon S3 cloud storage under the /Inputs directory (prefix).
  2. The uploaded file triggers AWS Lambda using an Amazon S3 notification event.
  3. The Lambda function then initiates the MLOps pipeline, which is built as a Step Functions state machine (a sketch of this handler follows the walkthrough).
  4. The first state collects the region-specific training image URIs for the Linear Learner and XGBoost algorithms, which are used to train both algorithms on the dataset. It also retrieves the Amazon SageMaker Spark container image, which is used to run the SageMaker processing job.
  5. The dataset is in libsvm format, which is accepted by the XGBoost algorithm as per the Input/Output Interface for the XGBoost Algorithm. However, this format is not supported by the Linear Learner algorithm as per the Input/Output interface for the linear learner algorithm, so we run a processing job using Amazon SageMaker Data Processing with Apache Spark. The processing job transforms the data from libsvm to csv and divides the dataset into train, validation, and test datasets. The output of the processing job is stored under the /Xgboost and /Linear prefixes.

Figure 2: Train, validation and test samples extracted from dataset

6. The Step Functions workflow then performs the following steps in parallel:

    • Train both algorithms.
    • Create models out of trained algorithms.
    • Create endpoints configurations and deploy predictions endpoints for both models.
    • Invoke a Lambda function to describe the status of the deployed endpoints and wait until they reach the “InService” status (a sketch of these endpoint checks appears after the workflow diagram below).
    • Invoke a Lambda function to perform three live predictions using boto3 and the “test” samples taken from the dataset, to calculate the average accuracy of each model.
    • Invoke a Lambda function to delete the deployed endpoints so you don’t incur any additional charges.

7. Finally, a Lambda function will be invoked to determine which model has better accuracy in predicting the values.
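As a rough sketch of steps 2 and 3 above, the Lambda function triggered by the S3 notification could start the state machine as follows. The state machine ARN environment variable and the shape of the execution input are assumptions for illustration.

import json
import os

import boto3

sfn = boto3.client("stepfunctions")


def handler(event, context):
    """Started by the S3 notification; kicks off the MLOps state machine."""
    # Pass the uploaded object details through as the state machine input
    record = event["Records"][0]["s3"]
    execution_input = {
        "bucket": record["bucket"]["name"],
        "key": record["object"]["key"],
    }

    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # assumed environment variable
        input=json.dumps(execution_input),
    )
    return {"executionArn": response["executionArn"]}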

The following diagram shows the Step Functions workflow:

Figure 3: AWS Step Functions workflow graph
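A rough sketch of the per-endpoint checks described in step 6 is shown below: it waits for an endpoint to reach InService, invokes it with a few test rows, and finally deletes it. The endpoint name, the CSV payload format, and the use of mean absolute error as the accuracy measure are assumptions for illustration.

import time

import boto3

sagemaker = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")


def wait_until_in_service(endpoint_name):
    # Poll the endpoint status until it is InService (or has failed)
    while True:
        status = sagemaker.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return
        if status == "Failed":
            raise RuntimeError("Endpoint {} failed to deploy".format(endpoint_name))
        time.sleep(30)


def mean_absolute_error(endpoint_name, test_rows):
    # test_rows: list of (csv_features, actual_age) pairs taken from the test split
    errors = []
    for features_csv, actual in test_rows:
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=features_csv,
        )
        predicted = float(response["Body"].read().decode("utf-8").strip().split(",")[0])
        errors.append(abs(predicted - actual))
    return sum(errors) / len(errors)


# Example usage with a hypothetical endpoint name
# wait_until_in_service("xgboost-abalone-endpoint")
# score = mean_absolute_error("xgboost-abalone-endpoint", test_rows)
# sagemaker.delete_endpoint(EndpointName="xgboost-abalone-endpoint")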

The code to provision this solution, along with step-by-step instructions, can be found at this GitHub repo.

Results and Next Steps

After the Step Functions workflow completes, the results are depicted in the following diagram:

Figure 4: Comparison results

This doesn’t necessarily mean that the XGBoost algorithm will always be the better performing algorithm. It just means that the performance was the result of these factors:

  • the hyperparameters configured for each algorithm
  • the number of epochs performed
  • the amount of dataset samples used for training

To get better results from the models, you can run hyperparameter tuning jobs, which run many training jobs on your dataset using the algorithms and ranges of hyperparameters that you specify. This helps you identify the set of hyperparameters that gives the best results.
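As an example, a sketch using the SageMaker Python SDK for the built-in XGBoost algorithm might look like the following; the role ARN, S3 paths, hyperparameter ranges, and the validation:rmse objective are assumptions you would adjust for your own tuning job.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
role = "<sagemaker-execution-role-arn>"  # placeholder

# Built-in XGBoost image for the current region
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.2-1")

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://<your-bucket-name>/tuning-output/",
    hyperparameters={"objective": "reg:squarederror", "num_round": 100},
    sagemaker_session=session,
)

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:rmse",
    objective_type="Minimize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.05, 0.5),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=10,
    max_parallel_jobs=2,
)

tuner.fit({
    "train": TrainingInput("s3://<your-bucket-name>/Xgboost/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://<your-bucket-name>/Xgboost/validation/", content_type="text/csv"),
})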

Finally, you can use this comparison to determine which algorithm is best suited to your production environment, and then configure your Step Functions workflow to update the production endpoint with the configuration of the better performing algorithm.
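The switch itself can be a single API call from that workflow step; a minimal boto3 sketch with placeholder names follows.

import boto3

sagemaker = boto3.client("sagemaker")

# Point the existing production endpoint at the endpoint configuration
# of the better performing model (both names are placeholders)
sagemaker.update_endpoint(
    EndpointName="<production-endpoint-name>",
    EndpointConfigName="<winning-endpoint-config-name>",
)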

Figure 5: Update production endpoint workflow

Conclusion

This post showed you how to create a repeatable, automated pipeline to deliver the better performing algorithm to your production predictions endpoint. This helps increase productivity and reduce the time spent on manual comparison. You also learned how to provision the solution using the AWS CDK and how to regularly clean up deployed resources to drive down costs. If this post helps you or inspires you to solve a problem, share your thoughts and questions in the comments. You can use and extend the code on the GitHub repo.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

re:Invent – New security sessions launching soon

Post Syndicated from Marta Taggart original https://aws.amazon.com/blogs/security/reinvent-new-security-sessions-launching-soon/

Where did the last month go? Were you able to catch all of the sessions in the Security, Identity, and Compliance track you hoped to see at AWS re:Invent? If you missed any, don’t worry—you can stream all the sessions released in 2020 via the AWS re:Invent website. Additionally, we’re starting 2021 with all new sessions that you can stream live January 12–15. Here are the new Security, Identity, and Compliance sessions—each session is offered at multiple times, so you can find the time that works best for your location and schedule.

Protecting sensitive data with Amazon Macie and Amazon GuardDuty – SEC210
Himanshu Verma, AWS Speaker

Tuesday, January 12 – 11:00 AM to 11:30 AM PST
Tuesday, January 12 – 7:00 PM to 7:30 PM PST
Wednesday, January 13 – 3:00 AM to 3:30 AM PST

As organizations manage growing volumes of data, identifying and protecting your sensitive data can become increasingly complex, expensive, and time-consuming. In this session, learn how Amazon Macie and Amazon GuardDuty together provide protection for your data stored in Amazon S3. Amazon Macie automates the discovery of sensitive data at scale and lowers the cost of protecting your data. Amazon GuardDuty continuously monitors and profiles S3 data access events and configurations to detect suspicious activities. Come learn about these security services and how to best use them for protecting data in your environment.

BBC: Driving security best practices in a decentralized organization – SEC211
Apurv Awasthi, AWS Speaker
Andrew Carlson, Sr. Software Engineer – BBC

Tuesday, January 12 – 1:15 PM to 1:45 PM PST
Tuesday, January 12 – 9:15 PM to 9:45 PM PST
Wednesday, January 13 – 5:15 AM to 5:45 AM PST

In this session, Andrew Carlson, engineer at BBC, talks about BBC’s journey while adopting AWS Secrets Manager for lifecycle management of its arbitrary credentials such as database passwords, API keys, and third-party keys. He provides insight on BBC’s secrets management best practices and how the company drives these at enterprise scale in a decentralized environment that has a highly visible scope of impact.

Get ahead of the curve with DDoS Response Team escalations – SEC321
Fola Bolodeoku, AWS Speaker

Tuesday, January 12 – 3:30 PM to 4:00 PM PST
Tuesday, January 12 – 11:30 PM to 12:00 AM PST
Wednesday, January 13 – 7:30 AM to 8:00 AM PST

This session identifies tools and tricks that you can use to prepare for application security escalations, with lessons learned provided by the AWS DDoS Response Team. You learn how AWS customers have used different AWS offerings to protect their applications, including network access control lists, security groups, and AWS WAF. You also learn how to avoid common misconfigurations and mishaps observed by the DDoS Response Team, and you discover simple yet effective actions that you can take to better protect your applications’ availability and security controls.

Network security for serverless workloads – SEC322
Alex Tomic, AWS Speaker

Thursday, January 14 – 1:30 PM to 2:00 PM PST
Thursday, January 14 – 9:30 PM to 10:00 PM PST
Friday, January 15 – 5:30 AM to 6:00 AM PST

Are you building a serverless application using services like Amazon API Gateway, AWS Lambda, Amazon DynamoDB, Amazon Aurora, and Amazon SQS? Would you like to apply enterprise network security to these AWS services? This session covers how network security concepts like encryption, firewalls, and traffic monitoring can be applied to a well-architected AWS serverless architecture.

Building your cloud incident response program – SEC323
Freddy Kasprzykowski, AWS Speaker

Wednesday, January 13 – 9:00 AM to 9:30 AM PST
Wednesday, January 13 – 5:00 PM to 5:30 PM PST
Thursday, January 14 – 1:00 AM to 1:30 AM PST

You’ve configured your detection services and now you’ve received your first alert. This session provides patterns that help you understand what capabilities you need to build and run an effective incident response program in the cloud. It includes a review of some logs to see what they tell you and a discussion of tools to analyze those logs. You learn how to make sure that your team has the right access, how automation can help, and which incident response frameworks can guide you.

Beyond authentication: Guide to secure Amazon Cognito applications – SEC324
Mahmoud Matouk, AWS Speaker

Wednesday, January 13 – 2:15 PM to 2:45 PM PST
Wednesday, January 13 – 10:15 PM to 10:45 PM PST
Thursday, January 14 – 6:15 AM to 6:45 AM PST

Amazon Cognito is a flexible user directory that can meet the needs of a number of customer identity management use cases. Web and mobile applications can integrate with Amazon Cognito in minutes to offer user authentication and get standard tokens to be used in token-based authorization scenarios. This session covers best practices that you can implement in your application to secure and protect tokens. You also learn about new Amazon Cognito features that give you more options to improve the security and availability of your application.

Event-driven data security using Amazon Macie – SEC325
Neha Joshi, AWS Speaker

Thursday, January 14 – 8:00 AM to 8:30 AM PST
Thursday, January 14 – 4:00 PM to 4:30 PM PST
Friday, January 15 – 12:00 AM to 12:30 AM PST

Amazon Macie sensitive data discovery jobs for Amazon S3 buckets help you discover sensitive data such as personally identifiable information (PII), financial information, account credentials, and workload-specific sensitive information. In this session, you learn about an automated approach to discover sensitive information whenever changes are made to the objects in your S3 buckets.

Instance containment techniques for effective incident response – SEC327
Jonathon Poling, AWS Speaker

Thursday, January 14 – 10:15 AM to 10:45 AM PST
Thursday, January 14 – 6:15 PM to 6:45 PM PST
Friday, January 15 – 2:15 AM to 2:45 AM PST

In this session, learn about several instance containment and isolation techniques, ranging from simple and effective to more complex and powerful, that leverage native AWS networking services and account configuration techniques. If an incident happens, you may have questions like “How do we isolate the system while preserving all the valuable artifacts?” and “What options do we even have?”. These are valid questions, but there are more important ones to discuss amidst a (possible) incident. Join this session to learn highly effective instance containment techniques in a crawl-walk-run approach that also facilitates preservation and collection of valuable artifacts and intelligence.

Trusted connects for government workloads – SEC402
Brad Dispensa, AWS Speaker

Wednesday, January 13 – 11:15 AM to 11:45 AM PST
Wednesday, January 13 – 7:15 PM to 7:45 PM PST
Thursday, January 14 – 3:15 AM to 3:45 AM PST

Cloud adoption across the public sector is making it easier to provide government workforces with seamless access to applications and data. With this move to the cloud, we also need updated security guidance to ensure public-sector data remain secure. For example, the TIC (Trusted Internet Connections) initiative has been a requirement for US federal agencies for some time. The recent TIC-3 moves from prescriptive guidance to an outcomes-based model. This session walks you through how to leverage AWS features to better protect public-sector data using TIC-3 and the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF). Also, learn how this might map into other geographies.

I look forward to seeing you in these sessions. Please see the re:Invent agenda for more details and to build your schedule.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Marta Taggart

Marta is a Seattle-native and Senior Program Manager in AWS Security, where she focuses on privacy, content development, and educational programs. Her interest in education stems from two years she spent in the education sector while serving in the Peace Corps in Romania. In her free time, she’s on a global hunt for the perfect cup of coffee.
