Gigabyte ME33-AR0 AMD EPYC 8004 Motherboard Review

Post Syndicated from Eric Smith original https://www.servethehome.com/giga-computing-gigabyte-me33-ar0-amd-epyc-8004-motherboard-review/

We review the Gigabyte ME33-AR0. This AMD EPYC 8004 motherboard is the third in the company’s single socket EATX motherboard line we have seen

The post Gigabyte ME33-AR0 AMD EPYC 8004 Motherboard Review appeared first on ServeTheHome.

[$] Smuggling email inside of email

Post Syndicated from jake original https://lwn.net/Articles/956533/

Normally, when a new vulnerability is discovered and releases are
coordinated with those affected, the announcement is done at
a convenient time—not generally right before the end-of-year holidays, for
example. The SMTP
Smuggling vulnerability
has taken a different path, however, with its
announcement landing on December 18. That may well have been
unpleasant for some administrators that had not yet updated, but it was
particularly problematic for
some projects that had not been made aware of the vulnerability at
all—though it was known to affect several open-source mailers.

Best Practices for Prompt Engineering with Amazon CodeWhisperer

Post Syndicated from Brendan Jenkins original https://aws.amazon.com/blogs/devops/best-practices-for-prompt-engineering-with-amazon-codewhisperer/

Generative AI coding tools are changing the way developers accomplish day-to-day development tasks. From generating functions to creating unit tests, these tools have helped customers accelerate software development. Amazon CodeWhisperer is an AI-powered productivity tools for the IDE and command line that helps improve developer productivity by providing code recommendations based on developers’ natural language comments and surrounding code. With CodeWhisperer, developers can simply write a comment that outlines a specific task in plain English, such as “create a lambda function to upload a file to S3.”

When writing these input prompts to CodeWhisperer like the natural language comments, one important concept is prompt engineering. Prompt engineering is the process of refining interactions with large language models (LLMs) in order to better refine the output of the model. In this case, we want to refine our prompts provided to CodeWhisperer to produce better code output.

In this post, we’ll explore how to take advantage of CodeWhisperer’s capabilities through effective prompt engineering in Python. A well-crafted prompt lets you tap into the tool’s full potential to boost your productivity and help generate the correct code for your use case. We’ll cover prompt engineering best practices like writing clear, specific prompts and providing helpful context and examples. We’ll also discuss how to iteratively refine prompts to produce better results.

Prompt Engineering with CodeWhisperer

We will demonstrate the following best practices when it comes to prompt engineering with CodeWhisperer.

  • Keep your prompt specific and concise
  • Additional context in prompts
  • Utilizing multiple comments
  • Context taken from comments and code
  • Generating unit tests with cross file context
  • Prompts with cross file context

Prerequisites

The following prerequisites are required to experiment locally:

CodeWhisperer User Actions

Reference the following user actions documentation for CodeWhisperer user actions according to your IDE. In this documentation, you will see how to accept a recommendation, cycle through recommendation options, reject a recommendation, and manually trigger CodeWhisperer.

Keep prompts specific & concise

In this section, we will cover keeping your prompt specific and concise. When crafting prompts for CodeWhisperer, conciseness while maintaining objectives in your prompt is important.  Overly complex prompts lead to poor results. A good prompt contains just enough information to convey the request clearly and concisely. For example, if you prompt CodeWhisperer “create a function that eliminates duplicates lines in a text file”. This is an example of a specific and concise prompt. On the other hand, a prompt such as “create a function to look for lines of code that are seen multiple times throughout the file and delete them” may be unclear and overly wordy. In summary, focused, straightforward prompts helps CodeWhisperer understand exactly what you want and provide better outputs.

In this example, we would like to write a function in Python that will open a CSV file and store the contents into a dictionary. We will use the following simple and concise prompt that will guide CodeWhisperer to generate recommendations. Please use the left/right arrow key to cycle through the various recommendations before you hit tab to accept the recommendation.

Example 1:

Sample comment:

#load the csv file content in a dictionary

Sample solution:

#load the csv file content in a dictionary
import csv
def csv_to_dict(csv_file):
    with open(csv_file, 'r') as f:
        reader = csv.DictReader(f)
        return list(reader)

Simple and concise prompts are crucial in prompt engineering because they help CodeWhisperer understand the key information without confusion from extraneous details. Simplicity and brevity enable faster iteration and allow prompts to maximize impact within character limits.

Additional context in prompts

In this section, we will cover how additional context can aid in prompt engineering. While specific and concise prompts are crucial, some additional context can aid CodeWhisperer comprehension. Concrete examples also guide CodeWhisperer if it struggles to infer expectations from just a brief prompt.

In this example, we would like to add additional context to Example 1 where we stored the CSV file content into a dictionary. Now, we have additional requirements to store the csv file content in alphabetical order and return the list keys from the dictionary. Take a look at the sample prompt below. Judicious context helps CodeWhisperer to produce higher-quality, tailored results.

Example 2:

Sample comment:

#load the csv file content in a dictionary in alphabetical order and return the list of keys

Sample solution:

#load the csv file content in a dictionary in alphabetical order and return the list of keys
import csv
def csv_to_dict(file_name):
    def read_csv_file(file_name):
    with open(file_name, 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        csv_dict = {}
        for row in csv_reader:
            csv_dict[row['name']] = row
            return csv_dict

Providing additional context through background details and examples can be beneficial when crafting prompts for CodeWhisperer, as long as the extra information adds useful clarity rather than obscuring the core request. The right balance of brevity and pointed contextual signals can help CodeWhisperer generate more tailored, high-quality results.

Utilizing multiple comments

In this section, we will cover how multiple comments can be a useful technique in prompt engineering. When used strategically, multiple comments allow prompt engineers to offer more context without sacrificing brevity or cluttering the prompt.

Say we would like to open a CSV file and return the list of lines in alphabetical order, remove duplicate lines, and insert a period at the end of each line from the CSV file. Take a look at the sample CodeWhisperer prompt below. Notice how you can break up multiple requirements into separate comments.

Example 3:

Sample comment:

#open a csv file and return a list of lines in alphabetical order
#Remove duplicate lines
#Insert a period at the end of each line

Sample solution:

#open a csv file and return a list of lines in alphabetical order
#Remove duplicate lines
#Insert a period at the end of each line
def open_csv(filename):
    with open(filename) as f:
        lines = f.readlines()
        lines = list(set(lines))
        lines = sorted(lines)
        for i in range(len(lines)):
            lines[i] = lines[i].rstrip() + '.'
    return lines

Multiple comments allow prompt engineers to add extended context and guidance for CodeWhisperer while keeping prompts succinct.

Context taken from comments and code

In this section, we will cover how CodeWhisperer’s context goes beyond just your comment and also looks at the surrounding code, including other functions, imports, and more. This broader context helps guide CodeWhisperer towards implementing the use case you intend with your comment.

We will now see how additional code in our project affects the responses. This time around, we will import the Pandas library to see how it effects our recommendation as compared to the previous section.

Example 4:

Sample Comment:

import pandas as pd
#open a csv file and return a list of lines in alphabetical order
#Insert a period at the end of each line
#Replace duplicate lines with a single line

Sample solution:

import pandas as pd
#open a csv file and return a list of lines in alphabetical order
#Insert a period at the end of each line
#Replace duplicate lines with a single line
def open_csv(filename):
    df = pd.read_csv(filename)
    df = df.sort_values(by='line')
    df = df.drop_duplicates(subset='line')
    df['line'] = df['line'] + '.'
    return df['line'].tolist()

By seeing Pandas imported, CodeWhisperer understands our intent is likely to leverage it in the solution. This allows it to provide a more relevant recommendation using Pandas functions like read_csv(), sort_values(), and drop_duplicates().

Overall, surrounding code context gives CodeWhisperer additional clues for the implementation you have in mind with your high-level instructions.

Prompts with cross file context

In the previous section, we saw how we were able to utilize the surrounding code that CodeWhisperer takes in as context to generate a function for our use case. In this section, we will use that same concept to generate unit tests for the function we’ve built utilizing CodeWhisperers cross file context capability. This section will demonstrate how we can use cross file context with our prompts for use cases like test driven development.

In this example, we will prompt CodeWhisperer to write a comment referencing the open_csv function in order to write unit tests. In this case, there will be a new python file in the project directory for unit testing. We will assume the file we have been working out of is “example4.py”, and the new file is “unittest.py”. We will incorporate the following comments in “unittest.py”.

Example 5:

Sample Comment in unittest.py:

#create unit tests for the open_csv function from example4.py file

Sample solution in unittest.py:

#create unit tests for the open_csv function from example4.py file
class TestOpenCsv(unittest.TestCase):
   def test_open_csv(self):
       self.assertEqual(open_csv('example4.csv'), ['a.', 'b.', 'c.'])
       self.assertEqual(open_csv('example4.csv'), ['a.', 'b.', 'c.', 'd.'])

Notice how CodeWhisperer is using context from one file to generate code recommendation in another file. By specifying the open_csv function within the comment of the “unittest.py”, CodeWhisperer was able to analyze the function, understand its purpose and interface, and generate a set of basic unit tests to validate it. With our prompts, we were able to utilize CodeWhisperers cross file context to help us generate unit tests.

Chain of thought prompting

Chain of thought prompting is a prompt engineering technique that allows large language models (LLMs) to have more natural, contextual outputs by linking multiple prompts together to solve a complex problem. In regard to CodeWhisperer, we can use this technique to break a very complex coding task down into smaller steps, allowing CodeWhisperer to provide more accurate suggestions to the use case.

In this example, we will examine a code snippet generated by CodeWhisperer that contains multiple comments. Our intention is to prompt the user to input a filename, ensure it’s a csv and alphanumeric, then process it with the logging.info method. We will analyze the output using both the chain of thought approach and a single comment.

Example 6:

Chain of thought prompt approach:

Comment 1 – “Take a user’s input using the input() function and store it in a variable called filename”

import logging
# Take a users input using the input() function and store it in a variable called filename
filename = input("Enter the name of the file you want to read: ")

Comment 2 – create a function that will take a filename as an input

import logging
# Take a users input using the input() function and store it in a variable called filename
filename = input("Enter the name of the file you want to read: ")

#create a function that will take a filename as an input
def open_file(filename):

Comment 3 – validate the input using the isalnum() method and ensure the file ends in .csv then process the file using logging.info()

import logging
# Take a users input using the input() function and store it in a variable called filename
filename = input("Enter the name of the file you want to read: ")

#create a function that will take a filename as an input
def open_file(filename):
    #validate the input using the isalnum() method and ensure the file ends in .csv then process the file using logging.info() 
    if filename.isalnum() and filename.endswith('.csv'):
        lines = open_csv(filename)
        logging.info(lines)
        return lines
    else:
        print('Invalid file name')
        return None

Single prompt approach:

import logging
'''
Using the input() function and store it in a variable called filename and create a function 
that will validate the input using the isalnum() method and ensure the file ends in .csv then process 
the file accordingly. 
'''
def validate_file(filename):
    if filename.isalnum() and filename.endswith('.csv'):
        return True
    else:
        return False

When analyzing these side-by-side, we see that with the chain of thought prompt approach, we used multiple comments to allow CodeWhisperer to implement all our requirements including the user input, input validation, .csv verification, and logging as we broke it down into steps for CodeWhisperer to implement. On the other hand, in the case where we had a single comment implementing multiple requirements, it didn’t take all the requirements into account for this slightly more complex problem.

In conclusion, chain of thought prompting allows large language models like CodeWhisperer to produce more accurate code pertaining to the use case by breaking down complex problems into logical steps. Guiding the model through comments and prompts helps it focus on each part of the task sequentially. This results in code that is more accurate to the desired functionality compared to a single broad prompt.

Conclusion

Effective prompt engineering is key to getting the most out of powerful AI coding assistants like Amazon CodeWhisperer. Following prompt best practices, we’ve covered like using clear language, providing context, and iteratively refining prompts can help CodeWhisperer generate high-quality code tailored to your specific needs. Analyzing all the code options CodeWhisperer provides you flexibility to select the optimal approach.

About the authors:

Brendan Jenkins

Brendan Jenkins is a Solutions Architect at Amazon Web Services (AWS) working with Enterprise AWS customers providing them with technical guidance and helping achieve their business goals. He has an area of specialization in DevOps and Machine Learning technology.

Riya Dani

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s and Master’s degree from Virginia Tech in Computer Science with focus in Deep Learning. In her free time, she enjoys staying active and reading.

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities

Post Syndicated from Deral Heiland original https://blog.rapid7.com/2024/01/03/genie-aladdin-connect-retrofit-garage-door-opener-multiple-vulnerabilities/

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities

Rapid7, Inc. (Rapid7) discovered vulnerabilities in Aladdin Connect retrofit kit garage door opener and Android mobile application produced by Genie. The affected products are:

  • Aladdin Garage door smart retrofit kit, Model ALDCM
  • Android Mobile application ALADDIN Connect, Version 5.65 Build 2075

Rapid7 initially reported these issues to Overhead Door — the parent company of The Genie Company — on August 22nd 2023. Since then, members of our research team have worked alongside the vendor to discuss the impact, resolution, and a coordinated response for these vulnerabilities.

Product description

The Aladdin Connect garage door opener (Retrofit-kit) is a smart IoT solution which allows standard electric garage doors to be upgraded to support smart technology for remote access and use of mobile applications for opening and closing of the garage door.

Credit

The vulnerabilities in Genie Aladdin Connect retrofit garage door opener and mobile application were discovered by Deral Heiland, Principal IoT Researcher at Rapid7. They are being disclosed in accordance with Rapid7’s vulnerability disclosure policy after coordination with the vendor.

Vendor statement

Trusted for generations by millions of homeowners, The Genie Company is committed to security, and we collaborate with valued researchers, such as Rapid7, to respond to and resolve vulnerabilities on behalf of our customers.

Exploitation and remediation

This section details the potential for exploitation and our remediation guidance for the issues discovered and reported by Rapid7, so that defenders of this technology can gauge the impact of, and mitigations around, these issues appropriately.

Android Application Insecure Storage (CVE-2023-5879) – FIXED

While examining the Android mobile application, Aladdin Connect, for general security issues, Rapid7 found that the user’s password was stored in clear text in the following file:

  • /data/data/com.geniecompany.AladdinConnect/shared_prefs/com.genie.gdocntl.MainActivity.xml

The persistence of this data was tested by logging out and rebooting the device. Typically logging out and rebooting a mobile device leads to the data being purged from the device. In this case neither the file, nor its contents, were purged. Figure 2 is copy of file content after logout and reboot:

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Figure 2: Clear text Stored User Credentials

Exploitation

An attacker with physical access to the user’s smartphone (i.e., via a lost or stolen phone), would be able to potentially extract this critical data, allowing access to the user’s service account to control the garage door opener.

Remediation

To mitigate this vulnerability, users should set a password pin code on the mobile devices to restrict access.

Additional Note from Vendor

This vulnerability is tied to the biometric capability (touch or face recognition).

Mitigation: Update to the latest app upgrade available in the play store. App version v5.73

Cross-site Scripting (XSS) injected into Aladdin Connect garage door opener (Retrofit-Kit) configuration setup web server console via broadcast SSID name (CVE-2023-5880)

When the Aladdin connect device is placed into Wi-Fi configuration mode, the user web interface used for configuring the device is vulnerable to XSS injection via broadcast SSID names containing HTML and or JavaScript.

Exploitation

This XSS attack via SSID injection method can be done by running a software-based Wi-Fi access point to broadcast HTML or JavaScript as the SSID name such as:

  • </script><svg onload=alert(1)>

An example of this is shown in Figure 3, using airbase-ng to broadcast the HTML and or JavaScript code:

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Figure 3: SSID Name Injection Method

In the example found in Figure 4, a simple alert box is triggered on the Aladdin base unit Wi-Fi configuration webpage from the above SSID name. Also, the image on the right of Figure 4 shows the actual web page source delivered to the end user. No user interaction is needed to trigger this, they only need to view the web page during configuration mode.

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Figure 4: XSS Injection using SSID Injection Method

Also, a denial of service (DoS) of the Wi-Fi configuration page can be accomplished by just broadcasting an SSID containing </script> preventing the web page from being used to configure the device’s setup. This corrupted web page is shown in Figure 5:

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Figure 5: Corrupted Wi-Fi Configuration Page

Remediation

To mitigate this vulnerability, users should avoid running setup if any oddly named SSIDs are being broadcast in the general vicinity, such as SSIDs containing HTML markup language and/or JavaScript code in their names.

Also, in general the mobile application can be used to set up and configure the Garage Door opener. This will avoid any direct interaction with the vulnerable  “ Garage Door Control Setup” configuration page.

Additional Notes from the Vendor

This is a very low-impact vulnerability with minimal risk. This can only occur when the owner places the device in the wifi configuration mode for a limited period and the intruder operates within the 2.4 GHz band distance range during that limited configuration period.  The device will not be impacted by the misconfiguration if that were to occur and it is fully capable of recovering from misconfiguration. The device cannot be operated with a misconfigured SSID as the device can only be claimed by the owner using the mobile app. There is no vulnerability in the mobile app which is the approved mode of device provisioning.

Mitigation: Use mobile app to configure the device.

Unauthenticated access allowed to web interface for “Garage Door Control Module Setup” page (CVE-2023-5881) – FIXED

This vulnerability allows a user with network access to connect to the Aladdin Connect device web server’s “Garage Door Control Module Setup” web page and alter the Garage doors connected WIFI SSID settings without authenticating.

Exploitation

The device allows unauthenticated access to Garage Door Control Module Setup configuration page on TCP Port 80, This allows anyone with network access to reconfigure the Wi-Fi settings without being challenged to authenticate. A sample of this access to the configuration web page is shown in Figure 6:

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Figure 6: Unauthenticated Configuration Services Access Port 80

Remediation

To prevent exploitation, users should only attach the Aladdin Garage door smart retrofit kit to a network they own and control. Also, access to this network should not be allowed from any other network source such as the Internet.

Additional Notes from the Vendor

This is a very low-impact vulnerability with minimal risk. This can only occur when the intruder has access to the same local network as the retrofit kit (use the same network router), so the attack vector is limited to local. This web interface is not accessible from the internet. The device cannot be operated with a misconfigured SSID, as the device can only be claimed using the mobile app that an owner would use.

Mitigation: Update the Retrofit device to the latest software version, 14.1.1. Fix was automatically updated on all online devices as of December 2023. Please reach out to customer service to confirm if your device has the update.

Authenticated user access to others users data via service API – FIXED

An authenticated user can gain unauthorized access to other users’ data by querying the following API using a different device ID than their own.

  • https://pxdqkls7aj.execute-api.us-east-1.amazonaws.com/Android/devices/879267

Here are sample fields that are potentially viewable data using this method:

Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Genie Aladdin Connect Retrofit Garage Door Opener: Multiple Vulnerabilities
Figure 1: Enumeration of User Data

Additional Notes from the Vendor

This was resolved immediately after our internal penetration testing detected the issue. This happened because of a recent software update. The fix was applied to the API on 07/25/2023.

Mitigation: None

The Drive Stats of Backblaze Storage Pods

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/the-drive-stats-of-backblaze-storage-pods/

A decorative image showing the Backblaze logo on a cloud over a pattern representing a network.

Since 2009, Backblaze has written extensively about the data storage servers we created and deployed which we call Backblaze Storage Pods. We not only wrote about our Storage Pods, we open sourced the design, published a parts list, and even provided instructions on how to build one. Many people did. Of the six storage pod versions we produced, four of them are still in operation in our data centers today. Over the last few years, we began using storage servers from Dell and, more recently, Supermicro, as they have proven to be economically and operationally viable in our environment. 

Since 2013, we have also written extensively about our Drive Stats, sharing reports on the failure rates of the HDDs and SSDs in our legion of storage servers. We have examined the drive failure rates by manufacturer, size, age, and so on, but we have never analyzed the drive failure rates of the storage servers—until now. Let’s take a look at the Drive Stats for our fleet of storage servers and see what we can learn.

Storage Pods, Storage Servers, and Backblaze Vaults

Let’s start with a few definitions:

  • Storage Server: A storage server is our generic name for a server from any manufacturer which we use to store customer data. We use storage servers from Backblaze, Dell, and Supermicro.
  • Storage Pod: A Storage Pod is the name we gave to the storage servers Backblaze designed and had built for our data centers. The first Backblaze Storage Pod version was announced in September 2009. Subsequent versions are 2.0, 3.0, 4.0, 4.5, 5.0, 6.0, and 6.1. All but 6.1 were announced publicly. 
  • Backblaze Vault: A Backblaze Vault is 20 storage servers grouped together for the purpose of data storage. Uploaded data arrives at a given storage server within a Backblaze Vault and is encoded into 20 parts with a given part being either a data blob or parity. Each of the 20 parts (shards) is then stored on one of the 20 storage servers. 

As you review the charts and tables here are a few things to know about Backblaze Vaults.

  • There are currently six cohorts of storage servers in operation today: Supermicro, Dell, Backblaze 3.0, Backblaze 5.0, Backblaze 6.0, and Backblaze 6.1.
  • A given Vault will always be made up from one of the six cohorts of storage servers noted above. For example, Vault 1016 is made up of 20 Backblaze 5.0 Storage Pods and Vault 1176 is made of the 20 Supermicro servers. 
  • A given Vault is made up of storage servers that contain the same number of drives as follows:
    • Dell servers: 26 drives.
    • Backblaze 3.0 and Backblaze 5.0 servers: 45 drives.
    • Backblaze 6.0, Backblaze 6.1, and Supermicro servers: 60 drives.
  • All of the hard drives in a Backblaze Vault will be logically the same size; so, 16TB drives for example.

Drive Stats by Backblaze Vault Cohort

With the background out of the way, let’s get started. As of the end of Q3 2023, there were a total of 241 Backblaze Vaults divided into the six cohorts, as shown in the chart below. The chart includes the server cohort, the number of Vaults in the cohort, and the percentage that cohort is of the total number of Vaults.

A pie chart showing the types of Backblaze Vaults by percentage.

Vaults consisting of Backblaze servers still comprise 68% of the vaults in use today (shaded from orange to red), although that number is dropping as older Vaults are being replaced with newer server models, typically the Supermicro systems.

The table below shows the Drive Stats for the different Vault cohorts identified above for Q3 2023.

A chart showing the Drive Stats for Backblaze Vaults.

The Avg Age (months) column is the average age of the drives, not the average age of the Vaults. The two may seem to be related, that’s not entirely the case. It is true the Backblaze 3.0 Vaults were deployed first followed in order by the 5.0 and 6.0 Vaults, but that’s where things get messy. There was some overlap between the Dell and Backblaze 6.1 deployments as the Dell systems were deployed in our central Europe data center, while the 6.1 Vaults continued to be deployed in the U.S. In addition, some migrations from the Backblaze 3.0 Vaults were initially done to 6.1 Vaults while we were also deploying new drives in the Supermicro Vaults. 

The AFR for each of the server versions does not seem to follow any pattern or correlation to the average age of the drives. This was unexpected because, in general, as drives pass about four years in age, they start to fail more often. This should mean that Vaults with older drives, especially those with drives whose average age is over four years (48 months), should have a higher failure rate. But, as we can see, the Backblaze 5.0 Vaults defy that expectation. 

To see if we can determine what’s going on, let’s expand on the previous table and dig into the different drive sizes that are in each Vault cohort, as shown in the table below.

A table showing Drive Stats by server version and drive size.

Observations for Each Vault Cohort

  • Backblaze 3.0: Obviously these Vaults have the oldest drives and, given their AFR is nearly twice the average for all of the drives (1.53%), it would make sense to migrate off of these servers. Of course the 6TB drives seem to be the exception, but at some point they will most likely “hit the wall” and start failing.
  • Backblaze 5.0: There are two Backblaze 5.0 drive sizes (4TB and 8TB) and the AFR for each is well below the average AFR for all of the drives (1.53%). The average age of the two drive sizes is nearly seven years or more. When compared to the Backblaze 6.0 Vaults, it would seem that migrating the 5.0 Vaults could wait, but there is an operational consideration here. The Backblaze 5.0 Vaults each contain 45 drives, and from the perspective of data density per system, they should be migrated to 60 drive servers sooner rather than later to optimize data center rack space.
  • Backblaze 6.0: These Vaults as a group don’t seem to make any of the five different drive sizes happy. Only the AFR of the 4TB drives (1.42%) is just barely below the average AFR for all of the drives. The rest of the drive groups are well above the average.
  • Backblaze 6.1: The 6.1 servers are similar to the 6.0 servers, but with an upgraded CPU and faster NIC cards. Is that why their annualized failure rates are much lower than the 6.0 systems? Maybe, but the drives in the 6.1 systems are also much younger, about half the age of those in the 6.0 systems, so we don’t have the full picture yet.
  • Dell: The 14TB drives in the Dell Vaults seem to be a problem at a 5.46% AFR. Much of that is driven by two particular Dell vaults which have a high AFR, over 8% for Q3. This appears to be related to their location in the data center. All 40 of the Dell servers which make up these two Vaults were relocated to the top of 52U racks, and it appears that initially they did not like their new location. Recent data indicates they are doing much better, and we’ll publish that data soon. We’ll need to see what happens over the next few quarters. That said, if you remove these two Vaults from the Dell tally, the AFR is a respectable 0.99% for the remaining Vaults.
  • Supermicro: This server cohort is mostly 16TB drives which are doing very well with an AFR of 0.62%. The one 14TB Vault is worth our attention with an AFR of 1.95%, and the 22TB Vault is too new to do any analysis.

Drive Stats by Drive Size and Vault Cohort

Another way to look at the data is to take the previous table and re-sort it by drive size. Before we do that let’s establish the AFR for the different drive sizes aggregated over all Vaults.

A bar chart showing annualized failure rates for Backblaze Vaults by drive size.

As we can see in Q3 the 6TB and 22TB Vaults had zero failures (AFR = 0%). Also, the 10TB Vault is indeed only one Vault, so there are no other 10TB Vaults to compare it to. Given this, for readability, we will remove the 6TB, 10TB, and 22TB Vaults from the next table which compares how each drive size has fared in each of the six different Vault cohorts.

A table showing the annualized failure rates of servers by drive size and server version, not displaying the 6TB, 10TB, and 22TB Vaults.

Currently we are migrating the 4TB drive Vaults to larger Vaults, replacing them with drives of 16TB and above. The migrations are done using an in-house system which we’ll expand upon in a future post. The specific order of migrations is based on failure rates and durability of the existing 4TB Vaults with an eye towards removing the Backblaze 3.0 systems first as they are nearly 10 years old in some cases, and many of the non-drive replacement parts are no longer available. Whether we give away, destroy, or recycle the retired Backblaze 3.0 Storage Pods (sans drives) is still being debated.

For the 8TB drive Vaults, the Backblaze 5.0 Vaults are up first for migration when the time comes. Yes, their AFR is lower then the Backblaze 6.0 Vaults, but remember: the 5.0 Vaults are 45 drive units which are not as efficient storage density-wise versus the 60 drive systems. 

Speaking of systems with less than 60 drives, the Dell servers are 26 drives. Those 26 drives are in a 2U chassis versus a 4U chassis for all of the other servers. The Dell servers are not quite as dense as the 60 drive units, but their 2U form factor gives us some flexibility in filling racks, especially when you add utility servers (1U or 2U) and networking gear to the mix. That’s one of the reasons the two Dell Vaults we noted earlier were moved to the top of the 52U racks. FYI, those two Vaults hold 14TB drives and are two of the four 14TB Dell Vaults making up the 5.46% AFR. The AFR for the Dell Vaults with 12TB and 16TB drives is 0.76% and 0.92% respectively. As noted earlier, we expect the AFR for 14TB Dell Vaults to drop over the coming months.

What Have We Learned?

Our goal today was to see what we can learn about the drive failure rates of the storage servers we use in our data centers. All of our storage servers are grouped in operational systems we call Backblaze Vaults. There are six different cohorts of storage servers with each vault being composed of the same type of storage server, hence there are six types of vaults. 

As we dug into data, we found that the different cohorts of Vaults had different annualized failure rates. What we didn’t find was a correlation between the age of the drives used in the servers and the annualized failure rates of the different Vault cohorts. For example, the Backblaze 5.0 Vaults have a much lower AFR of 0.99%  versus the Backblaze 6.0 Vault AFR at 2.14%—even though the drives in the 5.0 Vaults are nearly twice as old on average than the drives in the 6.0 Vaults.

This suggests that while our initial foray into the annualized failure rates of the different Vault cohorts is a good first step, there is more to do here.

Where Do We Go From Here?

In general, all of the Vaults in a given cohort were manufactured to the same specifications, used the same parts, and were assembled using the same processes. One obvious difference is that different drive models are used in each Vault cohort. For example, the 16TB vaults are composed of seven different drive models. Do some drive models work better in one Vault cohort versus another? Over the next couple of quarters we’ll dig into the data and let you know what we find. Hopefully it will add to our understanding of the annualized failures rates of the different Vault cohorts. Stay tuned.

The post The Drive Stats of Backblaze Storage Pods appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Amazon OpenSearch Serverless now supports automated time-based data deletion 

Post Syndicated from Satish Nandi original https://aws.amazon.com/blogs/big-data/amazon-opensearch-serverless-now-supports-automated-time-based-data-deletion/

We recently announced a new enhancement to OpenSearch Serverless for managing data retention of Time Series collections and Indexes. OpenSearch Serverless for Amazon OpenSearch Service makes it straightforward to run search and analytics workloads without having to think about infrastructure management. With the new automated time-based data deletion feature, you can specify how long they want to retain data and OpenSearch Serverless automatically manages the lifecycle of the data based on this configuration.

To analyze time series data such as application logs and events in OpenSearch, you must create and ingest data into indexes. Typically, these logs are generated continuously and ingested frequently, such as every few minutes, into OpenSearch. Large volumes of logs can consume a lot of the available resources such as storage in the clusters and therefore need to be managed efficiently to maximize optimum performance. You can manage the lifecycle of the indexed data by using automated tooling to create daily indexes. You can then use scripts to rotate the indexed data from the primary storage in clusters to a secondary remote storage to maintain performance and control costs, and then delete the aged data after a certain retention period.

The new automated time-based data deletion feature in OpenSearch Serverless minimizes the need to manually create and manage daily indexes or write data lifecycle scripts. You can now create a single index and OpenSearch Serverless will handle creating a timestamped collection of indexes under one logical grouping automatically. You only need to configure the desired data retention policies for your time series data collections. OpenSearch Serverless will then efficiently roll over indexes from primary storage to Amazon Simple Storage Service(Amazon S3) as they age, and automatically delete aged data per the configured retention policies, reducing the operational overhead and saving costs.

In this post we discuss the new data lifecycle polices and how to get started with these polices in OpenSearch Serverless

Solution Overview

Consider a use case where the fictitious  company Octank Broker collects logs from its web services and ingests them into OpenSearch Serverless for service availability analysis. The company is interested in tracking web access and root cause when failures are seen with error types 4xx and 5xx. Generally, the server issues are of interest within an immediate timeframe, say in a few days. After 30 days, these logs are no longer of interest.

Octank wants to retain their log data for 7 days. If the collections or indexes are configured for 7 days’ data retention, then after 7 days, OpenSearch Serverless deletes the data. The indexes are no longer available for search. Note: Document counts in search results might reflect data that is marked for deletion for a short time.

You can configure data retention by creating a data lifecycle policy. The retention time can be unlimited, or a you can provide a specific time length in Days and Hours with a minimum retention of 24 hours and a maximum of 10 years. If the retention time is unlimited, as the name suggests, no data is deleted.

To start using data lifecycle policies in OpenSearch Serverless, you can follow the steps outlined in this post.

Prerequisites

This post assumes that you have already set up an OpenSearch Serverless collection. If not, refer to Log analytics the easy way with Amazon OpenSearch Serverless for instructions.

Create a data lifecycle policy

You can create a data lifecycle policy from the AWS Management Console, the AWS Command Line Interface (AWS CLI), AWS CloudFormation, AWS Cloud Development Kit (AWS CDK), and Terraform. To create a data lifecycle policy via the console, complete the following steps:

  • On the OpenSearch Service console, choose Data lifecycle policies under Serverless in the navigation pane.
  • Choose Create data lifecycle policy.
  • For Data lifecycle policy name, enter a name (for example, web-logs-policy).
  • Choose Add under Data lifecycle.
  • Under Source Collection, choose the collection to which you want to apply the policy (for example, web-logs-collection).
  • Under Indexes, enter the index or index patterns to apply the retention duration (for example, web-logs).
  • Under Data retention, disable Unlimited (to set up the specific retention for the index pattern you defined).
  • Enter the hours or days after which you want to delete data from Amazon S3.
  • Choose Create.

The following graphic gives a quick demonstration of creating the OpenSearch Serverless Data lifecycle policies via the preceding steps.

View the data lifecycle policy

After you have created the data lifecycle policy, you can view the policy by completing the following steps:

  • On the OpenSearch Service console, choose Data lifecycle policies under Serverless in the navigation pane.
  • Select the policy you want to view (for example, web-logs-policy).
  • Choose the hyperlink under Policy name.

This page will show you the details such as the index pattern and its retention period for a specific index and collection. The following graphic gives a quick demonstration of viewing the OpenSearch Serverless data lifecycle policies via the preceding steps.

Update the data lifecycle policy

After you have created the data lifecycle policy, you can modify and update it to add more rules. For example, you can add another index pattern or add a new collection with a new index pattern to set up the retention. The following example shows the steps to add another rule in the policy for syslog index under syslogs-collection.

  • On the OpenSearch Service console, choose Data lifecycle policies under Serverless in the navigation pane.
  • Select the policy you want to edit (for example, web-logs-policy), then choose Edit.
  • Choose Add under Data lifecycle.
  • Under Source Collection, choose the collection you are going to use for setting up the data lifecycle policy (for example, syslogs-collection).
  • Under Indexes, enter index or index patterns you are going to set retention for (for example, syslogs).
  • Under Data retention, disable Unlimited (to set up specific retention for the index pattern you defined).
  • Enter the hours or days after which you want to delete data from Amazon S3.
  • Choose Save.

The following graphic gives a quick demonstration of updating existing data lifecycle policies via the preceding steps.

Delete the data lifecycle policy

Delete the existing data lifecycle policy with the following steps:

  • On the OpenSearch Service console, choose Data lifecycle policies under Serverless in the navigation pane.
  • Select the policy you want to edit (for example, web-logs-policy).
  • Choose Delete.

Data lifecycle policy rules

In a data lifecycle policy, you specify a series of rules. The data lifecycle policy lets you manage the retention period of data associated to indexes or collections that match these rules. These rules define the retention period for data in an index or group of indexes. Each rule consists of a resource type (index), a retention period, and a list of resources (indexes) that the retention period applies to.

You define the retention period with one of the following formats:

  • “MinIndexRetention”: “24h” – OpenSearch Serverless retains the index data for a specified period in hours or days. You can set this period to be from 24 hours (24h) to 3,650 days (3650d).
  • “NoMinIndexRetention”: true – OpenSearch Serverless retains the index data indefinitely.

When data lifecycle policy rules overlap, within or across policies, the rule with a more specific resource name or pattern for an index overrides a rule with a more general resource name or pattern for any indexes that are common to both rules. For example, in the following policy, two rules apply to the index index/sales/logstash. In this situation, the second rule takes precedence because index/sales/log* is the longest match to index/sales/logstash. Therefore, OpenSearch Serverless sets no retention period for the index.

Summary

Data lifecycle policies provide a consistent and straightforward way to manage indexes in OpenSearch Serverless. With data lifecycle policies, you can automate data management and avoid human errors. Deleting non-relevant data without manual intervention reduces your operational load, saves storage costs, and helps keep the system performant for search.


About the authors

Prashant Agrawal is a Senior Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Satish Nandi is a Senior Product Manager with Amazon OpenSearch Service. He is focused on OpenSearch Serverless and has years of experience in networking, security and ML/AI. He holds a Bachelor degree in Computer Science and an MBA in Entrepreneurship. In his free time, he likes to fly airplanes, hang gliders and ride his motorcycle.

Lenôtre: Maestro – Introduction

Post Syndicated from jake original https://lwn.net/Articles/956699/

On his blog,
Luc Lenôtre introduces
Maestro
, “a Unix-like kernel and operating system written from
scratch in Rust
“. Maestro is intended to be
lightweight and compatible-enough with Linux to be usable in everyday
life
“. The project began, in C, back in 2018, but switched over to
Rust after a year-and-a-half. The current status:

Maestro is a monolithic kernel, supporting only the x86 (in 32 bits)
architecture for now.

At the time of writing, 135 out of 437 Linux system calls
(roughly 31%) are
more or less implemented. The project has 48 800 lines of code
across 615
files (all repositories combined, counted using the cloc command).

There is a Hacker
News discussion
of the project as well.

Vim 9.1 released

Post Syndicated from corbet original https://lwn.net/Articles/956696/

Version 9.1 of the
Vim editor has been released. “This release is dedicated to Bram
Moolenaar, Vims lead developer for more than 30 years, who passed away half
a year ago. The Vim project wouldn’t exist without his work
“. Changes
include new support for classes and objects in the scripting language,
smooth scrolling support, an EditorConfig plugin, and more.

How Transfer Family can help you build a secure, compliant managed file transfer solution

Post Syndicated from John Jamail original https://aws.amazon.com/blogs/security/how-transfer-family-can-help-you-build-a-secure-compliant-managed-file-transfer-solution/

Building and maintaining a secure, compliant managed file transfer (MFT) solution to securely send and receive files inside and outside of your organization can be challenging. Working with a competent, vigilant, and diligent MFT vendor to help you protect the security of your file transfers can help you address this challenge. In this blog post, I will share how AWS Transfer Family can help you in that process, and I’ll cover five ways to use the security features of Transfer Family to get the most out of this service. AWS Transfer Family is a fully managed service for file transfers over SFTP, AS2, FTPS, and FTP for Amazon Simple Storage Service (Amazon S3) and Amazon Elastic File System (Amazon EFS).

Benefits of building your MFT on top of Transfer Family

As outlined in the AWS Shared Responsibility Model, security and compliance are a shared responsibility between you and Transfer Family. This shared model can help relieve your operational burden because AWS operates, manages, and controls the components from the application, host operating system, and virtualization layer down to the physical security of the facilities in which the service operates. You are responsible for the management and configuration of your Transfer Family server and the associated applications outside of Transfer Family.

AWS follows industry best practices, such as automated patch management and continuous third-party penetration testing, to enhance the security of Transfer Family. This third-party validation and the compliance of Transfer Family with various regulatory regimes (such as SOC, PCI, HIPAA, and FedRAMP) integrates with your organization’s larger secure, compliant architecture.

One example of a customer who benefited from using Transfer Family is Regeneron. Due to their needs for regulatory compliance and security, and their desire for a scalable architecture, they moved their file transfer solution to Transfer Family. Through this move, they achieved their goal of a secure, compliant architecture and lowered their overall costs by 90%. They were also able to automate their malware scanning process for the intake of files. For more information on their success story, see How Regeneron built a secure and scalable file transfer service using AWS Transfer Family. There are many other documented success stories from customers, including Liberty Mutual, Discover, and OpenGamma.

Steps you can take to improve your security posture with Transfer Family

Although many of the security improvements that Transfer Family makes don’t require action on your part to use, you do need to take action on a few for compatibility reasons. In this section, I share five steps that you should take to adopt a secure, compliant architecture on Transfer Family.

  • Use strong encryption for data in transit — The first step in building a secure, compliant MFT service is to use strong encryption for data in transit. To help with this, Transfer Family now offers a strong set of available ciphers, including post-quantum ciphers that have been designed to resist decryption from future, fault-tolerant quantum computers that are still several years from production. Transfer Family will offer this capability by default for newly created servers after January 31, 2024. Existing customers can select this capability today by choosing the latest Transfer Family security policy. We review the choice of the default security policy for Transfer Family periodically to help ensure the best security posture for customers. For information about how to check what security policy you’re using and how to update it, see Security policies for AWS Transfer Family.
  • Duplicate your server’s host key — You need to make sure that a threat actor can’t impersonate your server by duplicating your server’s host key. Your server’s host key is a vital component of your secure, compliant architecture to help prevent man-in-the-middle style events where a threat actor can impersonate your server and convince your users to provide sensitive login information and data. To help prevent this possibility, we recommend that Transfer Family SFTP servers use at least a 4,096-bit RSA, ED25519, or ECDSA host key. As part of our shared responsibility model to help you build a secure global infrastructure, Transfer Family will increase its default host key size to 4,096 bits for newly created servers after January 31, 2024. To make key rotation as simple as possible for those with weaker keys, Transfer Family supports the use of multiple host keys of multiple types on a single server. However, you should deprecate the weaker keys as soon as possible because your server is only as secure as its weakest key. To learn what keys you’re using and how to rotate them, see Key management.

The next three steps apply if you use the custom authentication option in Transfer Family, which helps you use your existing identity providers to lift and shift workflows onto Transfer Family.

  • Require both a password and a key — To increase your security posture, you can require the use of both a password and key to help protect your clients from password scanners and a threat actor that might have stolen their key. For details on how to view and configure this, see Create an SFTP-enabled server.
  • Use Base64 encoding for passwords — The next step to improve your security posture is to use or update your custom authentication templates to use Base64 encoding for your passwords. This allows for a wider variety of characters and makes it possible to create more complex passwords. In this way, you can be more inclusive of a global audience that might prefer to use different character sets for their passwords. A more diverse character set for your passwords also makes your passwords more difficult for a threat actor to guess and compromise. The example templates for Transfer Family make use of Base64 encoding for passwords. For more details on how to check and update your templates to password encoding to use Base64, see Authenticating using an API Gateway method.
  • Set your API Gateway method’s authorizationType property to AWS_IAM — The final recommended step is to make sure that you set your API Gateway method’s authorizationType property to AWS_IAM to require that the caller submit the user’s credentials to be authenticated. With IAM authorization, you sign your requests with a signing key derived from your secret access key, instead of your secret access key itself, helping to ensure that authorization requests to your identity provider use AWS Signature Version 4. This provides an extra layer of protection for your secret access key. For details on how to set up AWS_IAM authorization, see Control access to an API with IAM permissions.

Conclusion

Transfer Family offers many benefits to help you build a secure, compliant MFT solution. By following the steps in this post, you can get the most out of Transfer Family to help protect your file transfers. As the requirements for a secure, compliant architecture for file transfers evolve and threats become more sophisticated, Transfer Family will continue to offer optimized solutions and provide actionable advice on how you can use them. For more information, see Security in AWS Transfer Family.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

John Jamail

John Jamail

John is the Head of Engineering for AWS Transfer Family. Prior to joining AWS, he spent eight years working in data security focused on security incident and event monitoring (SIEM), governance, risk, and compliance (GRC), and data loss prevention (DLP).

Best practices for scaling AWS CDK adoption within your organization

Post Syndicated from David Hessler original https://aws.amazon.com/blogs/devops/best-practices-for-scaling-aws-cdk-adoption-within-your-organization/

Enterprises are constantly seeking ways to accelerate their journey to the cloud. Infrastructure as code (IaC) is crucial for automating and managing cloud resources efficiently. The AWS Cloud Development Kit (AWS CDK) lets you define your cloud infrastructure as code in your favorite programming language and deploy it using AWS CloudFormation. In this post, we will discuss strategies and best practices for accelerating CDK adoption within your organization. Our discussion begins after your organization has successfully completed a pilot. In this post, you will learn how to scale the lessons learned from the pilot project across your organization through platform engineering. You will learn how to reduce complexity through building reusable components, deploy with speed and safety via builder tooling, and accelerate project startup with an internal developer portal (IDP). We will conclude by discussing ways to participate in and benefit from the broader CDK community.

Before we dive in, let’s briefly discuss a new trend in technology: Platform Engineering. DevOps practices have helped IT organizations deliver software to customers more frequently and with higher quality. A recent evolution in DevOps is the introduction of platform engineering teams to build services, toolchains, and documentation to support workload teams. An important responsibility of the platform engineering team is governance of the software delivery process.

At Amazon, we have a long and storied history of leveraging platform engineering to accelerate deployments. This is why we are able to maintain 143 different compliance certifications and attestations while deploying 150 million times per year. Platform engineering increases productivity, reduces friction between ideas and implementation, and improves agility by accelerating the delivery of workloads via a secure, scalable, and reusable set of resources and components through self-service portals and developer tools. Platform Engineering is comprised of seven capabilities: Platform Architecture, Data Architecture, Platform Product Engineering, Data Engineering, Provisioning & Orchestration, Modern App Development and CI/CD. For more information on platform engineering visit the AWS Cloud Adoption Framework.

Establishing these capabilities takes several platform and workload teams working together. From an operating model standpoint, a workload team interacts with Platform Engineering in one of the three following ways (for more information, see Building a Cloud Operating Model):

Image describes a three different cloud operating models. The first model is a transitional model where Application Engineering and Application Operations teams both supported by Cloud Platform Engineering. The second model is strategic where Application Engineering and Cloud Platform Engineering equally own the responsibility. The third model is also strategic where Application Engineering and Cloud Platform Engineering jointly own responsibility but Application Engineering owns most of the responsibility.

Reduce Builder Complexity and Cognitive load with Reusable Components

So, how can the platform team incorporate CDK to accomplish their goals? One of the common objectives of the Platform Engineering team is to publish and curate reusable patterns called Constructs. Constructs provide a mechanism to create reusable, extensible, and common components that can be shared across multiple teams and projects.

Many customers write their own implementations for constructs to enforce security best practices such as encryption and specific AWS Identity and Access Management policies. For example, you might create a MyCompanyBucket that implements your organizations security requirements in place of the default Amazon S3 Bucket construct. This bucket configuration can be implemented and extended by multiple teams to ensure they are using components that are validated by your security and compliance teams.

For customers focused on data governance, CDK constructs can automatically add in best practices for recovery time objectives and recovery point objectives by ensuring backups and architecture meet an organization’s resilience policies. For advance customers looking to enforce data lifecycle policies, create uniform access controls, or emit required KPIs, CDK constructs can provide avenues to create safe and secure configuration by default. Applying CDK constructs to DataOps, customers can benefit from templated ETL pipelines that ensure data lineage metadata is maintained and data cleansing occurs.

Customers also build constructs for non-AWS resources. Teams can build Constructs for third-party builder tooling, observability systems, testing apparatuses and more. In this way, workload teams can codify AWS and non-AWS resources in one code base. There is a balance required when writing your own constructs between ensuring standardization and providing the freedom and flexibility of taking advantage of the growing ecosystems of CDK packages. Examples of this balance include AWS Solutions Constructs, as these are typically built upon standard constructs. Without extending standard constructs, the constructs you build will be harder for consumer to integrate with the larger CDK ecosystem since it uses standardize interfaces.

Construct Hub is a central destination for discovering and sharing cloud application design patterns and reference architectures defined for CDK, that are built and published by the AWS community. While AWS provides a public Construct Hub, enterprises can maintain their private Construct Hub inside their own AWS accounts (see construct-hub, the GitHub repository, or the CDK Workshop for more details). The primary objective in either case remains consistent: to provide shared libraries that can be readily utilized by different workload teams. This approach ensures enhanced consistency, reusability, and ultimately leads to cost reduction and faster development timelines.

One of the pitfalls customers often have with leveraging this approach is that Platform Engineering cannot keep up building reusable components to leverage the latest technology enhancements. This is where leveraging the lessons learned from a pilot really can help. A pilot team works with platform engineering to research and implement security best practices. Some customers have the platform engineering team act as approvers for new constructs in addition to authors of new constructs. In this model, a pilot team works to build construct(s) for a new technology. The platform engineers approve the new construct(s). Platform engineers ensure the pilot team meets required standards such as enforcing encryption at rest, encryption in transit, and least privilege. When approval occurs, the pilot team can publish the new construct(s) to Construct Hub. In this way, platform engineering can enable experimentation and innovation, rather than become a gatekeeper. Additionally, platform engineering teams can encourage and curate an inner-sourcing model for construct creation rather than being the sole creator of constructs.

Deploy Applications Using DevSecOps Best Practices

Application builders are most productive when their expertise is channeled towards writing code that directly addresses business challenges. While creating applications is a skill well within the grasp of many software developers, the complex task of deploying and operating these applications in line with organizational standards can be overwhelming, especially for those new to a team. This complexity often acts as a bottleneck, slowing down the experimentation process and delaying the realization of value from new application initiatives.

A solution to this challenge lies in automating the deployment pipeline and operational model. By employing thoroughly tested CDK (Cloud Development Kit) components that are shared across teams and validated through a robust CI/CD (Continuous Integration/Continuous Deployment) process, the burden on developers is significantly reduced. They no longer need to delve into the complexities of the organization’s deployment strategies, allowing them to concentrate on writing unique, innovative code. This approach not only streamlines the development process but also bridges the gap between development and operations, leading to more cohesive teams and faster, more efficient releases.

One key to high-quality software delivery is to have a proper Continuous Integration and Continuous Delivery (CI/CD) process in place. You can see CDK Pipelines: Continuous delivery for AWS CDK applications for practical examples. This high-level construct, powered by AWS CodePipeline, comes in handy when you need to go beyond test deployments with the cdk deploy command and build automated pipelines for production deployments to multiple environments in different regions and/or accounts.

Whenever you commit your AWS CDK app’s source code into AWS CodeCommit, GitHub, GitLab, BitBucket, or Amazon CodeCatalyst source repository, AWS CDK Pipelines automatically builds, tests, and deploys a new version of the application. This pipeline automatically reconfigures itself to deploy as the resources in stacks changes or the environments being deployed to change. For GitHub Actions users, see CDK Pipelines for GitHub Workflows.

A number of teams are extending these pipelines and adding their own stages to ensure deployed code meets the organization’s quality, security, risk, compliance and cloud financial management criteria. For best practices of what automation to put inside the pipeline, see the AWS Deployment Pipeline Reference Architecture. By creating fully functional pipelines, platform engineering teams can reduce the cognitive load place on development teams and increase the developer experience. This strategy has two implementations: QuickStart pipelines and golden pipelines.

In QuickStart pipelines, these pipelines are created as a construct in your Construct Hub and treated similar to the above discussion on reusable components. While these pipelines offer simplified interfaces and a reduction in cognitive load, workload teams remain in control of the pipeline and are free to modify it. As a result, quality gates such as security or compliance tooling can be disabled by workload teams and controls inside the pipeline aren’t provable. This is suboptimal for organizations looking to reduce costs of compliance and audit. As the number of versions of the construct grows, teams can have difficulty governing which versions are used to ensure teams consume.

In golden pipelines, the pipelines are created as constructs, but deployed via a centralized team. Workload teams cannot control or modify these pipelines, so quality gates such as security and compliance tooling cannot be disabled. These controls become provable to stakeholders in security, risk and compliance such as auditors. Removing permissions from workload teams comes with costs. With golden pipelines, platform engineering teams often spend a majority of their time troubleshooting workload teams’ deployments. With so much time spent on troubleshooting, teams have little time to introduce new tooling to raise the security and quality standard, improve environment setup and organizational consistency, or improve audit evidence and enforcement.

Two mechanisms can augment these strategies. Traditional change control boards (CCB) can provide provability in situations where gathering evidence and enforcement are difficult. CCBs can benefit from CDK constructs that integrate IT Service Management (ITSM) approvals and fleet management processes into the pipeline and account creation processes. Alternatively, there is an emerging story with Software Supply Chain Level Artifacts (SLSA). These artifacts can be used as digital proof. In the Kubernetes space, we see this pattern with tools like Tekton chains where attestations associated with OCI images and Kyverno is used for to enforcement the presence of attestations (see Protect the pipe! Secure CI/CD pipelines with a policy-based approach using Tekton and Kyverno for details).

Multi-account and cross-region deployment with CDK

DevOps best practices suggest multiple stages of deployment and testing before deploying to production. On top of that, AWS recommends a dedicated account for each stage to simplify resource isolation and access control. This multi-account strategy helps organizations make best use of AWS resources and provides fine-grain controls (see Recommended OUs and accounts).

Often, you will have a designated AWS account, where all CI/CD pipelines reside. A deployment is executed by these pipelines to publish to other AWS accounts, which may correspond to development, staging, or production stages. For more information about a cross-account strategy in reference to CI/CD pipelines on AWS, see Building a Secure Cross-Account Continuous Delivery Pipeline.

Automated Governance

Many enterprise customers leverage CDK to enforce security controls and policies and can prevent security issues before deployment with tooling to analyze code as part of the deployment pipeline. Using the industry standard tooling of cdk-nag, many teams check applications for best practices using a combination of available rule packs. We are also seeing enterprises build their own Aspects to enforce additional requirements such as tagging requirements to manage and organize their deployed resources.

Customers can create CDK synthesized CloudFormation and add additional checkpoints with CloudFormation Guard to verify the output using policy-as-code domain-specific language (DSL) rules. Platform Engineering teams can build the rules and workload team can consume rules and run CloudFormation Guard inside the pipeline. There is an official construct that supports makes it easy to add CloudFormation Guard checks to your application.

With AWS CDK, infrastructure is code. So, the standard tooling you already use to ensure quality and improve the builder experience should be used with CDK. If your organization has a code quality program, treat CDK applications no differently than web applications or microservices. Similarly, with Amazon CodeGuru Security and Amazon CodeWhisperer, builders can get actionable recommendations on how to improve both the security and quality on their CDK code as they would with any other type of application.

With Aspects, cdk-nag, and code quality tools, organizations can prevent security issues before they are deployed. However, it is also important to create controls that work after a deployment occurs. AWS CloudFormation Hooks allow customers to inspect resources prior to create, update, or delete CloudFormation Stacks or CDK Applications. With CloudFormation Hooks, Platform Engineering teams can provide warnings or prevent provisioning resources for non-compliant resources. These hooks can be created via CDK (see Build and Deploy CloudFormation Hooks using A CI/CD Pipeline for details).

Finally, you can deploy AWS Config’s conformance packs via CDK. These collections of rules you’re your organization insist on security standards at scale. If your organization wishes to build custom rules, teams can build reactive controls using higher level constructs for AWS Config Rules. While many of these patterns existed prior to CDK, CDK helps accelerate building and deploying cloud applications and controls by leveraging reusable components that are shared within the enterprise or by the community at large.

Operate the Application using Observability

The open-source community provides high-level construct libraries that expand basic monitoring capabilities for CDK applications. The cdk-monitoring-constructs project makes it easy to monitor CDK apps. Similarly, Cdk-wakeful takes that a step further, adding many additional services and provides easily configurable interfaces to automatically be notified by AWS System Manager Incident Manager, AWS Chatbot, or Amazon Simple Notification Service. By leveraging prebuilt solutions from the open-source community, you can focus on creating custom metrics and thresholds around your business logic. Platform Engineering teams can modify and extends 1open-source projects to help workload teams simplify their operations and emit health and status to centralized systems.

Accelerate New Project Startup with an Internal Developer Platform

An Internal Developer Platform (IDP) is built by platform engineering teams to build golden paths and enable developer self-service. These golden paths are expressed as a series of templates that the structure of a source control repository and files stored inside the repository. When the IDP uses these templates to create source code repositories, the resultant repository contains the following:

  • A getting-started tutorial (usually in a README.md)
  • Reference documentation
  • Skeleton source code
  • Dependency Management
  • CI/CD pipeline template
  • IaC template
  • Observability configuration

With CDK, the CI/CD pipeline, IaC template, and observability configuration can all be a part of a single CDK application.

Platform engineering teams build golden paths and expose them using tools like Backstage, Humanitec, or Port. When building golden paths, there are two common approaches to the underlying project structure. Some organizations choose the approach where their IaC code repository is separate from the application code. Others choose to include everything in one repository. There is a healthy tension between how much to place inside a golden path vs a reusable component. In both strategies, platform engineering teams can avoid code duplication by leveraging CDK. The approach your organization chooses will dictate how you organize your reusable components. Below, we will walk through both options and the implications on reusable constructs.

Option 1: Everything in one repository

In this approach, all the code is contained in one repository: infrastructure, application, configuration, and deployment. This approach enables builders to collaborate, build features, and innovate together quickly, which is why it is the recommended approach. For more details, refer to the Best practices documentation. For examples, see AWS Deployment Reference Architecture for Applications.

This approach works best in teams that are “value-stream aligned.” Value-stream aligned teams have development and operations capabilities within the same team. These teams are organized around solving problems for customers rather than technical capabilities. Within the project, teams can organize around logical units such as application tier (API, database, etc.) or business capabilities (order management, product catalog, delivery services, etc.). In organizations that are value stream aligned, larger, highly conventionalized reusable components are better. An extreme example of this type of constructs is a single construct that contains all the code for an entire microservice. In these teams, the cognitive load focuses on the customer problem, so reducing the complexity of developing applications is critical to success.

Option 2: Separated application code pipeline

In this alternative approach, you can decouple your application code from your infrastructure by storing them in separate repositories and having separate pipelines. Separating the pipelines often leads to siloes and less collaboration between workload builders, who shift focus to developing features, and infrastructure engineers, who limit their efforts to building the infrastructure on which those applications run.

This approach works best in teams that are “matrixed.” A matrix organization is structured around technical capabilities (development, operations, security, business, etc.). In these cases, more modular constructs work better than constructs that are highly conventionalized. Experts from each organization can use CDK constructs as mechanisms to share their expertise across the entire organization. Examples of these types of constructs are monitoring, alerting, or security constructs prebuilt with hooks to plug in to centralized monitoring.

Building a Community of Practice with Platform Engineering

Scaling any new technology within a large organization requires the creation and enablement of a community that fosters collaboration, establishes best practices, and stays up to date with the changes in the ecosystem. In order to enable the creation of these communities of practice within your organization, AWS supports multiple public communities centered around the creation of content to educate and enable CDK users. Members of your organization’s community of practice can connect with other CDK development teams around the world through these public AWS supported communities.

Communities of Practice

A Community of Practice (CoP) is a group of people with shared interest who come together to learn, collaborate and develop expertise in a specific domain through informal interactions and knowledge sharing. Within your organization, establishing communities of practice around CDK has been proven to enable mentorship, problem solving, and reusable assets. To get started, your platform engineering team – the creators of reusable constructs and builder tooling with CDK – become early content creators for the community of practice. This establishes a feedback loop where CDK creators publicize their achievements via the CoP and consumers can ask questions and provide direct guidance to creators. Once the CoP has sustainably expanded by the initial group that established it, the CoP can start to add hack-a-thons or game days within your organization, which can bring innovation and solve organization-wide challenges. Fully mature communities of practices own curated wikis or databases of knowledge. They use mechanisms such as townhalls, office hours, newsletters, and chat channels to keep the community up to date. In this way, CDK expertise is diffused across the organization. At AWS, this diffusion of expertise has led to teams other than platform engineering becoming creators of reusable constructs. By expanding who can create reusable constructs, we are able to accelerate our own innovation.

Communities

There is a growing community that supports CDK, with many different platforms available providing content, code, examples and meetups. CDK is currently maintained by AWS with support from the community on AWS CDK GitHub page where you can contribute to the platform, raise issues, see the backlog and join discussions with active community members.

CDK.dev is the community driven hub around the CDK ecosystem. This site brings together all the latest blogs, videos, and educational content. It also provides links to join the community Slack platform.

CDK Patterns houses an open source collection of AWS Serverless architecture patterns built with CDK for developers to use. These patterns are sources via AWS Community Builders / AWS Heroes.

Finally, AWS re:Post provides a question-and-answer portal for the community to resolve.

The AWS Community Builders program offers technical resources, education, and networking opportunities to AWS technical enthusiasts and emerging thought leaders who are passionate about sharing knowledge and connecting with the technical community.

Communities of practice can leverage AWS public communities like cdk.dev to fill gaps in knowledge. Townhalls can benefit from speakers from AWS Heroes or community builders, frequent contributors to GitHub or re:Post, or speakers from CDK Day. Newsletters can aggregate and summarize the latest news from across all AWS channels. Once your community of practice establishes CDK competencies, this collaboration can also be bidirectional. For example, experts in your organization’s community can become AWS Heroes. Success stories can be shared via CDK Day, guest blog posts, and you might even speak at one of our major events such as AWS Summits, AWS re:Invent, AWS re:Inforce, or AWS re:Mars.

Final Thoughts

As we’ve said throughout this blog, with CDK, Infrastructure is code. This has enabled a paradigm shift in the infrastructure management space. Today, we see many customers such as Liberty Mutual, Scenario, Checkmarx, and Registers of Scotland establishing mature ecosystems using CDK. With an active open-source community, an AWS dev team for long term support, and multiple platforms for knowledge sharing, your builders can quickly learn, build, and innovate. Due to successful pilots, many organizations adopt CDK, become more agile, and innovate faster. This is exactly what happened at Amazon, where CDK is the first choice for building new services.

Organizations often scale and reduce complexity through platform engineering. These teams build higher level constructs by applying best practices, and provide CI/CD pipelines to accelerate deployments. Your deployment is safer using unit testing on your infrastructure as code and through robust security controls to provide guidance to builders at every stage: from author to operate.

Finally, establishing a community enables your organization to build its own mature ecosystem. Through both internal and open-source communities your builders can connect, discover, and grow.

Photo of David Hessler

David Hessler

Prior to joining AWS, David spent a decade serving as a principal technologist and establishing Platform Engineering and SRE teams for the United States government. Since joining AWS in 2020, David has spent his time helping customers accelerate deployment speed and safety for some of AWS’s largest commercial and public sector customers. Today, as a part of the DevSecOps team within Global Services Security, he is building the next generation of DevSecOps tooling for AWS customers.

Amritha Shetty

Amritha Shetty

Amritha is a Solutions architect at AWS. She works with public sector customers to help migrate and modernize in the cloud. She loves helping citizens get more from public sector institutions through rapid innovation in the cloud. She brings over twelve years of software design and development experience and passionate about helping customers implement the next-generation development experience.

Photo of Chris Scudder

Chris Scudder

Chris is a Senior Solutions Architect with the UK Public Sector team. His primary focus is helping Public Sector customers adopt cloud technologies for their workloads, helping them streamline their development and operational processes. He has a background in application development and has created multiple Industry Solutions for UK Local Government. He has an interesting in Machine Learning and delivers AWS DeepRacer events alongside his day-to-day role.

Photo of Kumar Karra

Kumar Karra

Kumar Karra is a Senior Field Solutions Architect for AWS Small and Medium Business Customers. He has a strong background in designing and developing applications for small consumer facing customers to large mission critical applications for enterprises. He specialized in NextGen Developer Experience tools and enjoys helping customer shorten their time to value by guiding them on strategies to implement fast, repeatable, testable, and scalable tools and architectures.

Facial Recognition Systems in the US

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/01/facial-recognition-systems-in-the-us.html

A helpful summary of which US retail stores are using facial recognition, thinking about using it, or currently not planning on using it. (This, of course, can all change without notice.)

Three years ago, I wrote that campaigns to ban facial recognition are too narrow. The problem here is identification, correlation, and then discrimination. There’s no difference whether the identification technology is facial recognition, the MAC address of our phones, gait recognition, license plate recognition, or anything else. Facial recognition is just the easiest technology right now.

eIDAS 2.0, QWACs And The Security Of The Web

Post Syndicated from Bozho original https://techblog.bozho.net/eidas-2-0-qwacs-and-the-security-of-the-web/

Tension has been high in the past months regarding a proposed change to the European eIDAS regulation which defines trust services, digital identity, and the so-called QWACs – qualified website authentication certificates. The proposal aims at making sure that EU-issued certificates are recognized by browsers. Here’s a summary from Scott Helme, and a discussion with Troy Hunt, and another good post by Eric Rescorla, former Firefox CTO so I’ll skip the intro.

Objections

Early in the process, Mozilla issued a position paper that raises some issues with the proposal. One of them is that what the EU suggests is basically an Extended Validation certificate – something that we had in the past (remember the big green address bars?), and which we have abandoned some time ago, and for a reason – multiple studies found that they do not bring any benefits. The EU says “QWACs (EVs) give the user more trust because they know which legal entity is behind a given website”. And the expert community says “well, in which scenario is that useful, and what about faking it – opening an entity with the same name in a different jurisdiction?”.

Later in the process, an additional limitation was added for browser vendors – that they cannot mandate additional security requirements than those specified by the EU standards body – ETSI. This is, to me, counterintuitive policy-wise, because in general, you set minimum requirements in regulations, not maximum. Of course, this prevents browser vendors from having arbitrary requirements. Not that they’ve had such requirements per se, but for example their CA inclusion page says “Mozilla is under no obligation to explain the reasoning behind any inclusion decision.” For me, this is not an acceptable process for something as important.

Mozilla (and various experts) also note, correctly, that if a CA gets compromised, this affects the entire world – the traffic to any website can be sniffed through man-in-the-middle attacks. And this has happened before. The Electronic Frontier Foundation, a respected digital rights organization, also objected to the approach.

Then Mozilla launched a campaign website against the amendment, which has the wrong tone and has some gross oversimplifications and factually incorrect statements (for example, it’s not true that QTSPs are not independently vetted). Then the European Signature Dialog (basically, an association of EU CAs, called QTSPs – qualified trust service providers), responded to it in a similarly inappropriate way. It said “Mozilla is generally perceived as a Google satellite, paving the way for Google to push through its own commercial interests” (which is false, but let’s not go into that).

The statements that QWACs are better against phishing, is arguably not true, even if you consult the paper that the ESD linked. It says: “Our analysis shows that it is generally impossible to differentiate between benign sites and phishing sites based on the content of their certificates alone. However, we present empirical evidence that current phishing websites for popular targets do typically not replicate the issuer and subject information”. So the fact that phishing sites don’t bother using EVs is somehow a reasons that EVs(QWACs) help against phishing? I’m disappointed by this ESD piece – they know better. Mozilla also knows better, as this negative campaign website introduces a tone that’s not constructive.

Insufficient assessment

What becomes apparent from the impact assessment study, another study, and the subsequent impact assessment is that there have been efforts to agree with browser vendors on including EU issued certificates without the CAs having to go through the root program process of the browsers, which they have refused.

I think this is not a good impact assessment. It does not assess impact. It doesn’t try to find out what will happen once this is passed, nor it tries to compare root programs with current ETSI standards to find the gaps. Neither the initial study, nor the impact assessment review the security aspects of the change.

For example, due the current usage patterns of QWACs for internal API-based communication between EU institutions, QWACs have sometimes been issued to private addresses (e.g. 192.186.xx.xx). Once they become automatically approved by the browsers, a security risk arises – what if I have a trusted certificate for your local router IP?

Also, it’s not a good process to include additional limitations in the trialogue, which is an informal process between the EU parliament, commission and council. I, as a Bulgarian member of parliament, requested from our government the drafts from the trialogue, and I was not granted access (due to EU rules). This is unacceptable as a legislative process, which should be fully transparent.

I have criticized this process and insufficient impact assessments before – for the copyright directive introduction of a requirement for automated content takedown, and for the introduction of mandatory fingerprints in ID cards.. There just doesn’t seem to be enough technical justification for regulations that have a very significant technical impact – not just in the EU, but in the world (as browsers have a global trusted CA list, not a EU one).

Technical or political debate?

The debate, as it seems, has conflated two separate issues – the technical and the political one. What Mozilla (and I presume other browser vendors) are implying is that they are responsible for the security in their browsers and they should be able to enforce security rules, while the EU is saying – private US entities (for-profit or non-profit) cannot have full control over who gets trusted and who doesn’t. Both are valid arguments. The EU seems to be pursuing a digital sovereignty agenda here, which, strategically, is a good idea.

But the question is whether that’s the best approach, and if not – how to improve it.

Some data and an anecdote to further illustrate the status quo. The Certinomis French QTSP (CA) has been distrusted by Mozilla a while ago. It is, however, on the EU trusted list. With the changes, Mozilla and others should re-trust it. The concerns raised by Mozilla seem legitimate, and so the fact that EU conformity assessment bodies do regular audits may not be sufficient for the purposes of web security (but this assumption needs to also be critically assessed).

Currently, there are 146 root/subordinate CA certificates listed for QWAC QTSPs. Of those 146, 115 are included in one or more root trust stores, and between 57 and 80 are missing from one or more. It’s far from an ideal picture, but these numbers should have been in the impact assessment and the problem statement in the first place. So that the legislators can identify the reasons for not including a CA in one or more root program. Is it a technical shortcoming of the CA, or it it a vendor discretion? Certainly, there doesn’t seem to be a ban on EU CAs/QTSPs by browser vendors.

So, essentially, the political results here is that EU CAs will get a fast-track into trust stores. I’m sure other jurisdictions will try to pass similar legislation, which will complicate the scene even further. Some of them will not be as democratic as the EU. And if a browser vendor thinks some CA is not trustworthy by their standards, they may come up with very clever workarounds of the regulation.

The first one – ignoring it, as there are no fines. But they can also introduce different paddock colors for different cases – e.g. a QWAC from a browser-approved CA gets a green paddock, a QWAC that has not passed the browser root program gets a yellow paddock. Compared to the current grey one, the yellow color may be perceived as less trustworthy. And then we’ll have to argue whether yellow is a clear enough indication and whether it shows trust or not so much. Time and time again I have stated that you can’t really regulate exact features and UI.

A technical solution to the political question?

I’ve heard many times that technical solutions to political problems are wrong. And I’ve seen many cases where, if you delve into the right level of detail, there is a solution that is both technically good and serves the political goal. In this case this is the so-called “Certificate transparency”. In a nutshell, each certificate, before being issued, is placed in a public verifiable data structure (merkle tree – used in blockchain implementations), and gets a signed certificate timestamp (SCT) as a response, which is then included as an X.509 attribute. This mitigates the risk of compromised CAs issuing certificates for websites that do not belong to the one requesting the certificate. In no time (up to 24 hours) they will be caught and distrusted, which raises the bar significantly.

Unfortunately, ETSI hasn’t included Certificate transparency in the current QWAC standard. The ESD association mentioned above says in another document that “The Browsers can easily bring any additional rules they want to impose on QTSPs such as Certificate Transparency to ETSI and other international standards bodies to be adopted through an open process of consensus by the internet community”, which I think is the wrong approach – Certificate transparency is an IETF RFC and is a de-facto standard. What will surely help is moving it (they are actually 2 RFCs) out of the “Experimental” status in IETF, but we can’t require every standard to be mirrored by ETSI.

I don’t know why CT has not been referenced by ETSI so far. It’s true that certificate log servers are based mostly in the US (and one in China), but nothing stops an organization from running a CT log. It “just” takes some infrastructure and bandwidth to support the load, but I think it’s a price EU QTSPs can pay, e.g. by sharing the costs for a couple of CT logs.

As a sidenote, I think it’s worth noting that the EU can also do more towards the adoption of DANE – a standard, which gets rid of CAs, as the public key is stored in a DNS record. It relies on DNSSEC, which doesn’t have huge adoption yet, and both are trickier to implement than it sounds, but if we want to be independent from browser decisions on which CA to trust, we can remove CAs from the trust equation. I’m fully aware it’s far from simple, and we’ll have to support PKI/CAs for a long, long time, but it’s a valid policy direction – mandate DNSSEC and DANE support.

Conclusion

Certificate transparency requirements can be added now to the eIDAS Annex IV. We are too late in the legislation process for that to be done smoothly, but I’d appreciate if the legislative bodies tried it. It would be just one additional point – “(k) details about the inclusion of the certificate in a public ledger” (note that the same eIDAS 2.0 regulates ledgers, and a CT log is a ledger, so that can be leveraged).

If not in Annex IV, then I’d strongly suggest including CT in the next version of the relevant ETSI standard. And I think it would be good if the EU Commission or ETSI to do a gap analysis of current root programs and the current ETSI standards to see if something important is missing.

Furthermore, the European Commission should initiate a series of objective studies on the effectiveness of extended validation certificates. The studies that I’ve read are not in favour of the EV/QWAC approach, but if we are to argue EV/QWACs are worth it, we need better justification.

A compromise is possible, that would make browsers confident that there will be no rogue CAs while at the same time giving Europe a say over trust in the web.

The post eIDAS 2.0, QWACs And The Security Of The Web appeared first on Bozho's tech blog.

[$] LWN’s guide to 2024

Post Syndicated from corbet original https://lwn.net/Articles/954544/

The calendar has flipped over into 2024 — another year has begun. Here at
LWN, we do not have a better idea of what this year will bring than anybody
else does, but that doesn’t keep us from going out on a shaky limb and
making predictions anyway. Here, for the curious, are a few things that we
think may be in store for 2024.

Dealing with weird ELF libraries

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/69070.html

Libraries are collections of code that are intended to be usable by multiple consumers (if you’re interested in the etymology, watch this video). In the old days we had what we now refer to as “static” libraries, collections of code that existed on disk but which would be copied into newly compiled binaries. We’ve moved beyond that, thankfully, and now make use of what we call “dynamic” or “shared” libraries – instead of the code being copied into the binary, a reference to the library function is incorporated, and at runtime the code is mapped from the on-disk copy of the shared object[1]. This allows libraries to be upgraded without needing to modify the binaries using them, and if multiple applications are using the same library at once it only requires that one copy of the code be kept in RAM.

But for this to work, two things are necessary: when we build a binary, there has to be a way to reference the relevant library functions in the binary; and when we run a binary, the library code needs to be mapped into the process.

(I’m going to somewhat simplify the explanations from here on – things like symbol versioning make this a bit more complicated but aren’t strictly relevant to what I was working on here)

For the first of these, the goal is to replace a call to a function (eg, printf()) with a reference to the actual implementation. This is the job of the linker rather than the compiler (eg, if you use the -c argument to tell gcc to simply compile to an object rather than linking an executable, it’s not going to care about whether or not every function called in your code actually exists or not – that’ll be figured out when you link all the objects together), and the linker needs to know which symbols (which aren’t just functions – libraries can export variables or structures and so on) are available in which libraries. You give the linker a list of libraries, it extracts the symbols available, and resolves the references in your code with references to the library.

But how is that information extracted? Each ELF object has a fixed-size header that contains references to various things, including a reference to a list of “section headers”. Each section has a name and a type, but the ones we’re interested in are .dynstr and .dynsym. .dynstr contains a list of strings, representing the name of each exported symbol. .dynsym is where things get more interesting – it’s a list of structs that contain information about each symbol. This includes a bunch of fairly complicated stuff that you need to care about if you’re actually writing a linker, but the relevant entries for this discussion are an index into .dynstr (which means the .dynsym entry isn’t sufficient to know the name of a symbol, you need to extract that from .dynstr), along with the location of that symbol within the library. The linker can parse this information and obtain a list of symbol names and addresses, and can now replace the call to printf() with a reference to libc instead.

(Note that it’s not possible to simply encode this as “Call this address in this library” – if the library is rebuilt or is a different version, the function could move to a different location)

Experimentally, .dynstr and .dynsym appear to be sufficient for linking a dynamic library at build time – there are other sections related to dynamic linking, but you can link against a library that’s missing them. Runtime is where things get more complicated.

When you run a binary that makes use of dynamic libraries, the code from those libraries needs to be mapped into the resulting process. This is the job of the runtime dynamic linker, or RTLD[2]. The RTLD needs to open every library the process requires, map the relevant code into the process’s address space, and then rewrite the references in the binary into calls to the library code. This requires more information than is present in .dynstr and .dynsym – at the very least, it needs to know the list of required libraries.

There’s a separate section called .dynamic that contains another list of structures, and it’s the data here that’s used for this purpose. For example, .dynamic contains a bunch of entries of type DT_NEEDED – this is the list of libraries that an executable requires. There’s also a bunch of other stuff that’s required to actually make all of this work, but the only thing I’m going to touch on is DT_HASH. Doing all this re-linking at runtime involves resolving the locations of a large number of symbols, and if the only way you can do that is by reading a list from .dynsym and then looking up every name in .dynstr that’s going to take some time. The DT_HASH entry points to a hash table – the RTLD hashes the symbol name it’s trying to resolve, looks it up in that hash table, and gets the symbol entry directly (it still needs to resolve that against .dynstr to make sure it hasn’t hit a hash collision – if it has it needs to look up the next hash entry, but this is still generally faster than walking the entire .dynsym list to find the relevant symbol). There’s also DT_GNU_HASH which fulfills the same purpose as DT_HASH but uses a more complicated algorithm that preforms even better. .dynamic also contains entries pointing at .dynstr and .dynsym, which seems redundant but will become relevant shortly.

So, .dynsym and .dynstr are required at build time, and bother are required along with .dynamic at runtime. This seems simple enough, but obviously there’s a twist and I’m sorry it’s taken so long to get to this point.

I bought a Synology NAS for home backup purposes (my previous solution was a single external USB drive plugged into a small server, which had uncomfortable single point of failure properties). Obviously I decided to poke around at it, and I found something odd – all the libraries Synology ships were entirely lacking any ELF section headers. This meant no .dynstr, .dynsym or .dynamic sections, so how was any of this working? nm asserted that the libraries exported no symbols, and readelf agreed. If I wrote a small app that called a function in one of the libraries and built it, gcc complained that the function was undefined. But executables on the device were clearly resolving the symbols at runtime, and if I loaded them into ghidra the exported functions were visible. If I dlopen()ed them, dlsym() couldn’t resolve the symbols – but if I hardcoded the offset into my code, I could call them directly.

Things finally made sense when I discovered that if I passed the --use-dynamic argument to readelf, I did get a list of exported symbols. It turns out that ELF is weirder than I realised. As well as the aforementioned section headers, ELF objects also include a set of program headers. One of the program header types is PT_DYNAMIC. This typically points to the same data that’s present in the .dynamic section. Remember when I mentioned that .dynamic contained references to .dynsym and .dynstr? This means that simply pointing at .dynamic is sufficient, there’s no need to have separate entries for them.

The same information can be reached from two different locations. The information in the section headers is used at build time, and the information in the program headers at run time[3]. I do not have an explanation for this. But if the information is present in two places, it seems obvious that it should be able to reconstruct the missing section headers in my weird libraries? So that’s what this does. It extracts information from the DYNAMIC entry in the program headers and creates equivalent section headers.

There’s one thing that makes this more difficult than it might seem. The section header for .dynsym has to contain the number of symbols present in the section. And that information doesn’t directly exist in DYNAMIC – to figure out how many symbols exist, you’re expected to walk the hash tables and keep track of the largest number you’ve seen. Since every symbol has to be referenced in the hash table, once you’ve hit every entry the largest number is the number of exported symbols. This seemed annoying to implement, so instead I cheated, added code to simply pass in the number of symbols on the command line, and then just parsed the output of readelf against the original binaries to extract that information and pass it to my tool.

Somehow, this worked. I now have a bunch of library files that I can link into my own binaries to make it easier to figure out how various things on the Synology work. Now, could someone explain (a) why this information is present in two locations, and (b) why the build-time linker and run-time linker disagree on the canonical source of truth?

[1] “Shared object” is the source of the .so filename extension used in various Unix-style operating systems
[2] You’ll note that “RTLD” is not an acryonym for “runtime dynamic linker”, because reasons
[3] For environments using the GNU RTLD, at least – I have no idea whether this is the case in all ELF environments

comment count unavailable comments