Detect Python and Java code security vulnerabilities with Amazon CodeGuru Reviewer

Post Syndicated from Haider Naqvi original https://aws.amazon.com/blogs/devops/detect-python-and-java-code-security-vulnerabilities-with-codeguru-reviewer/

with Aaron Friedman (Principal PM-T for xGuru services)

Amazon CodeGuru is a developer tool that uses machine learning and automated reasoning to catch hard-to-find defects and security vulnerabilities in application code. The purpose of this blog is to show how new CodeGuru Reviewer features help improve the security posture of your Python applications and to highlight some of the specific categories of code vulnerabilities that CodeGuru Reviewer can detect. We will also cover newly expanded security capabilities for Java applications.

Amazon CodeGuru Reviewer can detect code vulnerabilities and provide actionable recommendations across dozens of the most common and impactful categories of code security issues, as classified by industry-recognized standards such as the Open Web Application Security Project (OWASP) “top ten” and the Common Weakness Enumeration (CWE). The following are some of the most severe code vulnerabilities that CodeGuru Reviewer can now help you detect and prevent:

  • Injection weaknesses typically appear in data-rich applications. Every year, hundreds of web servers are compromised using SQL injection. An attacker can use this method to bypass access control and read or modify application data.
  • Path Traversal security issues occur when an application does not properly handle special elements within a provided path name. An attacker can use it to overwrite or delete critical files and expose sensitive data.
  • Null Pointer Dereference issues can occur due to simple programming errors, race conditions, and other causes. A null pointer dereference typically causes availability issues, and in rare conditions attackers can use it to read and modify memory.
  • Weak or broken cryptography is a risk that may compromise the confidentiality and integrity of sensitive data.

Security vulnerabilities present in source code can result in application downtime, leaked data, lost revenue, and lost customer trust. Best practices and peer code reviews aren’t sufficient to prevent these issues. You need a systematic way of detecting and preventing vulnerabilities from being deployed to production. CodeGuru Reviewer Security Detectors can provide a scalable approach to DevSecOps, a mechanism that employs automation to address security issues early in the software development lifecycle. Security detectors automate the detection of hard-to-find security vulnerabilities in Java and now Python applications, and provide actionable recommendations to developers.

By baking security mechanisms into each step of the process, DevSecOps enables the development of secure software without sacrificing speed. However, false positives raised by Static Application Security Testing (SAST) tools often must be triaged manually, which works against this value. CodeGuru uses techniques from automated reasoning, specifically precise data-flow analysis, to enhance the precision of code analysis. CodeGuru therefore reports fewer false positives.

Many customers are already embracing open-source code analysis tools in their DevSecOps practices. However, integrating such software into a pipeline requires a heavy up-front lift, ongoing maintenance, and patience to configure. Furthering the utility of the new security detectors in Amazon CodeGuru Reviewer, this update adds integrations with Bandit and Infer, two widely adopted open-source code analysis tools. In Java code bases, CodeGuru Reviewer now provides recommendations from Infer that detect null pointer dereferences, thread safety violations, and improper use of synchronization locks. In Python code, the service detects instances of SQL injection, path traversal attacks, weak cryptography, and the use of compromised libraries. Security issues found and recommendations generated by these tools are shown in the console, in pull request comments, or through CI/CD integrations, alongside code recommendations generated by CodeGuru’s code quality and security detectors. Let’s dive deep and review some examples of code vulnerabilities that CodeGuru Reviewer can help detect.

Injection (Python)

Amazon CodeGuru Reviewer can help detect the most common injection vulnerabilities, including SQL, XML, OS command, and LDAP types. For example, SQL injection occurs when SQL queries are constructed through string formatting. An attacker could manipulate the program inputs to modify the intent of the SQL query. The following Python code executes a SQL query constructed through string formatting and can be an attack vector:

import psycopg2
from flask import request

def removing_product():
    productId = request.args.get('productId')
    # User input is concatenated directly into the SQL statement.
    query = 'DELETE FROM products WHERE productID = ' + productId
    return query

def sql_injection():
    connection = psycopg2.connect("dbname=test user=postgres")
    cur = connection.cursor()
    query = removing_product()
    cur.execute(query)

CodeGuru will flag a potential SQL injection using the Bandit security detector and make the following recommendation:

>> We detected a SQL command that might use unsanitized input. 
This can result in an SQL injection. To increase the security of your code, 
sanitize inputs before using them to form a query string.

To avoid this, the user should correct the code to use a parameter sanitization mechanism that guards against SQL injection, as shown below:

import psycopg2
from flask import request

def removing_product():
    # Sanitize the user-supplied value before it reaches the query string.
    productId = sanitize_input(request.args.get('productId'))
    query = 'DELETE FROM products WHERE productID = ' + productId
    return query

def sql_injection():
    connection = psycopg2.connect("dbname=test user=postgres")
    cur = connection.cursor()
    query = removing_product()
    cur.execute(query)

In the corrected code above, the user-supplied sanitize_input method takes care of sanitizing user inputs.
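
An alternative mitigation is to bind the user-supplied value as a query parameter instead of concatenating it into the SQL string, letting the database driver handle escaping. The following is a minimal sketch of that approach with psycopg2; the function name and table schema simply mirror the example above and are illustrative, not part of the original post.

import psycopg2
from flask import request

def remove_product_safely():
    # The driver passes productId as a bound parameter, so it is never
    # interpreted as SQL syntax.
    product_id = request.args.get('productId')
    connection = psycopg2.connect("dbname=test user=postgres")
    cur = connection.cursor()
    cur.execute("DELETE FROM products WHERE productID = %s", (product_id,))
    connection.commit()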

Path Traversal (Python)

When applications use user input to create a path to read or write local files, an attacker can manipulate the input to overwrite or delete critical files or expose sensitive data. These critical files might include source code, sensitive data, or application configuration information.

from flask import Flask, request

app = Flask(__name__)

@app.route('/someurl')
def path_traversal():
    file_name = request.args["file"]
    f = open("./{}".format(file_name))
    f.close()

In the above example, the file name is passed directly to the open API without checking or filtering its content.

CodeGuru’s recommendation:

>> Potentially untrusted inputs are used to access a file path.
To protect your code from a path traversal attack, verify that your inputs are
sanitized.

In response, the developer should sanitize data before using it to create or open a file.

@app.route('/someurl')
def path_traversal():
    file_name = sanitize_data(request.args["file"])
    f = open("./{}".format(file_name))
    f.close()

In this modified code, the input file_name is cleaned and filtered by the sanitize_data call before it is used.
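
The sanitize_data helper is left undefined in the example. One possible implementation is sketched below; the helper name matches the example above, but the validation rules are assumptions and should be adapted to the application. It rejects anything that is not a plain file name:

import os

def sanitize_data(file_name):
    # Reject empty names, absolute paths, path separators, and
    # parent-directory references such as "../../etc/passwd".
    if file_name in ("", ".", "..") or os.path.basename(file_name) != file_name:
        raise ValueError("Invalid file name")
    return file_name

Applications that need to serve files from nested directories could instead resolve the full path with os.path.realpath and verify that it stays within an allowed base directory.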

Null Pointer Dereference (Java)

Infer detectors are a new addition that complement CodeGuru Reviewer’s native Java security detectors. Infer detectors, based on the Facebook Infer static analyzer, include rules to detect null pointer dereferences, thread safety violations, and improper use of synchronization locks. In particular, the null-pointer-dereference rule detects paths in the code that lead to null pointer exceptions in Java. Null pointer dereference is a very common pitfall in Java and is considered one of the 25 most dangerous software weaknesses.

The Infer null-pointer-dereference rule guards against unexpected null pointer exceptions by detecting locations in the code where pointers that could be null are dereferenced. CodeGuru augments the Infer analyzer with knowledge about the AWS APIs, which allows the security detectors to catch potential null pointer exceptions when using AWS APIs.

For example, the AWS DynamoDBMapper class provides a convenient abstraction for mapping Amazon DynamoDB tables to Java objects. However, developers should be aware that DynamoDBMapper load operations can return null if the object was not found in the table. The following code snippet updates a record in a catalog using a DynamoDBMapper:

DynamoDBMapper mapper = new DynamoDBMapper(client);

// Retrieve the item.
CatalogItem itemRetrieved = mapper.load(CatalogItem.class, 601);

// Update the item.
itemRetrieved.setISBN("622-2222222222");
itemRetrieved.setBookAuthors(
        new HashSet<String>(Arrays.asList("Author1", "Author3")));
mapper.save(itemRetrieved);

CodeGuru will protect against a potential null dereference by making the following recommendation:

object `itemRetrieved` last assigned on line 88 could be null
and is dereferenced at line 90.

In response, the developer should add a null check to prevent the null pointer dereference from occurring.

DynamoDBMapper mapper = new DynamoDBMapper(client);

// Retrieve the item.
CatalogItem itemRetrieved = mapper.load(CatalogItem.class, 601);

// Update the item only if it was found.
if (itemRetrieved != null) {
    itemRetrieved.setISBN("622-2222222222");
    itemRetrieved.setBookAuthors(
            new HashSet<String>(Arrays.asList("Author1", "Author3")));
    mapper.save(itemRetrieved);
} else {
    throw new CatalogItemNotFoundException();
}

Weak or broken cryptography (Python)

Python security detectors support popular cryptography frameworks such as cryptography and pycryptodome, as well as built-in APIs, to identify cipher-related vulnerabilities. As described in CWE-327, the use of a non-standard or inadequate algorithm is dangerous because an attacker may be able to break the algorithm and compromise whatever data it protects. In this example, `PBKDF2` is used with a weak hash algorithm, which may lead to cryptographic vulnerabilities.

from Crypto.Hash import SHA1
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Random import get_random_bytes

def risky_crypto_algorithm(password):
    salt = get_random_bytes(16)
    # SHA1 is a weak choice for the PBKDF2 HMAC hash module.
    keys = PBKDF2(password, salt, 64, count=1000000, hmac_hash_module=SHA1)
    return keys

SHA1 is used as the hash for PBKDF2; however, it is insecure and therefore not recommended for PBKDF2. CodeGuru identifies the issue and makes the following recommendation:

>> The `PBKDF2` function is using a weak algorithm which might
lead to cryptographic vulnerabilities. We recommend that you use the
`SHA224`, `SHA256`, `SHA384`,`SHA512/224`, `SHA512/256`, `BLAKE2s`,
`BLAKE2b`, `SHAKE128`, `SHAKE256` algorithms.

In response, the developer should use a stronger SHA algorithm to protect against potential cipher attacks.

from Crypto.Hash import SHA512
from Crypto.Protocol.KDF import PBKDF2
from Crypto.Random import get_random_bytes

def risky_crypto_algorithm(password):
    salt = get_random_bytes(16)
    keys = PBKDF2(password, salt, 64, count=1000000, hmac_hash_module=SHA512)
    return keys

This modified example uses the higher-strength SHA512 algorithm.
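
If a third-party library is not a requirement, the Python standard library can perform an equivalent derivation. The sketch below is illustrative only (the function name is an assumption) and mirrors the key length and iteration count of the corrected example:

import hashlib
import os

def derive_key(password: bytes) -> bytes:
    # In practice, store the salt alongside the derived key.
    salt = os.urandom(16)
    # PBKDF2-HMAC with SHA-512, a 64-byte key, and one million iterations.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 1000000, dklen=64)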

Conclusion

This post reviewed Amazon CodeGuru Reviewer security detectors and how they automatically check your code for vulnerabilities and provide actionable recommendations in code reviews. We covered new capabilities for detecting issues in Python applications, as well as additional security features from Bandit and Infer. Together, CodeGuru Reviewer’s security features provide a scalable approach for customers embracing DevSecOps, a mechanism that requires automation to address security issues earlier in the software development lifecycle. CodeGuru automates the detection of hard-to-find security vulnerabilities and helps prevent them from reaching production, accelerating DevSecOps processes for application development workflows.

You can get started from the CodeGuru console by running a full repository scan or integrating CodeGuru Reviewer with your supported CI/CD pipeline. Code analysis from Infer and Bandit is included as part of the standard CodeGuru Reviewer service.

For more information about automating code reviews and application profiling with Amazon CodeGuru, check out the AWS DevOps Blog. For more details on how to get started, visit the documentation.

Sneaking Through Windows: Infostealer Malware Masquerades as Windows Application

Post Syndicated from Andrew Iwamaye original https://blog.rapid7.com/2021/10/28/sneaking-through-windows-infostealer-malware-masquerades-as-windows-application/

Sneaking Through Windows: Infostealer Malware Masquerades as Windows Application

This post also includes contributions from Reese Lewis, Andrew Christian, and Seth Lazarus.

Rapid7’s Managed Detection and Response (MDR) team leverages specialized toolsets, malware analysis, tradecraft, and collaboration with our colleagues on the Threat Intelligence and Detection Engineering (TIDE) team to detect and remediate threats.

Recently, we identified a malware campaign whose payload installs itself as a Windows application after delivery via a browser ad service and bypasses User Account Control (UAC) by abusing a Windows environment variable and a native scheduled task to ensure it persistently executes with elevated privileges. The malware is classified as a stealer, which intends to steal sensitive data from an infected asset (such as browser credentials and cryptocurrency), prevent browser updates, and allow for arbitrary command execution.

Detection

The MDR SOC first became aware of this malware campaign upon analysis of “UAC Bypass – Disk Cleanup Utility” and “Suspicious Process – TaskKill Multiple Times” alerts (authored by Rapid7’s TIDE team) within Rapid7’s InsightIDR platform.

As the “UAC Bypass – Disk Cleanup Utility” name implies, the alert identified a possible UAC bypass using the Disk Cleanup utility due to a vulnerability in some versions of Windows 10 that allows a native scheduled task to execute arbitrary code by modifying the content of an environment variable. Specifically, the alert detected a PowerShell command spawned by a suspicious executable named HoxLuSfo.exe. We determined that HoxLuSfo.exe was spawned by sihost.exe, a background process that launches and maintains the Windows action and notification centers.

Figure 1: PowerShell command identified by Rapid7’s MDR on infected assets

We determined the purpose of the PowerShell command was, after sleeping, to attempt to perform a Disk Cleanup Utility UAC bypass. The command works because, on some Windows systems, it is possible for the Disk Cleanup Utility to run via the native scheduled task “SilentCleanup” that, when triggered, executes the following command with elevated privileges:

%windir%\system32\cleanmgr.exe /autoclean /d %systemdrive%

The PowerShell command exploited the use of the environment variable %windir% in the path specified in the “SilentCleanup” scheduled task by altering the value set for the environment variable %windir%. Specifically, the PowerShell command deleted the existing %windir% environment variable and replaced it with a new %windir% environment variable set to:

%LOCALAPPDATA%\Microsoft\OneDrive\setup\st.exe REM

The environment variable replacement therefore configured the scheduled task “SilentCleanup” to execute the following command whenever the task “SilentCleanup” was triggered:

%LOCALAPPDATA%\Microsoft\OneDrive\setup\st.exe REM\system32\cleanmgr.exe /autoclean /d %systemdrive%

The binary st.exe was a copied version of HoxLuSfo.exe from the file path C:\Program Files\WindowsApps\3b76099d-e6e0-4e86-bed1-100cc5fa699f_113.0.2.0_neutral__7afzw0tp1da5e\HoxLuSfo\.

The trailing “REM” at the end of the Registry entry commented out the rest of the native command for the “SilentCleanup” scheduled task, effectively configuring the task to execute:

%LOCALAPPDATA%\Microsoft\OneDrive\setup\st.exe

After making the changes to the %windir% environment variable, the PowerShell command ran the “SilentCleanup” scheduled task, thereby hijacking the “SilentCleanup” scheduled task to run st.exe with elevated privileges.
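
Because the hijack relies on a per-user “windir” value written under HKCU\Environment (see the registry IOC later in this post), defenders can check whether such an override exists. The following Windows-only sketch is illustrative and is not part of Rapid7’s tooling:

import winreg

def find_windir_override():
    # A "windir" value under HKCU\Environment shadows the system-wide
    # %windir% variable; a clean system normally has no such value.
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, "Environment") as key:
            value, _ = winreg.QueryValueEx(key, "windir")
    except FileNotFoundError:
        return None  # no per-user override present
    print("Suspicious per-user windir override: {}".format(value))
    return value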

The alert for “Suspicious Process – TaskKill Multiple Times” later detected st.exe spawning multiple commands attempting to kill any process named Google*, MicrosoftEdge*, or setu*.

Analysis of HoxLuSfo.exe

Rapid7’s MDR could not remotely acquire the files HoxLuSfo.exe and st.exe from the infected assets because they were no longer present at the time of the investigation. However, we obtained a copy of the executable from VirusTotal based on its MD5 hash, 1cc0536ae396eba7fbde9f35dc2fc8e3.

Figure 2: Overview of HoxLuSfo.exe (originally named TorE.exe) within dnSpy and its partially obfuscated contents.

Rapid7’s MDR concluded that HoxLuSfo.exe had the following characteristics and behaviors:

  • 32-bit Microsoft Visual Studio .NET executable containing obfuscated code
  • Originally named TorE.exe
  • At the time of writing, only 10 antivirus solutions detected HoxLuSfo.exe as malicious

Figure 3: Low detection rate for HoxLuSfo.exe on VirusTotal

  • Fingerprints the infected asset
  • Drops and leverages a 32-bit Microsoft Visual Studio .NET DLL, JiLuT64.dll (MD5: 14ff402962ad21b78ae0b4c43cd1f194), which is an Agile .NET obfuscator signed by SecureTeam Software Ltd, likely to (de)obfuscate contents
  • Modifies the hosts file on the infected asset to prevent correct resolution of common browser update URLs to prevent browser updates

Figure 4: Modifications made to the hosts file on infected assets
  • Enumerates installed browsers and steals credentials from installed browsers
  • Kills processes named Google*, MicrosoftEdge*, setu*
  • Contains functionality to steal cryptocurrency
  • Contains functionality for the execution of arbitrary commands on the infected asset
Figure 5: Sample of the functionality within HoxLuSfo.exe to execute arbitrary commands

  • Communicates with s1.cleancrack[.]tech and s4.cleancrack[.]tech (both of which resolve to 172.67.187[.]162 and 104.21.92[.]68 at the time of analysis) via AES-encrypted messages with a key of e84ad660c4721ae0e84ad660c4721ae0. The encryption scheme employed appears to be reused code from here.
  • Has a PDB path of E:\msix\ChromeRceADMIN4CB\TorE\obj\Release\TorE.pdb.

Rapid7’s MDR interacted with s4.cleancrack[.]tech and discovered what appears to be a login portal for the attacker to access stolen data.

Figure 6: Login page hosted at hXXps://s4.cleancrack[.]tech/login

Source of infection

Rapid7’s MDR observed the execution of chrome.exe just prior to HoxLuSfo.exe spawning the PowerShell command we detected with our alert.

In one of our investigations, our analysis of the user’s Chrome browser history file showed redirects to suspicious domains before initial infection:
hXXps://getredd[.]biz/ →
hXXps://eu.postsupport[.]net →
hXXp://updateslives[.]com/

In another investigation, DNS logs showed a redirect chain that followed a similar pattern:
hXXps://getblackk[.]biz/ →
hXXps://eu.postsupport[.]net →
hXXp://updateslives[.]com/ →
hXXps://chromesupdate[.]com

In the first investigation, the user’s Chrome profile revealed that the site permission settings for a suspicious domain, birchlerarroyo[.]com, were altered just prior to the redirects. Specifically, the user granted permission to the site hosted at birchlerarroyo[.]com to send notifications to the user.

Figure 7: Notifications enabled for birchlerarroyo[.]com within the user’s site settings of Chrome

Rapid7’s MDR visited the website hosted at birchlerarroyo[.]com and found that the website presented a browser notification requesting permission to show notifications to the user.

Figure 8: Website hosted at birchlerarroyo[.]com requesting permission to show notifications to the user

We suspect that the website hosted at birchlerarroyo[.]com was compromised, as its source code contained a reference to a suspicious JavaScript file hosted at fastred[.]biz:

Figure 9: Suspicious JavaScript file found within the source code of the website hosted at birchlerarroyo[.]com

We determined that the JavaScript file hosted at fastred[.]biz was responsible for the notification observed at birchlerarroyo[.]com via the code in Figure 10.

Figure 10: Partial contents of the JavaScript file hosted at fastred[.]biz

Pivoting off of the string “Код RedPush” within the source code of birchlerarroyo[.]com (see highlighted lines in Figure 9), as well as the workerName and applicationServerKey settings within the JavaScript file in Figure 10, Rapid7’s MDR discovered additional websites containing similar source code: ostoday[.]com and magnetline[.]ru.

Rapid7’s MDR analyzed the websites hosted at each of birchlerarroyo[.]com, ostoday[.]com, and magnetline[.]ru and found that each:

  • Displayed the same type of browser notification shown in Figure 8
  • Was built using WordPress and employed the same WordPress plugin, “WP Rocket”
  • Had source code that referred to similar Javascript files hosted at either fastred[.]biz or clickmatters[.]biz and the JavaScript files had the same applicationServerKey: BIbjCoVklTIiXYjv3Z5WS9oemREJPCOFVHwpAxQphYoA5FOTzG-xOq6GiK31R-NF--qzgT3_C2jurmRX_N6nY4g

Figure 11: Partial contents of the JavaScript file hosted at clickmatters[.]biz. The unicode in the “text” key decodes to “Нажмите «Разрешить», чтобы получать уведомления”, which translates to “Click ‘Allow’ to receive notifications”.

  • Had source code that contained a similar rbConfig parameter referencing takiparkrb[.]site and a varying rotator value

Figure 12: Example rbConfig parameter found in website source code

  • Had source code that contained references to either “Код RedPush” (translates to “Redpush code”), “Код РБ” (translates to “CodeRB”), or “Код нативного ПУШа RB” (translates to “Native PUSH code RB”)

Pivoting off of the similar strings of “CodeRB” and “Redpush” within source code led to other findings.

First, Rapid7’s MDR discovered an advertising business, RedPush (see redpush[.]biz). RedPush provides its customers with advertisement code to host on customers’ websites. The code produces pop-up notifications to allow for advertisements to be pushed to users browsing the customers’ websites. RedPush’s customers make a profit based on the number of advertisement clicks generated from their websites that contain RedPush’s code.

Figure 13: Summary of RedPush’s ad delivery model via push notifications

Second, Rapid7’s MDR discovered a publication by Malwaretips describing a browser pop-up malware family known as Redpush. Upon visiting a website compromised with Redpush code, the code presents a browser notification requesting permission to send notifications to the user. After the user grants permission, the compromised site appears to gain the ability to push toast notifications, which could range from spam advertisements to notifications for malicious fake software updates. Similar publications by McAfee here and here describe that threat actors have recently been employing toast notifications that advertise fake software updates to trick users into installing malicious Windows applications.

Rapid7’s MDR could not reproduce a push of malicious software after visiting the compromised website at birchlerarroyo[.]com, possibly for several reasons:

  • Notification-enabled sites may send notifications at varying frequencies, as explained here, and varying times of day.
  • Malicious packages are known to be selectively pushed to users based on geolocation, as explained here. (Note: Rapid7’s MDR interacted with the website using IP addresses having varying geolocations in North America and Europe.)
  • The malware was no longer being served at the time of investigation.

However, the malware delivery techniques described by Malwaretips and McAfee were likely employed to trick the users in our investigations into installing the malware while they were browsing the Internet. As explained in the “Forensic analysis” section, in one of our investigations, there was evidence of an initial toast notification, a fake update masquerade, and installation of a malicious Windows application. Additionally, the grandparent process of the PowerShell command we detected, sihost.exe, indicated to us that the malware may have leveraged the Windows Notification Center during the infection chain.

Forensic analysis

Analysis of the user’s Chrome profile and Microsoft-Windows-PushNotifications-Platform Windows Event Logs suggests that upon enabling notifications from the compromised site at birchlerarroyo[.]com, the user was presented with and cleared a toast notification. We could not determine the contents of the toast notification based on available evidence.

Figure 14: Windows Event Log for the user clearing a toast notification to proceed with the malware’s infection chain

Based on our analysis of timestamp evidence, the user was likely directed to each of getredd[.]biz, postsupport[.]net, and updateslives[.]com after clicking the toast notification, and was presented with a fake update webpage.

Similar to the infection mechanism described by McAfee, the installation path of the malware on disk within C:\Program Files\WindowsApps\ suggests that the users were tricked into installing a malicious Windows application. The Microsoft-Windows-AppXDeploymentServerOperational and Microsoft-Windows-AppxPackagingOperational Windows Event logs contained suspicious entries confirming installation of the malware as a Windows application, as shown in Figures 15-19.

Figure 15: Windows Event Log displaying the reading of the contents of a suspicious application package “3b76099d-e6e0-4e86-bed1-100cc5fa699f_113.0.2.0_neutral__7afzw0tp1da5e”
Figure 16: Windows Event Log displaying the deployment of the application package and the passing of suspicious installation parameters to the application via App Installer, as explained here. (Note: Rapid7’s MDR noticed the value of the ran parameter changed across separate and distinct interactions with the threat actor’s infrastructure, suggesting the ran parameter may be employed for tracking purposes.)
Figure 17: Windows Event Log that appears to show the URI installation parameter being processed by the application via App Installer
Figure 18: Windows Event Log showing validation of the application package’s digital signature. See here for more information about signatures for Windows application packages
Figure 19: Windows Event Log showing successful deployment of the application package

The events in Figures 15-19 illustrate that the malicious Windows application was distributed through the web with App Installer as an MSIX file, oelgfertgokejrgre.msix.

Analysis of oelgfertgokejrgre.msix

Rapid7’s MDR visited chromesupdate[.]com in a controlled environment and discovered that it was hosting a convincing Chrome-update-themed webpage.

Figure 20: Lure hosted at chromesupdate[.]com

The website title, “Google Chrome – Download the Fast, Secure Browser from Google,” was consistent with the titles we observed for the redirect URLs getredd[.]biz, postsupport[.]net, and updateslives[.]com. The users in our investigations likely arrived at the website in Figure 20 after clicking a malicious toast notification, and then clicked the “Install” link presented on the website to initiate the Windows application installation.

The “Install” link presented at the website led to a Windows application installer URL (similar to that seen in Figure 17), which is consistent with MSIX distribution via the web.

Figure 21: Portion of the source code of the webpage hosted at chromesupdate[.]com showing a Windows application installer URL for a malicious MSIX package

Rapid7’s MDR obtained the MSIX file, oelgfertgokejrgre.msix, hosted at chromesupdate[.]com, and confirmed that it was a Windows application package.

Figure 22: Extracted contents of oelgfertgokejrgre.msix

Analysis of the contents extracted from oelgfertgokejrgre.msix revealed the following notable characteristics and features:

  • Two files, HoxLuSfo.exe and JiLutime.dll, were contained within the HoxLuSfo subdirectory. JiLutime.dll (MD5: 60bb67ebcffed2f406ac741b1083dc80) was a 32-bit Agile .NET obfuscator DLL signed by SecureTeam Software Ltd, likely to (de)obfuscate contents.
  • The AppxManifest.xml file contained more references to the Windows application’s masquerade as a Google Chrome update, as well as details related to its package identity and signature.

Figure 24: Partial contents of AppxManifest.xml

  • The DeroKuilSza.build.appxrecipe file contained strings that referenced a project “DeroKuilSza,” which is likely associated with the malware author.

Figure 25: References to a “DeroKuilSza” project found within DeroKuilSza.build.appxrecipe

Our dynamic analysis of oelgfertgokejrgre.msix provided clarity around the malware’s installation process. Detonation of oelgfertgokejrgre.msix caused a Windows App Installer window to appear, which displayed information about a fake Google Chrome update.

Figure 26: Windows App Installer window showing a fake Google Chrome update installation prompt

The information displayed to the user in Figure 26 is spoofed to masquerade as a legitimate Google Chrome update. The information correlates to the AppxManifest.xml configuration shown in Figure 24.

Once we proceeded with the installation, the MSIX package registered a notification sender via App Installer and immediately presented a notification to launch the fake Google Chrome update.

Figure 27: Registration of App Installer as a notification sender and notification to launch the fake Google Chrome update

Since the malicious Windows application package installed by the MSIX file was not hosted on the Microsoft Store, a prompt was presented to enable sideloading of applications (if not already enabled), which allows installation of applications from unofficial sources.

Figure 28: Requirement for sideload apps mode to be enabled to proceed with installation

Figure 29: Menu presented to the user to enable sideload apps mode to complete the installation of the malware

The malware requires the “Sideload apps” setting to be enabled in order to complete its installation.

Pulling off the mask

The malware we summarized in this blog post has several tricks up its sleeve. Its delivery via an ad service as a Windows application (which leaves behind none of the typical web-based download forensic artifacts), its installation path under WindowsApps, and its UAC bypass through manipulation of an environment variable and a native scheduled task can all go undetected by various security solutions, or even by a seasoned SOC analyst. Rapid7’s MDR customers can rest assured that, by leveraging our attacker behavior analytics detection methodology, our analysts will detect and respond to this infection chain before the malware can steal valuable data.

IOCs

Type Indicator
Domain Name updateslives[.]com
Domain Name getredd[.]biz
Domain Name postsupport[.]net
Domain Name eu.postsupport[.]net
Domain Name cleancrack[.]tech
Domain Name s1.cleancrack[.]tech
Domain Name s4.cleancrack[.]tech
Domain Name getblackk[.]biz
Domain Name chromesupdate[.]com
Domain Name fastred[.]biz
Domain Name clickmatters[.]biz
Domain Name takiparkrb[.]site
IP Address 172.67.187[.]162
IP Address 104.21.92[.]68
IP Address 104.21.4[.]200
IP Address 172.67.132[.]99
Directory C:\Program Files\WindowsApps\3b76099d-e6e0-4e86-bed1-100cc5fa699f_113.0.2.0_neutral__7afzw0tp1da5e\HoxLuSfo
Filepath C:\Program Files\WindowsApps\3b76099d-e6e0-4e86-bed1-100cc5fa699f_113.0.2.0_neutral__7afzw0tp1da5e\HoxLuSfo\HoxLuSfo.exe
Filename HoxLuSfo.exe
MD5 1cc0536ae396eba7fbde9f35dc2fc8e3
SHA1 b7ac2fd5108f69e90ad02a1c31f8b50ab4612aa6
SHA256 5dc8aa3c906a469e734540d1fea1549220c63505b5508e539e4a16b841902ed1
Filepath %USERPROFILE%\AppData\Local\Microsoft\OneDrive\setup\st.exe
Filename st.exe
Registry Value + Registry Data HKCU\Environment.%windir% –> %LOCALAPPDATA%\Microsoft\OneDrive\setup\st.exe
Filename oelgfertgokejrgre.msix
MD5 6860c43374ad280c3927b16af66e3593
SHA1 94658e04988b02c395402992f46f1e975f9440e1
SHA256 0a127dfa75ecdc85e88810809c94231949606d93d232f40dad9823d3ac09b767


How Parametric Built Audit Surveillance using AWS Data Lake Architecture

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/how-parametric-built-audit-surveillance-using-aws-data-lake-architecture/

Parametric Portfolio Associates (Parametric), a wholly owned subsidiary of Morgan Stanley, is a registered investment adviser. Parametric provides investment advisory services to individual and institutional investors around the world. Parametric manages over 100,000 client portfolios with assets under management exceeding $400B (as of 9/30/21).

As a registered investment adviser, Parametric is subject to numerous regulatory requirements. The Parametric Compliance team conducts regular reviews on the firm’s portfolio management activities. To accomplish this, the organization needs both active and archived audit data to be readily available.

Parametric’s on-premises data lake solution was based on an MS-SQL server, and they used an Apache Hadoop platform for data storage, data management, and analytics. Significant gaps existed in the on-premises solution, which complicated audit processes. The team was spending a large amount of effort on system maintenance, operational management, and software version upgrades, which required expensive consulting services and made it difficult to keep maintenance windows on schedule. This limited their agility and impacted their ability to derive more insights and value from their data. In an environment of rapid growth, adoption of more sophisticated analytics tools and processes was slower to evolve.

In this blog post, we will show how Parametric implemented their Audit Surveillance Data Lake on AWS with purpose-built, fully managed analytics services. With this solution, Parametric was able to respond to various audit requests within hours rather than days or weeks. The migration resulted in a 5x cost savings with no change in data volume, and the new system can seamlessly support 10x data growth.

Audit surveillance platform

The Parametric data management office (DMO) was previously running their data workloads using an on-premises data lake, which ran on the Hortonworks data platform of Apache Hadoop. This platform wasn’t up to date, and Parametric’s hardware was reaching end-of-life. Parametric was faced with a decision to either reinvest in their on-premises infrastructure or modernize their infrastructure using a modern data analytics platform on AWS. After doing a detailed cost/benefit analysis, the DMO calculated a 5x cost savings by using AWS. They decided to move forward and modernize with AWS due to these cost benefits, in addition to elasticity and security features.

The PPA compliance team asked the DMO to provide an enterprise data service to consume data from a data lake. This data was destined for downstream applications and ad-hoc data querying capabilities. It was accessed via standard JDBC tools and user-friendly business intelligence dashboards. The goal was to ensure that seven years of audit data would be readily available.

The DMO team worked with AWS to conceptualize an audit surveillance data platform architecture and help accelerate the implementation. They attended a series of AWS Immersion Days focusing on AWS fundamentals, Data Lakes, DevOps, Amazon EMR, and serverless architectures. They were later involved in a four-day AWS Data Lab with AWS SMEs to create a data lake. The first use case in this Lab was creating the Audit Surveillance system on AWS.

Audit surveillance architecture on AWS

The following diagram shows the Audit Surveillance data lake architecture on AWS by using AWS purpose-built analytics services.

Figure 1. Audit Surveillance data lake architecture diagram

Architecture flow

  1. User personas: As a first step, the DMO team identified three user personas for the Audit Surveillance system on AWS.
    • Data service compliance users who would like to consume audit surveillance data from the data lake into their respective applications through an enterprise data service.
    • Business users who would like to create business intelligence dashboards using a BI tool to audit data for compliance needs.
    • Compliance IT users who would like to perform ad-hoc queries on the data lake to perform analytics using an interactive query tool.
  2. Data ingestion: Data is ingested into Amazon Simple Storage Service (S3) from different on-premises data sources by using AWS Lake Formation blueprints. AWS Lake Formation provides workflows that define the data source and schedule to import data into the data lake. It is a container for AWS Glue crawlers, jobs, and triggers that are used to orchestrate the process to load and update the data lake.
  3. Data storage: Parametric used Amazon S3 as the data storage layer for the Audit Surveillance data lake, as it is designed for 11 nines of durability and 99.99% availability. The existing Hadoop storage was replaced with Amazon S3. The DMO team created a drop zone (raw), an analytics zone (transformed), and a curated zone (enriched) as storage layers for their data lake on AWS.
  4. Data cataloging: AWS Glue Data Catalog was the central catalog used to store and manage metadata for all datasets hosted in the Audit Surveillance data lake. The existing Hadoop metadata store was replaced with AWS Glue Data Catalog. AWS services such as AWS Glue, Amazon EMR, and Amazon Athena, natively integrate with AWS Glue Data Catalog.
  5. Data processing: Amazon EMR and AWS Glue process the raw data and place it into the analytics zone (transformed) and curated zone (enriched) S3 buckets. Amazon EMR was used for big data processing and AWS Glue for standard ETL processes. AWS Lambda and AWS Step Functions were used to initiate monitoring and ETL processes.
  6. Data consumption: After Audit Surveillance data was transformed and enriched, the data was consumed by various personas within the firm as follows:
    • AWS Lambda and Amazon API Gateway were used to support consumption for data service compliance users.
    • Amazon QuickSight was used to create business intelligence dashboards for compliance business users.
    • Amazon Athena was used to query transformed and enriched data for compliance IT users.
  7. Security: AWS Key Management Service (KMS) customer managed keys were used for encryption at rest, and TLS for encryption in transit. Access to the encryption keys is controlled using AWS Identity and Access Management (IAM) and is monitored through detailed audit trails in AWS CloudTrail. Amazon CloudWatch was used for monitoring, and thresholds were created to determine when to send alerts.
  8. Governance: AWS IAM roles were attached to compliance users, permitting the administrator to grant access. Access was given only to approved users or programs that went through authentication and authorization through AWS SSO. Access is logged, and permissions can be granted or denied by the administrator. AWS Lake Formation is used for fine-grained access controls to grant or revoke permissions at the database, table, or column level (see the sketch after this list).
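
As an illustration of the fine-grained access control described in step 8, the sketch below grants an analyst role column-level SELECT access through AWS Lake Formation. Every identifier here (account ID, role, database, table, and column names) is a placeholder rather than Parametric’s actual configuration:

import boto3

lakeformation = boto3.client("lakeformation")

# Grant SELECT on specific columns only; broader access stays denied.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/ComplianceAnalyst"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "audit_surveillance",
            "Name": "portfolio_activity",
            "ColumnNames": ["trade_id", "portfolio_id", "trade_date"],
        }
    },
    Permissions=["SELECT"],
)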

Conclusion

The Parametric DMO team successfully replaced their on-premises Audit Surveillance Data Lake. They now have a modern, flexible, highly available, and scalable data platform on AWS, with purpose-built analytics services.

This change resulted in a 5x cost savings, and provides for a 10x data growth. There are now fast responses to internal and external audit requests (hours rather than days or weeks). This migration has given the company access to a wider breadth of AWS analytics services, which offers greater flexibility and options.

Maintaining the on-premises data lake would have required significant investment in both hardware upgrade costs and annual licensing and upgrade vendor consulting fees. Parametric’s decision to migrate their on-premises data lake has yielded proven cost benefits. And it has introduced new functions, service, and capabilities that were previously unavailable to Parametric DMO.

You may also achieve similar efficiencies and increase scalability by migrating on-premises data platforms into AWS. Read more and get started on building Data Lakes on AWS.

Data Engineers of Netflix — Interview with Pallavi Phadnis

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/data-engineers-of-netflix-interview-with-pallavi-phadnis-a1fcc5f64906

Data Engineers of Netflix — Interview with Pallavi Phadnis

This post is part of our “Data Engineers of Netflix” series, where our very own data engineers talk about their journeys to Data Engineering @ Netflix.

Pallavi Phadnis is a Senior Software Engineer at Netflix.

Pallavi Phadnis is a Senior Software Engineer on the Product Data Science and Engineering team. In this post, Pallavi talks about her journey to Netflix and the challenges that keep the work interesting.

Pallavi received her master’s degree from Carnegie Mellon. Before joining Netflix, she worked in the advertising and e-commerce industries as a backend software engineer. In her free time, she enjoys watching Netflix and traveling.

Her favorite shows: Stranger Things, Gilmore Girls, and Breaking Bad.

Pallavi, what’s your journey to data engineering at Netflix?

Netflix’s unique work culture and petabyte-scale data problems are what drew me to Netflix.

During earlier years of my career, I primarily worked as a backend software engineer, designing and building the backend systems that enable big data analytics. I developed many batch and real-time data pipelines using open source technologies for AOL Advertising and eBay. I also built online serving systems and microservices powering Walmart’s e-commerce.

Those years of experience solving technical problems for various businesses have taught me that data has the power to maximize the potential of any product.

Before I joined Netflix, I was always fascinated by my experience as a Netflix member which left a great impression of Netflix engineering teams on me. When I read Netflix’s culture memo for the first time, I was impressed by how candid, direct and transparent the work culture sounded. These cultural points resonated with me most: freedom and responsibility, high bar for performance, and no hiring of brilliant jerks.

Over the years, I followed the big data open-source community and Netflix tech blogs closely, and learned a lot about Netflix’s innovative engineering solutions and active contributions to the open-source ecosystem. In 2017, I attended the Women in Big Data conference at Netflix and met with several amazing women in data engineering, including our VP. At this conference, I also got to know my current team: Consolidated Logging (CL).

CL provides an end-to-end solution for logging, processing, and analyzing user interactions on Netflix apps from all devices. It is critical for fast-paced product innovation at Netflix since CL provides foundational data for personalization, A/B experimentation, and performance analytics. Moreover, its petabyte scale also brings unique engineering challenges. The scope of work, business impact, and engineering challenges of CL were very exciting to me. Plus, the roles on the CL team require a blend of data engineering, software engineering, and distributed systems skills, which align really well with my interests and expertise.

What is your favorite project?

The project I am most proud of is building the Consolidated Logging V2 platform which processes and enriches data at a massive scale (5 million+ events per sec at peak) in real-time. I re-architected our legacy data pipelines and built a new platform on top of Apache Flink and Iceberg. The scale brought some interesting learnings and involved working closely with the Apache Flink and Kafka community to fix core issues. Thanks to the migration to V2, we were able to improve data availability and usability for our consumers significantly. The efficiency achieved in processing and storage layers brought us big savings on computing and storage costs. You can learn more about it from my talk at the Flink Forward conference. Over the last couple of years, we have been continuously enhancing the V2 platform to support more logging use cases beyond Netflix streaming apps and provide further analytics capabilities. For instance, I am currently working on a project to build a common analytics solution for 100s of Netflix studio and internal apps.

How’s data engineering similar and different from software engineering?

The data engineering role at Netflix is similar to the software engineering role in many aspects. Both roles involve designing and developing large-scale solutions using various open-source technologies. In addition to the business logic, they need a good understanding of the framework internals and infrastructure in order to ensure production stability, for example, maintaining SLA to minimize the impact on the upstream and downstream applications. At Netflix, it is fairly common for data engineers and software engineers to collaborate on the same projects.

In addition, data engineers are responsible for designing data logging specifications and optimized data models to ensure that the desired business questions can be answered. Therefore, they have to understand both the product and the business use cases of the data in depth.

In other words, data engineers bridge the gap between data producers (such as client UI teams) and data consumers (such as data analysts and data scientists).

Learning more

Interested in learning more about data roles at Netflix? You’re in the right place! Keep an eye out for our open roles in Data Science and Engineering by visiting our jobs site here. Our culture is key to our impact and growth: read about it here. To learn more about our Data Engineers, check out our chats with Dhevi Rajendran, Samuel Setegne, and Kevin Wylie.


Data Engineers of Netflix — Interview with Pallavi Phadnis was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Forensic investigation environment strategies in the AWS Cloud

Post Syndicated from Sol Kavanagh original https://aws.amazon.com/blogs/security/forensic-investigation-environment-strategies-in-the-aws-cloud/

When a deviation from your secure baseline occurs, it’s crucial to respond and resolve the issue quickly and follow up with a forensic investigation and root cause analysis. Having a preconfigured infrastructure and a practiced plan for using it when there’s a deviation from your baseline will help you to extract and analyze the information needed to determine the impact, scope, and root cause of an incident and return to operations confidently.

Time is of the essence in understanding the what, how, who, where, and when of a security incident. You often hear of automated incident response, which has repeatable and auditable processes to standardize the resolution of incidents and accelerate evidence artifact gathering.

Similarly, having a standard, pristine, pre-configured, and repeatable forensic clean-room environment that can be automatically deployed through a template allows your organization to minimize human interaction, keep the larger organization separate from contamination, hasten evidence gathering and root cause analysis, and protect forensic data integrity. The forensic analysis process assists in data preservation, acquisition, and analysis to identify the root cause of an incident. This approach can also facilitate the presentation or transfer of evidence to outside legal entities or auditors. AWS CloudFormation templates—or other infrastructure as code (IaC) provisioning tools—help you to achieve these goals, providing your business with consistent, well-structured, and auditable results that allow for a better overall security posture. Having these environments as a permanent part of your infrastructure allows them to be well documented and tested, and gives you opportunities to train your teams in their use.

This post provides strategies that you can use to prepare your organization to respond to secure baseline deviations. These strategies take the form of best practices around Amazon Web Services (AWS) account structure, AWS Organizations organizational units (OUs) and service control policies (SCPs), forensic Amazon Virtual Private Cloud (Amazon VPC) and network infrastructure, evidence artifacts to be collected, AWS services to be used, forensic analysis tool infrastructure, and user access and authorization to the above. The specific focus is to provide an environment where Amazon Elastic Compute Cloud (Amazon EC2) instances with forensic tooling can be used to examine evidence artifacts.

This post presumes that you already have an evidence artifact collection procedure or that you are implementing one and that the evidence can be transferred to the accounts described here. If you are looking for advice on how to automate artifact collection, see How to automate forensic disk collection for guidance.

Infrastructure overview

A well-architected multi-account AWS environment is based on the structure provided by Organizations. As companies grow and need to scale their infrastructure with multiple accounts, often in multiple AWS Regions, Organizations offers programmatic creation of new AWS accounts combined with central management and governance that helps them to do so in a controlled and standardized manner. This programmatic, centralized approach should be used to create the forensic investigation environments described in the strategy in this blog post.

The example in this blog post uses a simplified structure with separate dedicated OUs and accounts for security and forensics, shown in Figure 1. Your organization’s architecture might differ, but the strategy remains the same.

Note: There might be reasons for forensic analysis to be performed live within the compromised account itself, such as to avoid shutting down or accessing the compromised instance or resource; however, that approach isn’t covered here.

Figure 1: AWS Organizations forensics OU example

The most important components in Figure 1 are:

  • A security OU, which is used for hosting security-related access and services. The security OU and the associated AWS accounts should be owned and managed by your security organization.
  • A forensics OU, which should be a separate entity, although it can have some similarities and crossover responsibilities with the security OU. There are several reasons for having it within a separate OU and account. Some of the more important reasons are that the forensics team might be a different team than the security team (or a subset of it), certain investigations might be under legal hold with additional access restrictions, or a member of the security team could be the focus of an investigation.

When speaking about Organizations, accounts, and the permissions required for various actions, you must first look at SCPs, a core functionality of Organizations. SCPs offer control over the maximum available permissions for all accounts in your organization. In the example in this blog post, you can use SCPs to provide similar or identical permission policies to all the accounts under the forensics OU, which is being used as a resource container. This policy overrides all other policies, and is a crucial mechanism to ensure that you can explicitly deny or allow any API calls desired. Some use cases of SCPs are to restrict the ability to disable AWS CloudTrail, restrict root user access, and ensure that all actions taken in the forensic investigation account are logged. This provides a centralized way to avoid changing individual policies for users, groups, or roles. Accessing the forensic environment should be done using a least-privilege model, with nobody capable of modifying or compromising the initially collected evidence. For an investigation environment, denying all actions except those you want to list as exceptions is the most straightforward approach. Start with the default of denying all, and work your way towards the least authorizations needed to perform the forensic processes established by your organization. AWS Config can be a valuable tool to track the changes made to the account and provide evidence of these changes.
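
To make the explicit-deny approach concrete, the sketch below creates and attaches an example SCP that blocks CloudTrail tampering for accounts under the forensics OU. The policy content, names, and OU ID are illustrative and are not a complete baseline:

import json
import boto3

organizations = boto3.client("organizations")

# Example-only SCP: explicitly deny actions that could disable or alter
# logging in any account attached to the forensics OU.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyCloudTrailTampering",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
                "cloudtrail:PutEventSelectors",
            ],
            "Resource": "*",
        }
    ],
}

response = organizations.create_policy(
    Content=json.dumps(scp_document),
    Description="Protect CloudTrail in forensic accounts",
    Name="forensics-protect-logging",
    Type="SERVICE_CONTROL_POLICY",
)

# Attach the policy to the forensics OU (the OU ID is a placeholder).
organizations.attach_policy(
    PolicyId=response["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-forensics0",
)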

Keep in mind that once the restrictive SCP is applied, even the root account or those with administrator access won’t have access beyond those permissions; therefore, frequent, proactive testing as your environment changes is a best practice. Also, be sure to validate which principals can remove the protective policy, if required, to transfer the account to an outside entity. Finally, create the environment before the restrictive permissions are applied, and then move the account under the forensic OU.

Having a separate AWS account dedicated to forensic investigations is best to keep your larger organization separate from the possible threat of contamination from the incident itself, to ensure the isolation and protection of the integrity of the artifacts being analyzed, and to keep the investigation confidential. Separate accounts also avoid situations where the threat actors might have used all the resources immediately available to your compromised AWS account by hitting service quotas, preventing you from instantiating an Amazon EC2 instance to perform investigations.

Having a forensic investigation account per Region is also a good practice, as it keeps the investigative capabilities close to the data being analyzed, reduces latency, and avoids issues of the data changing regulatory jurisdictions. For example, data residing in the EU might need to be examined by an investigative team in North America, but the data itself cannot be moved because its North American architecture doesn’t align with GDPR compliance. For global customers, forensics teams might be situated in different locations worldwide and have different processes. It’s better to have a forensic account in the Region where an incident arose. The account as a whole could also then be provided to local legal institutions or third-party auditors if required. That said, if your AWS infrastructure is contained within Regions only in one jurisdiction or country, then a single re-creatable account in one Region with evidence artifacts shared from and kept in their respective Regions could be an easier architecture to manage over time.

An account created in an automated fashion using a CloudFormation template—or other IaC methods—allows you to minimize human interaction before use by recreating an entirely new and untouched forensic analysis instance for each separate investigation, ensuring its integrity. Individual people will only be given access as part of a security incident response plan, and even then, permissions to change the environment should be minimal or none at all. The post-investigation environment would then be either preserved in a locked state or removed, and a fresh, blank one created in its place for the subsequent investigation with no trace of the previous artifacts. Templating your environment also facilitates testing to ensure your investigative strategy, permissions, and tooling will function as intended.
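As one possible shape for that automation, the sketch below uses boto3 and AWS CloudFormation to create a fresh analysis stack per investigation and remove it afterwards. The template URL, stack naming convention, parameter name, and investigation ID are hypothetical placeholders.

import boto3

cloudformation = boto3.client("cloudformation")

# Hypothetical template location and investigation identifier
TEMPLATE_URL = "https://example-bucket.s3.amazonaws.com/forensic-analysis-env.yaml"
INVESTIGATION_ID = "ir-2021-0001"

def create_investigation_stack(investigation_id: str) -> str:
    """Create a fresh, untouched analysis environment for one investigation."""
    response = cloudformation.create_stack(
        StackName=f"forensics-{investigation_id}",
        TemplateURL=TEMPLATE_URL,
        Capabilities=["CAPABILITY_NAMED_IAM"],
        Parameters=[
            {"ParameterKey": "InvestigationId", "ParameterValue": investigation_id},
        ],
        Tags=[{"Key": "purpose", "Value": "forensic-investigation"}],
    )
    return response["StackId"]

def teardown_investigation_stack(investigation_id: str) -> None:
    """Remove the environment once findings and reports have been preserved."""
    cloudformation.delete_stack(StackName=f"forensics-{investigation_id}")

if __name__ == "__main__":
    print(create_investigation_stack(INVESTIGATION_ID))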

Accessing your forensics infrastructure

Once you’ve defined where your investigative environment should reside, you must think about who will be accessing it, how they will do so, and what permissions they will need.

The forensic investigation team can be a separate team from the security incident response team, the same team, or a subset. You should provide precise access rights to the group of individuals performing the investigation as part of maintaining least privilege.

You should create specific roles for the various needs of the forensic procedures, each with only the permissions required. As with SCPs and other situations described here, start with no permissions and add authorizations only as required while establishing and testing your templated environments. As an example, you might create the following roles within the forensic account:

  • Responder – acquire evidence
  • Investigator – analyze evidence
  • Data custodian – manage (copy, move, delete, and expire) evidence
  • Analyst – access forensics reports for analytics, trends, and forecasting (threat intelligence)

You should establish an access procedure for each role, and include it in the response plan playbook. This will help you ensure least-privilege access as well as environment integrity. For example, establish a process for an owner of the Security Incident Response Plan to verify and approve requests for access to the environment. Another alternative is the two-person rule. Alerting on log-in is an additional security measure that you can add to help increase confidence in the environment’s integrity, and to monitor for unauthorized access.

You want the investigative role to have read-only access to the original evidence artifacts collected, generally consisting of Amazon Elastic Block Store (Amazon EBS) snapshots, memory dumps, logs, or other artifacts in an Amazon Simple Storage Service (Amazon S3) bucket. The original sources of evidence should be protected; MFA delete and S3 versioning are two methods for doing so. Work should be performed on copies of copies if rendering the original immutable isn’t possible, especially if any modification of the artifact will happen. This is discussed in further detail below.
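A minimal boto3 sketch of two of those protections follows: enabling versioning on the evidence bucket and attaching a bucket policy that limits object reads to the investigation roles. The bucket name and role ARNs are placeholders; MFA delete is not shown because it can only be enabled with the bucket owner's root credentials and an MFA device.

import boto3
import json

s3 = boto3.client("s3")

# Placeholder names; substitute your own evidence bucket and role ARNs
EVIDENCE_BUCKET = "example-forensics-evidence"
ALLOWED_READ_ROLES = [
    "arn:aws:iam::111122223333:role/forensics-investigator",
    "arn:aws:iam::111122223333:role/forensics-data-custodian",
]

# Keep every version of every evidence object
s3.put_bucket_versioning(
    Bucket=EVIDENCE_BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Deny GetObject to any principal that is not on the allow-list.
# Note that this also blocks administrators and service roles not listed,
# so review the list carefully before applying it.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyReadExceptInvestigationRoles",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:GetObject", "s3:GetObjectVersion"],
            "Resource": f"arn:aws:s3:::{EVIDENCE_BUCKET}/*",
            "Condition": {
                "StringNotLike": {"aws:PrincipalArn": ALLOWED_READ_ROLES}
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=EVIDENCE_BUCKET, Policy=json.dumps(bucket_policy))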

Evidence should only be accessible from the roles that absolutely require access—that is, investigator and data custodian. To help prevent potential insider threat actors from being aware of the investigation, you should deny even read access from any roles not intended to access and analyze evidence.

Protecting the integrity of your forensic infrastructures

Once you’ve built the organization, account structure, and roles, you must decide on the best strategy inside the account itself. Analysis of the collected artifacts can be done through forensic analysis tools hosted on an EC2 instance, ideally residing within a dedicated Amazon VPC in the forensics account. This Amazon VPC should be configured with the same restrictive approach you’ve taken so far, being fully isolated and auditable, with the only resources being dedicated to the forensic tasks at hand.

This might mean that the Amazon VPC’s subnets will have no internet gateways, and therefore all S3 access must be done through an S3 VPC endpoint. VPC flow logging should be enabled at the Amazon VPC level so that there are records of all network traffic. Security groups should be highly restrictive, and deny all ports that aren’t related to the requirements of the forensic tools. SSH and RDP access should be restricted and governed by auditable mechanisms such as a bastion host configured to log all connections and activity, AWS Systems Manager Session Manager, or similar.
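To illustrate wiring up that isolation programmatically, here is a short boto3 sketch that adds an S3 gateway endpoint to the forensic Amazon VPC and enables flow logging to an S3 bucket. The Region, VPC ID, route table ID, and bucket ARN are placeholders.

import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # Region is illustrative

# Placeholder identifiers for the isolated forensics VPC
VPC_ID = "vpc-0123456789abcdef0"
ROUTE_TABLE_ID = "rtb-0123456789abcdef0"
FLOW_LOG_BUCKET_ARN = "arn:aws:s3:::example-forensics-flow-logs"

# Gateway endpoint so instances can reach Amazon S3 without an internet gateway
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId=VPC_ID,
    ServiceName="com.amazonaws.eu-west-1.s3",
    RouteTableIds=[ROUTE_TABLE_ID],
)

# Record all network traffic in the VPC for later review
ec2.create_flow_logs(
    ResourceIds=[VPC_ID],
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination=FLOW_LOG_BUCKET_ARN,
)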

If a graphical interface is required, RDP or other methods can still be used alongside Systems Manager Session Manager. Commands and responses performed using Session Manager can be logged to Amazon CloudWatch and an S3 bucket, which allows auditing of all commands executed on the forensic tooling Amazon EC2 instances. Administrative privileges can also be restricted if required. You can also arrange to receive an Amazon Simple Notification Service (Amazon SNS) notification when a new session is started.

Given that the Amazon EC2 forensic tooling instances might not have direct access to the internet, you might need to create a process to preconfigure and deploy standardized Amazon Machine Images (AMIs) with the appropriate installed and updated set of tooling for analysis. Several best practices apply around this process. The OS of the AMI should be hardened to reduce its vulnerable surface. Do this by starting with an approved OS image, such as an AWS-provided AMI or one you have created and managed yourself. Then remove unwanted programs, packages, libraries, and other components, and ensure that all updates and patches—security and otherwise—have been applied. Configuring a host-based firewall is also a good precaution, as is installing host-based intrusion detection tooling. In addition, always ensure the attached disks are encrypted.

If your operating system is supported, we recommend creating golden images using EC2 Image Builder. Your golden image should be rebuilt and updated at least monthly, as you want to ensure it’s kept up to date with security patches and functionality.

EC2 Image Builder—combined with other tools—facilitates the hardening process; for example, allowing the creation of automated pipelines that produce Center for Internet Security (CIS) benchmark hardened AMIs. If you don’t want to maintain your own hardened images, you can find CIS benchmark hardened AMIs on the AWS Marketplace.

Keep in mind the infrastructure requirements for your forensic tools—such as minimum CPU, memory, storage, and networking requirements—before choosing an appropriate EC2 instance type. Though a variety of instance types are available, you’ll want to ensure that you’re keeping the right balance between cost and performance based on your minimum requirements and expected workloads.

The goal of this environment is to provide an efficient means to collect evidence, perform a comprehensive investigation, and effectively return to safe operations. Evidence is best acquired through the automated strategies discussed in How to automate incident response in the AWS Cloud for EC2 instances. Hashing evidence artifacts immediately upon acquisition is highly recommended in your evidence collection process. Hashes, and in turn the evidence itself, can then be validated after subsequent transfers and accesses, ensuring the integrity of the evidence is maintained. Preserving the original evidence is crucial if legal action is taken.
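One simple way to implement the hashing step is sketched below: compute a SHA-256 digest while reading the artifact, then record the digest as object metadata and as a sidecar object so later copies and transfers can be verified. The file path, bucket, and key are placeholders.

import hashlib
import boto3

s3 = boto3.client("s3")

def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the artifact from disk and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as artifact:
        for chunk in iter(lambda: artifact.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def upload_with_hash(path: str, bucket: str, key: str) -> str:
    """Upload an evidence artifact and record its hash as object metadata
    and as a sidecar .sha256 object for later verification."""
    artifact_hash = sha256_of_file(path)
    s3.upload_file(
        path, bucket, key,
        ExtraArgs={"Metadata": {"sha256": artifact_hash}},
    )
    s3.put_object(Bucket=bucket, Key=f"{key}.sha256", Body=artifact_hash.encode())
    return artifact_hash

# Placeholder locations for illustration
print(upload_with_hash("/evidence/vol-0abc.dd", "example-forensics-evidence",
                       "incoming-data/evidence-artifacts/vol-0abc.dd"))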

Evidence and artifacts can consist of, but aren’t limited to, control plane logs such as CloudTrail, Amazon EBS snapshots, memory dumps, and other artifacts stored in Amazon S3.

The control plane logs mentioned above—such as the CloudTrail logs—can be accessed in one of two ways. Ideally, the logs should reside in a central location with read-only access for investigations as needed. However, if they aren’t centralized, read access can be given to the original logs within the source account as needed. Read access to certain service logs found within the security account, such as AWS Config, Amazon GuardDuty, Security Hub, and Amazon Detective, might be necessary to correlate indicators of compromise with evidence discovered during the analysis.

As previously mentioned, it’s imperative to have immutable versions of all evidence. This can be achieved in many ways, including but not limited to the following examples:

  • Amazon EBS snapshots, including hibernation generated memory dumps:
    • Original Amazon EBS disks are snapshotted, shared to the forensics account, used to create a volume, and then mounted as read-only for offline analysis.
  • Amazon EBS volumes manually captured:
    • Linux tools such as dc3dd can be used to stream a volume to an S3 bucket and provide a hash; the resulting object can then be made immutable using one of the S3 methods in the next bullet point.
  • Artifacts stored in an S3 bucket, such as memory dumps and other artifacts:
    • S3 Object Lock prevents objects from being deleted or overwritten for a fixed amount of time or indefinitely.
    • Using MFA delete requires the requestor to use multi-factor authentication to permanently delete an object.
    • Amazon S3 Glacier provides a Vault Lock function if you want to retain immutable evidence long term.
  • Disk volumes:
    • Linux: Mount in read-only mode.
    • Windows: Use one of the many commercial or open-source write-blocker applications available, some of which are specifically made for forensic use.
  • CloudTrail:
  • AWS Systems Manager inventory:
  • AWS Config data:
    • By default, AWS Config stores data in an S3 bucket, and can be protected using the above methods.

Note: AWS Key Management Service (AWS KMS) can help enable encryption. AWS KMS is integrated with AWS services to simplify using your keys to encrypt data across your AWS workloads.
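To make the S3 Object Lock option listed above concrete, here is a minimal boto3 sketch that applies a compliance-mode retention period and a legal hold to an evidence object. It assumes the bucket was created with Object Lock enabled; the bucket, key, and retention date are placeholders.

from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

# Placeholders; the bucket must have been created with Object Lock enabled
BUCKET = "example-forensics-evidence"
KEY = "legal-hold/evidence-artifacts/vol-0abc.dd"

# Prevent deletion or overwrite of this version until the retention date
s3.put_object_retention(
    Bucket=BUCKET,
    Key=KEY,
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2022, 12, 31, tzinfo=timezone.utc),
    },
)

# A legal hold keeps the object immutable indefinitely, until explicitly released
s3.put_object_legal_hold(
    Bucket=BUCKET,
    Key=KEY,
    LegalHold={"Status": "ON"},
)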

As an example use case of Amazon EBS disks being shared as evidence to the forensics account, the following figure—Figure 2—shows a simplified S3 bucket folder structure you could use to store and work with evidence.

Figure 2 shows an S3 bucket structure for a forensic account. An S3 bucket and folder is created to hold incoming data—for example, from Amazon EBS disks—which is streamed to Incoming Data > Evidence Artifacts using dc3dd. The data is then copied from there to a folder in another bucket—Active Investigation > Root Directory > Extracted Artifacts—to be analyzed by the tooling installed on your forensic Amazon EC2 instance. Also, there are folders under Active Investigation for any investigation notes you make during analysis, as well as the final reports, which are discussed at the end of this blog post. Finally, a bucket and folders for legal holds, where an object lock will be placed to hold evidence artifacts at a specific version.

Figure 2: Forensic account S3 bucket structure

Considerations

Finally, depending on the severity of the incident, your on-premises network and infrastructure might also be compromised. Having an alternative environment for your security responders to use in case of such an event reduces the chance of being unable to respond in an emergency. AWS services such as Amazon WorkSpaces—a fully managed persistent desktop virtualization service—can be used to provide your responders a ready-to-use, independent environment that they can use to access the digital forensics and incident response tools needed to perform incident-related tasks.

Aside from the investigative tools, communications services are among the most critical for coordination of response. You can use Amazon WorkMail and Amazon Chime to provide that capability independent of normal channels.

Conclusion

The goal of a forensic investigation is to provide a final report that’s supported by the evidence. This includes what was accessed, who might have accessed it, how it was accessed, whether any data was exfiltrated, and so on. This report might be necessary for legal circumstances, such as criminal or civil investigations or situations requiring breach notifications. What output each circumstance requires should be determined in advance in order to develop an appropriate response and reporting process for each. A root cause analysis is vital in providing the information required to prepare your resources and environment to help prevent a similar incident in the future. Reports should not only include a root cause analysis, but also provide the methods, steps, and tools used to arrive at the conclusions.

This article has shown you how you can get started creating and maintaining forensic environments, as well as enable your teams to perform advanced incident resolution investigations using AWS services. Implementing the groundwork for your forensics environment, as described above, allows you to use automated disk collection to begin iterating on your forensic data collection capabilities and be better prepared when security events occur.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on one of the AWS Security, Identity, and Compliance forums or contact AWS Support.


Author

Sol Kavanagh

Sol is a senior solutions architect and CISSP with 20+ years of experience in the enterprise space. He is passionate about security and helping customers build solutions in the AWS Cloud. In his spare time he enjoys distance cycling, adventure travelling, Buddhist philosophy, and Muay Thai.

[$] Debian’s which hunt

Post Syndicated from original https://lwn.net/Articles/874049/rss

One does not normally expect to see a great deal of angst over a one-page
shell script, even on the Internet. But Debian is special, so it has been
having an extended discussion over the fate of the which command
that has been escalated to the Debian Technical Committee. The amount of
attention that has been given to a small, nonstandard utility shines a
light on Debian’s governance processes and the interaction of tradition
with standards.

Keeping your Zabbix templates up to date

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/keeping-your-zabbix-templates-up-to-date/16412/

Have you recently updated your Zabbix environment but are still wondering – why haven’t the templates been updated? Where can I obtain the latest official Zabbix templates, and how should I update them? In this blog post, we will discuss why it is vital to keep your templates up to date and what the template update process looks like.

Updating your templates

“Will updating Zabbix also update my templates?” is a question that I receive quite often. The answer to that question is – no changes are made to your templates whenever you update your Zabbix instance – be it a minor or a major update. The reasoning behind that is quite simple – we always recommend that you tune the out-of-the-box templates as per your particular requirements. That may consist of changing update intervals, disabling items/triggers, or even changing the existing trigger expressions or adding whole new entities to the template.

This is where the current behavior with template updates starts to make more sense. If Zabbix were to automatically update your templates, there would be a chance of overwriting your custom changes, which could potentially disrupt the monitoring of your environment. That is something that we definitely wish to avoid.

The question still stands – Then how am I supposed to update my templates?

The answer – you can find the latest official Zabbix templates on our official git page – https://git.zabbix.com/

First, navigate to the Zabbix repository and open the Templates folder. Then, select the release branch that matches your Zabbix instance version. Here you can find all of our official templates and also the official media types under the media folder. All you have to do now is open the template up and download the raw template file.

Zabbix 5.4 release git templates/db folder

Once that is done, we can import the template into our Zabbix environment

Template template_db_oracle_agent2 import

Don’t forget to back up your existing templates, especially if you have made some custom changes to them! Ideally – add a prefix to their names, so the new and old templates can live side by side, and you can then manually copy over the changes from the latest official template to your custom template.
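If you prefer to script the update instead of using the frontend import form, the Zabbix API provides the configuration.import method. The sketch below is a rough Python outline: it downloads a raw template file and imports it with a basic set of rules. The URLs, credentials, and rules are placeholders, and parameter names (such as the login fields) vary between Zabbix versions, so check the API documentation for your release.

import requests

ZABBIX_API = "https://zabbix.example.com/api_jsonrpc.php"              # placeholder URL
RAW_TEMPLATE_URL = "https://git.zabbix.com/path/to/raw/template.yaml"  # placeholder URL

def api_call(method, params, auth=None):
    """Send a JSON-RPC request to the Zabbix API and return its result."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    if auth:
        payload["auth"] = auth  # newer Zabbix releases pass the token differently
    response = requests.post(ZABBIX_API, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["result"]

# Log in (the login parameter names differ between Zabbix versions)
auth_token = api_call("user.login", {"user": "Admin", "password": "zabbix"})

# Fetch the raw template file from git.zabbix.com
template_source = requests.get(RAW_TEMPLATE_URL, timeout=30).text

# Import it, creating missing entities and updating existing ones
api_call(
    "configuration.import",
    {
        "format": "yaml",  # match the format of the downloaded file
        "source": template_source,
        "rules": {
            "templates": {"createMissing": True, "updateExisting": True},
            "items": {"createMissing": True, "updateExisting": True},
            "triggers": {"createMissing": True, "updateExisting": True},
            "discoveryRules": {"createMissing": True, "updateExisting": True},
        },
    },
    auth=auth_token,
)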

The benefits of keeping templates up to date

But what is the point of updating your templates – what do you get out of it? Well, that depends on the specific fixes or improvements we make to the particular template over time. Sometimes the updated template will provide improved trigger expressions or preprocessing logic. Other times the updated template will provide extra value to your monitoring with completely new items and triggers. In the case of Webhook media types – the updates usually contain fixes or improvements for some particular use cases, for example – fixing a compatibility issue for a specific OS.

You can always track these changes either in the release notes of a particular Zabbix version or by looking up a specific bug or a feature request in our bug tracker – https://support.zabbix.com

Some of the template changes in Zabbix 5.4 major update

Zabbix self-monitoring templates

Another key aspect of why it’s important to keep your templates up to date is so you can implement the changes made to the Zabbix self-monitoring templates. For example, if we compare Zabbix 5.0 to Zabbix 5.4, there are multiple new Zabbix processes and caches added to Zabbix 5.4, such as report writer/manager process, availability manager process, trend function caches, and other new components.

Zabbix server health template version 5.0 and 5.4 difference in the number of entities

So, if you update from Zabbix 5.0 to 5.4 (or Zabbix 6.0 if you’re sticking with LTS versions), you WILL NOT be monitoring these processes and caches if you don’t update your Zabbix server and Zabbix proxy templates to the current Zabbix versions. This means that you will be completely unaware of any potential performance issues related to these processes or caches.

Tracking template changes

With Zabbix 5.4 and later, you will notice some great improvements to the template import process. If you’re wondering what has changed when comparing an older template version with a newer one, you will now be able to see the changes during the import process. The added and removed elements will be highlighted in red or green accordingly.

Preview of the changes made during the template import process

How often should you update the templates? Ideally, you would follow the Zabbix update release notes and take note of any changes made to the templates that are of use in your environment. At the very least – definitely check for changes in the self-monitoring templates when moving to a newer major version of Zabbix. Otherwise, you risk losing track of potential issues in your Zabbix environment.

Now that you know the answer to the question “How can I update my Zabbix templates?” try and think back to when you last updated your Zabbix instance to a new version – did you also check the official templates for updates? If not, then don’t hesitate and visit https://git.zabbix.com/ to find the latest templates for your Zabbix version. Chances are that you will be pleasantly surprised with a set of new and updated templates for your monitoring endpoints and new webhook media types to help you integrate Zabbix with your existing systems.

Hands-On IoT Hacking: Rapid7 at DefCon IoT Village, Part 2

Post Syndicated from Deral Heiland original https://blog.rapid7.com/2021/10/28/hands-on-iot-hacking-rapid7-at-defcon-29-iot-village-pt-2/


In our last post, we discussed how we set up Rapid7’s hands-on exercise at the Defcon 29 IoT Village. Now, with that foundation laid, we’ll get into how to determine whether the header we created is UART.

When trying to determine baud rate for IoT devices, I often just guess. Generally, for typical IoT hardware, the baud rate is going to be one of the following:

  • 9600
  • 19200
  • 38400
  • 57600
  • 115200

Typically, 115200 and 57600 are the most commonly encountered baud rates on consumer-grade IoT devices. Other settings that need to be made are data bits, stop bits, and parity bits. Typically, these will be set to the following standard defaults, as shown in Figure 5:

Figure 5: Logic 2 Async Serial Decoder Settings

Once all the correct settings have been determined, and if the test point is UART, then the decoder in the Logic 2 application should decode the bit stream and reveal console text data for the device booting up. An example of this is shown in Figure 6:

Figure 6: UART Decode of Channel 1

FTDI UART Setup

Once you’ve properly determined the header is UART and identified transmit, receive, and ground pins, you can next hook up a USB to UART FTDI and start analyzing and hacking on the IoT device. During the IoT Village exercise, we used a Shikra for UART connection. Unfortunately, the Shikra appears to no longer be available, but any USB to UART FTDI device supporting 3.3vdc can be used for this exercise. However, I do recommend purchasing a multi-voltage FTDI device if possible. It’s common to encounter IoT devices that require either 1.8, 3.3, or 5 vdc, so having a product that can support these voltage levels is the best solution.

The software we used to connect to the FTDI device for the exercise at Defcon IoT Village was GtkTerm running on an Ubuntu Linux — but again, any terminal software that supports tty terminal connection will work for this. For example, I have also used CoolTerm or Putty, which both work fine. So just find a terminal software that works best for you and substitute it for what is referenced here.
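If you prefer scripting the console connection over a GUI terminal, a short pyserial sketch like the following (our assumption, not part of the Village exercise) opens the same 115200 8N1 console; adjust the device path for your adapter.

import serial  # pip install pyserial

# /dev/ttyUSB0 and 115200 8N1 match the settings used in this exercise;
# the device path will differ depending on your FTDI adapter and OS.
console = serial.Serial(
    port="/dev/ttyUSB0",
    baudrate=115200,
    bytesize=serial.EIGHTBITS,
    parity=serial.PARITY_NONE,
    stopbits=serial.STOPBITS_ONE,
    timeout=1,
)

try:
    # Print whatever the device writes to its UART console, e.g. during boot
    while True:
        line = console.readline()
        if line:
            print(line.decode(errors="replace"), end="")
except KeyboardInterrupt:
    pass
finally:
    console.close()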

The next step is to attach the Shikra (pin out) or whatever brand of FTDI USB device you’re using to the UART header on Luma (Figure 7) using this table:

Shikra Pins        Luma Header J19 Pins
Pin 1 (TXT)    →   Pin 2 (RCV)
Pin 2 (RCV)    →   Pin 3 (TXT)
Pin 18 (GND)   →   Pin 1 (GND)

Figure 7: LUMA UART PINOUT

Once the UART to USB device is connected to the LUMA, double-click on the GtkTerm icon located on the Linux Desktop and configure the application by selecting Configuration on the menu bar followed by Port in the drop-down menu. From there, set the Port (/dev/ttyUSB0) and Baud Rate (115200) to match the figure below (Figure 8), and click OK.

Figure 8: Serial Port Settings

Once configured, power on the LUMA device. At this point, you should start to see the device’s boot process logged to the UART console. For the Defcon IoT hacking exercise, we had preconfigured the devices to disable the console, so once we loaded U-boot and started the system kernel image, the console became disabled, as shown in Figure 9:

Figure 9: Console Stops at Kernel Starting

We made these changes so the attendees working on the exercise would experience a common setting often encountered, where the UART console is disabled during the booting process, and they’d have the chance to conduct another common attack that would allow them to break out of this lockdown.

For example, during the boot sequence, it’s often possible to force the device to break out of the boot process and drop into a U-Boot console. For standard U-Boot, this will often happen when the kernel image is inaccessible, causing the boot process to error out and drop into a U-Boot console prompt. This condition can sometimes be forced by shorting the data line (serial out) from the flash memory chip containing the kernel image to ground during the boot process. This prevents the boot process from loading the kernel into memory. Figure 10 shows a pin-out image of the flash memory chip currently in use on this device. The data out from the flash memory chip is Serial Out (SO) on pin 2.

Figure 10: Flash Memory Pinout

Also, I would like to note that during the Defcon IoT Village exercises, I had a conversation with several like-minded IoT hackers who said that they typically do this same attack but use the clock pin (SCLK). So, that is another viable option when conducting this type of attack on an IoT device to gain access to the U-Boot console.

During our live exercises at Defcon IoT Village, to help facilitate the process of grounding the data line Pin 2 Serial Out (SO) — and to avoid ending up with a bunch of dead devices because of accidentally grounding the wrong pins — we attached a lead from pin 2 of the flash memory chip, as shown in Figure 11:

Figure 11: Pin Glitch Lead Connected to Flash

To conduct this “pin glitch” attack to gain access to the U-Boot console, you will need to first power down the device. Then, restart the device by powering it back on while also monitoring the UART Console for the U-boot to start loading. Once you see the U-Boot loading, hold the shorting lead against the metal shielding or some other point of ground within the device, as shown in Figure 12:

Figure 12: Short Pin2 Serial Out (SO) to Ground

Shorting this or the clock pin to ground will prevent U-Boot from being able to load the kernel. If your timing is accurate, you should be successful and now see the U-Boot console prompt IPQ40xx, as shown below in Figure 13. Once you see this prompt, you can lay the shorting lead to the side. If this prompt does not show up, then you will need to repeat this process.

Figure 13: U-Boot Console Prompt

With the LUMA device used in this example, this attack is more forgiving and easier to carry out successfully. The main reason, in my opinion, is that the U-Boot image and the kernel image are on separate flash memory chips. In my experience, this seems to cause more of a delay between U-Boot load and kernel loading, allowing for a longer window of time for the pin glitch to succeed.

Alter U-boot environment variables

U-boot environment variables are used to control the boot process of the devices. During this phase of the exercise, we used the following three U-Boot console commands to view, alter, and save changes made to the U-Boot environment variables to re-enable the console, which we had disabled before the exercise.

  • “printenv” is used to list the current environment variable settings.
  • “setenv” is used to create or modify environment variables.
  • “saveenv” is used to write the environment variables back to memory so they are permanent.

When connected into the U-Boot console to view the device’s configured environment variables, the “printenv” command is used. This command will return something that looks like the following Figure 14 below. Scrolling down and viewing the environment settings will reveal a lot about how the device boot process is configured. In the case of the Defcon IoT Village, we had attendees pay close attention to the bootargs variable, because this is where the console was disabled from.

Figure 14: printenv

With a closer look at the bootargs variable as shown below in Figure 15, we can see that the console had been set to off. This is the reason the UART console halted during the boot process once the kernel was loaded.

Figure 15: bootargs Environment Variable Setting

In our third post, we’ll cover the next phase of our IoT Village exercise: turning the console back on and achieving single-user mode. Check back with us next week!


Creating AWS Lambda environment variables from AWS Secrets Manager

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-aws-lambda-environmental-variables-from-aws-secrets-manager/

This post is written by Andy Hall, Senior Solutions Architect.

AWS Lambda layers and extensions are used by third-party software providers for monitoring Lambda functions. A monitoring solution typically needs environmental variables that provide the configuration it uses to send metrics to an endpoint.

Managing this information as environmental variables across thousands of Lambda functions creates operational overhead. Instead, you can use the approach in this blog post to create environmental variables dynamically from information hosted in AWS Secrets Manager.

This can help avoid managing secret rotation for individual functions. It ensures that values stay encrypted until runtime, and abstracts away the management of the environmental variables.

Overview

This post shows how to create a Lambda layer for Node.js, Python, Ruby, Java, and .NET Core runtimes. It retrieves values from Secrets Manager and converts the secret into an environmental variable that can be used by other layers and functions. The Lambda layer uses a wrapper script to fetch information from Secrets Manager and create environmental variables.

Solution architecture

The steps in the process are as follows:

  1. The Lambda service responds to an event and initializes the Lambda context.
  2. The wrapper script is called as part of the Lambda init phase.
  3. The wrapper script calls a Golang executable passing in the ARN for the secret to retrieve.
  4. The Golang executable uses the Secrets Manager API to retrieve the decrypted secret.
  5. The wrapper script converts the information into environmental variables and calls the next step in processing.

All of the code for this post is available from this GitHub repo.

The wrapper script

The wrapper script is the main entry-point for the extension and is called by the Lambda service as part of the init phase. During this phase, the wrapper script will read in basic information from the environment and call the Golang executable. If there was an issue with the Golang executable, the wrapper script will log a statement and exit with an error.

# Get the secret value by calling the Go executable
values=$(${fullPath}/go-retrieve-secret -r "${region}" -s "${secretArn}" -a "${roleName}" -t ${timeout})
last_cmd=$?

# Verify that the last command was successful
if [[ ${last_cmd} -ne 0 ]]; then
    echo "Failed to setup environment for Secret ${secretArn}"
    exit 1
fi

Golang executable

This uses Golang to invoke the AWS APIs since the Lambda execution environment does not natively provide the AWS Command Line Interface. The Golang executable can be included in a layer so that the layer works with a number of Lambda runtimes.

The Golang executable captures and validates the command line arguments to ensure that required parameters are supplied. If Lambda does not have permissions to read and decrypt the secret, you can supply an ARN for a role to assume.

The following code example shows how the Golang executable retrieves the necessary information to assume a role using the AWS Security Token Service:

client := sts.NewFromConfig(cfg)

return client.AssumeRole(ctx,
    &sts.AssumeRoleInput{
        RoleArn: &roleArn,
        RoleSessionName: &sessionName,
    },
)

After obtaining the necessary permissions, the secret can be retrieved using the Secrets Manager API. The following code example uses the new credentials to create a client connection to Secrets Manager and the secret:

client := secretsmanager.NewFromConfig(cfg, func(o *secretsmanager.Options) {
    o.Credentials = aws.NewCredentialsCache(credentials.NewStaticCredentialsProvider(*assumedRole.Credentials.AccessKeyId, *assumedRole.Credentials.SecretAccessKey, *assumedRole.Credentials.SessionToken))
})
return client.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{
    SecretId: aws.String(secretArn),
})

After retrieving the secret, the contents must be converted into a format that the wrapper script can use. The following sample code covers the conversion from a secret string to JSON by storing the data in a map. Once the data is in a map, a loop is used to output the information as key-value pairs.

// Map to hold the parsed secret data
var dat map[string]interface{}

// Convert the secret to JSON
if err := json.Unmarshal([]byte(*result.SecretString), &dat); err != nil {
    fmt.Println("Failed to convert Secret to JSON")
    fmt.Println(err)
    panic(err)
}

// Dump the secret's key-value pairs in a format that the wrapper shell
// script can read from the output
for key, value := range dat {
    fmt.Printf("%s|%s\n", key, value)
}

Conversion to environmental variables

After the secret information is retrieved by using Golang, the wrapper script can now loop over the output, populate a temporary file with export statements, and execute the temporary file. The following code covers these steps:

# Read the data line by line and export the data as key value pairs 
# and environmental variables
echo "${values}" | while read -r line; do 
    
    # Split the line into a key and value
    ARRY=(${line//|/ })

    # Capture the key
    key="${ARRY[0]}"

    # Since the key has been captured, no need to keep it in the array
    unset ARRY[0]

    # Join the other parts of the array into a single value. There is a chance that
    # the split may have broken the data into multiple values. This will force the
    # data to be rejoined.
    value="${ARRY[@]}"
    
    # Save as an env var to the temp file for later processing
    echo "export ${key}=\"${value}\"" >> ${tempFile}
done

# Source the temp file to read in the env vars
. ${tempFile}

At this point, the information stored in the secret is now available as environmental variables to layers and the Lambda function.

Deployment

To deploy this solution, you must build on an instance that is running an Amazon Linux 2 AMI. This ensures that the compiled Golang executable is compatible with the Lambda execution environment.

The easiest way to deploy this solution is from an AWS Cloud9 environment but you can also use an Amazon EC2 instance. To build and deploy the solution into your environment, you need the ARN of the secret that you want to use. A build script is provided to ease deployment and perform compilation, archival, and AWS CDK execution.

To deploy, run:

./build.sh <ARN of the secret to use>

Once the build is complete, the following resources are deployed into your AWS account:

  • A Lambda layer (called get-secrets-layer)
  • A second Lambda layer for testing (called second-example-layer)
  • A Lambda function (called example-get-secrets-lambda)

Testing

To test the deployment, create a test event to send to the new example-get-secrets-lambda Lambda function using the AWS Management Console. The test Lambda function uses both the get-secrets-layer and second-example-layer Lambda layers, and the secret specified from the build. This function logs the values of environmental variables that were created by the get-secrets-layer and second-example-layer layers:

The secret contains the following information:

{
  "EXAMPLE_CONNECTION_TOKEN": "EXAMPLE AUTH TOKEN",
  "EXAMPLE_CLUSTER_ID": "EXAMPLE CLUSTER ID",
  "EXAMPLE_CONNECTION_URL": "EXAMPLE CONNECTION URL",
  "EXAMPLE_TENANT": "EXAMPLE TENANT",
  "AWS_LAMBDA_EXEC_WRAPPER": "/opt/second-example-layer"
}

This is the Python code for the example-get-secrets-lambda function:

import os
import json
import sys

def lambda_handler(event, context):
    print(f"Got event in main lambda [{event}]",flush=True)
    
    # Return all of the data
    return {
        'statusCode': 200,
        'layer': {
            'EXAMPLE_AUTH_TOKEN': os.environ.get('EXAMPLE_AUTH_TOKEN', 'Not Set'),
            'EXAMPLE_CLUSTER_ID': os.environ.get('EXAMPLE_CLUSTER_ID', 'Not Set'),
            'EXAMPLE_CONNECTION_URL': os.environ.get('EXAMPLE_CONNECTION_URL', 'Not Set'),
            'EXAMPLE_TENANT': os.environ.get('EXAMPLE_TENANT', 'Not Set'),
            'AWS_LAMBDA_EXEC_WRAPPER': os.environ.get('AWS_LAMBDA_EXEC_WRAPPER', 'Not Set')
        },
        'secondLayer': {
            'SECOND_LAYER_EXECUTE': os.environ.get('SECOND_LAYER_EXECUTE', 'Not Set')
        }
    }

When you run a test from the AWS Management Console, the Lambda function returns the following response:

{
  "statusCode": 200,
  "layer": {
    "EXAMPLE_AUTH_TOKEN": "EXAMPLE AUTH TOKEN",
    "EXAMPLE_CLUSTER_ID": "EXAMPLE CLUSTER ID",
    "EXAMPLE_CONNECTION_URL": "EXAMPLE CONNECTION URL",
    "EXAMPLE_TENANT": "EXAMPLE TENANT",
    "AWS_LAMBDA_EXEC_WRAPPER": "/opt/second-example-layer"
  },
  "secondLayer": {
    "SECOND_LAYER_EXECUTE": "true"
  }
}

When the secret changes, there is a delay before those changes are available to the Lambda layers and function. This is because the layer only executes in the init phase of the Lambda lifecycle. After the Lambda execution environment is recreated and initialized, the layer executes and creates environmental variables with the new secret information.

Conclusion

This solution provides a way to convert information from Secrets Manager into Lambda environment variables. By following this approach, you can centralize the management of information through Secrets Manager, instead of at the function level.

For more information about the Lambda lifecycle, see the Lambda execution environment lifecycle documentation.

The code for this post is available from this GitHub repo.

For more serverless learning resources, visit Serverless Land.

More Russian SVR Supply-Chain Attacks

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/10/more-russian-svr-supply-chain-attacks.html

Microsoft is reporting that the same attacker that was behind the SolarWinds breach — the Russian SVR, which Microsoft is calling Nobelium — is continuing with similar supply-chain attacks:

Nobelium has been attempting to replicate the approach it has used in past attacks by targeting organizations integral to the global IT supply chain. This time, it is attacking a different part of the supply chain: resellers and other technology service providers that customize, deploy and manage cloud services and other technologies on behalf of their customers. We believe Nobelium ultimately hopes to piggyback on any direct access that resellers may have to their customers’ IT systems and more easily impersonate an organization’s trusted technology partner to gain access to their downstream customers. We began observing this latest campaign in May 2021 and have been notifying impacted partners and customers while also developing new technical assistance and guidance for the reseller community. Since May, we have notified more than 140 resellers and technology service providers that have been targeted by Nobelium. We continue to investigate, but to date we believe as many as 14 of these resellers and service providers have been compromised. Fortunately, we have discovered this campaign during its early stages, and we are sharing these developments to help cloud service resellers, technology providers, and their customers take timely steps to help ensure Nobelium is not more successful.

The difficult conversation about Информационно обслужване

Post Syndicated from Bozho original https://blog.bozho.net/blog/3845

The difficult conversation about Информационно обслужване (Information Services, the state-owned IT company) must begin.

Back at the end of 2019, I sharply criticized GERB’s decision to delete an entire chapter of the e-Government Act (masking their failure to create the state enterprise “Unified System Operator”) and, with a single sentence, to transfer the entire IT of any institution the Council of Ministers chooses under the “wing” of Информационно обслужване.

This arrangement removes the transparency requirements, allows in-house award schemes of the “Автомагистрали” type, and does not guarantee predictability (in a single day the Council of Ministers can remove an institution from the list and “kill” any planning).

I will not agree with the widespread opinion that Информационно обслужване is full of incompetents who cannot get anything done. There are people there whom any IT company would want to have, and who are well above the average level. There are, of course, also some whom no company would want. As in any large company.

But from an institutional perspective, the role of the system integrator (or operator) is important. The state administration has capacity for almost nothing in the IT sphere and needs to be able to trust someone. Even to do outsourcing (that is, a procurement under the Public Procurement Act, ЗОП), you still need competence that many administrations lack.

We must acknowledge one fact: last autumn Информационно обслужване built a national health information system in a few weeks, when it was needed. And it did so well, in my opinion, after that system had been delayed for four years for various reasons.

The DDoS attacks of the past few days are serious, and there is hardly a company that could defend itself with no downtime at all. I know this part of IO’s infrastructure, and I think enough has been done.

There are several questions we need to answer:

  • Should every administration order software licenses and hardware on its own? My answer is “no” – many countries have successful examples of centralized license management and standardized hardware procurement, so that the state gets the best possible offers. In Texas there is even an agency for which manufacturers offer special discounts on their price lists, because once a manufacturer has entered the process, it gets access to the entire administration “one click away.” Информационно обслужване can do this job.
  • Should every administration maintain IT competencies? Again, no; it is not even possible. Yes, there should be a competent IT director, but the “shared services” principle calls for in-house assignment of things like writing technical specifications, ongoing substantive (rather than paper-based) oversight, system integration, urgent out-of-warranty support, and so on.
  • Should IO be a “black box”? No – this is a criticism of the current arrangement. Nothing in the law obliges IO to publish any of the in-house contracts. We had to request the contract and the specification for building the NZIS (the national health information system) under the Access to Public Information Act; the Ministry of Health even asked for an extension of the deadline, apparently because it was hard to find. There must be explicit rules for a catalog of the services provided, the prices at which they are provided, and full transparency of the work performed – what, for whom, how much, and when.
  • Wouldn’t it be better to outsource everything to the private sector? The short answer is “not always,” and the long one is here: https://blog.bozho.net/blog/3254. Unfortunately, the results over the years have not always favored this approach. That does not mean the state-owned company will do better, of course.
  • Should IO write software for the administrations, and under what conditions? This is the hardest question. If we use the health system as an example, the answer would be “yes” – after years of doing nothing, IO got the job done. But we should not draw conclusions from this example alone. The right approach is probably to regulate a few critical sectors in law, so there is predictability, and outside them IO should have only a supporting function. This will be the important part of the debate in the next National Assembly.

Still, Информационно обслужване is an important tool for the state. There are many weaknesses that must be corrected and many specifics that must be precisely regulated. GERB’s model of “we put one line in the law and do whatever we want” must stop. But these problems have solutions, and we will apply them in the next governing term.

The article “Трудният разговор за Информационно обслужване” was first published on БЛОГодаря.

TLS Simply and Automatically for Europe’s Largest Cloud Customers

Post Syndicated from Let's Encrypt original https://letsencrypt.org/2021/10/28/tls-simply-and-automatically.html

OVHcloud, the largest hosting provider in Europe, has used Let’s Encrypt for TLS certificates since 2016. Since then, they’ve provisioned tens of millions of certificates for their shared hosting customers. We often get asked about how large integrations work and their best practices so this will be the first in a series of blog posts we’ll publish on the topic.

OVHcloud first started looking into using Let’s Encrypt certificates because the team saw a need for the protection provided by TLS for every customer (remember, way back five years ago, when that wasn’t just a thing everybody did?). “Our goal was to deliver TLS simply. We didn’t want to have to write a tutorial for our customers to upload a cert, but instead just click and it works,” said Guillaume Marchand, OVHcloud’s Technical Team Lead.

They considered building their own CA but determined the cost and complexity of doing so would be impractical. Instead, they built an ACME client to prepare for using Let’s Encrypt. It took about six months: “we simply followed the RFC and did a bit of reverse engineering of Certbot,” said Guillaume. In addition to a custom client, OVHcloud automated their Certificate Signing Request (CSR) process and certificate installation process.

Schematic of how OVHcloud automatically and simply gets Let's Encrypt certificates

Getting a TLS certificate is on the critical path to onboarding a shared hosting client, so monitoring is a big part of OVHcloud’s success with Let’s Encrypt. They set up monitoring at every step in the delivery process: requesting the certificate, asking for challenges, waiting for validation, and requesting certificate creation. They also keep an eye on how long it takes to get a certificate (“it’s really fast”). OVHcloud also monitors our status page to stay apprised of our operational status.

Over 10,000 certificates are issued from Let’s Encrypt to OVHcloud every day. As the company continues to expand into North America, they predict that number will grow. The initial and ongoing work done by the OVHcloud team ensures that TLS will be a simple and reliable aspect of their service.

OVHcloud is a longtime sponsor of ISRG so we’d like to close by thanking them for not just being great technical collaborators, but also financial supporters.

Check out our blog post about how Shopify uses Let’s Encrypt certificates for another example of how our certificates are used in the enterprise.

Supporting Let’s Encrypt

As a nonprofit project, 100% of our funding comes from contributions from our community of users and supporters. We depend on their support in order to provide our services for the public benefit. If your company or organization would like to sponsor Let’s Encrypt please email us at [email protected]. If you can support us with a donation, we ask that you make an individual contribution.

Open-Sourcing a Monitoring GUI for Metaflow

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/open-sourcing-a-monitoring-gui-for-metaflow-75ff465f0d60

Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform

tl;dr Today, we are open-sourcing a long-awaited GUI for Metaflow. The Metaflow GUI allows data scientists to monitor their workflows in real-time, track experiments, and see detailed logs and results for every executed task. The GUI can be extended with plugins, allowing the community to build integrations to other systems, custom visualizations, and embed upcoming features of Metaflow directly into its views.

Metaflow is a full-stack framework for data science that we started developing at Netflix over four years ago and which we open-sourced in 2019. It allows data scientists to define ML workflows, test them locally, scale-out to the cloud, and deploy to production in idiomatic Python code. Since open-sourcing, the Metaflow community has been growing quickly: it is now the 7th most starred active project on Netflix’s GitHub account with nearly 4800 stars. Outside Netflix, Metaflow is used to power machine learning in production by hundreds of companies across industries from bioinformatics to real estate.

Since its inception, Metaflow has been a command-line-centric tool. It makes it easy for data scientists to express even complex machine learning applications in idiomatic Python, test them locally, or scale them out in the cloud — all using their favorite IDEs and terminals. Following our culture of freedom and responsibility, Metaflow grants data scientists the freedom to choose the right modeling approach, handle data and features flexibly, and construct workflows easily while ensuring that the resulting project executes responsibly and robustly on the production infrastructure.

As the number and criticality of projects running on Metaflow increased — some of which are very central to our business — our ML platform team started receiving an increasing number of support requests. Frequently, the questions were of the nature “can you help me understand why my flow takes so long to execute” or “how can I find the logs for a model that failed last night.” Technically, Metaflow provides a Python API that allows the user to inspect all details e.g., in a notebook, but writing code in a notebook to answer basic questions like this felt like overkill and unnecessarily tedious. After observing the situation for months, we started forming an understanding of the kind of new user interface that could address the growing needs of our users.
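For context, answering questions like those with the Metaflow Python client looks roughly like the following notebook sketch; the flow name is a placeholder and the snippet assumes the run metadata is available in your configured metadata service.

from metaflow import Flow

# "TrainingFlow" is a placeholder; use one of your own flow names
run = Flow("TrainingFlow").latest_run
print(run.id, run.successful, run.finished_at)

# Walk the steps and tasks to find failures and read their logs
for step in run:
    for task in step:
        if not task.successful:
            print(f"{task.pathspec} failed")
            print(task.stderr)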

Requirements for a Metaflow GUI

Metaflow is a human-centered system by design. We consider our Python API and the CLI to be integral parts of the overall user interface and user experience, which singularly focuses on making it easier to build production-ready ML projects from scratch. In our approach, Python code provides a highly expressive and productive user interface for expressing complex business logic, such as ML models and workflows. At the same time, the CLI allows users to execute specific commands quickly and even automate common actions. When it comes to complex, real-life development work like this, it would be hard to achieve the same level of productivity on a graphical user interface.

However, textual UIs are quite lacking when it comes to discoverability and getting a holistic understanding of the system’s state. The questions we were hearing reflected this gap: we were lacking a user interface that would allow the users, quite simply, to figure out quickly what is happening in their Metaflow projects.

Netflix has a long history of developing innovative tools for observability, so when we began to specify requirements for the new GUI, we were able to leverage experiences from the previous GUIs built for other use cases, as well as real-life user stories from Metaflow users. We wanted to scope the GUI tightly, focusing on a specific gap in the Metaflow experience:

  1. The GUI should allow the users to see what flows and tasks are executing and what is happening inside them. Notably, we didn’t want to replace any of the functionality in the Metaflow APIs or CLI with the GUI — just to complement them. This meant that the GUI would be read-only: all actions like writing code and starting executions should happen on the users’ IDE and terminal as before. We also had no need to build a model-monitoring GUI yet, which is a wholly separate problem domain.
  2. The GUI would be targeted at professional data scientists. Instead of a fancy GUI for demos and presentations, we wanted a serious productivity tool with carefully thought-out user workflows that would fit seamlessly into our toolchain of data science. This requires attention to small details: for instance, users should be able to copy a link to any view in the GUI and share it e.g., on Slack, for easy collaboration and support (or to integrate with the Metaflow Slack bot). And, there should be natural affordances for navigating between the CLI, the GUI, and notebooks.
  3. The GUI should be scalable and snappy: it should handle our existing repository consisting of millions of runs, some of which contain tens of thousands of tasks without hiccups. Based on our experiences with other GUIs operating at Netflix-scale, this is not a trivial requirement: scalability needs to be baked into the design from the very beginning. Sluggish GUIs are hard to debug and fix afterwards, and they can have a significantly negative impact on productivity.
  4. The GUI should integrate well with other GUIs. A modern ML stack consists of many independent systems like data warehouses, compute layers, model serving systems, and, in particular, notebooks. It should be possible to find runs and tasks of interest in the Metaflow GUI and use a task-specific view to jump to other GUIs for further information. Our landscape of tools is constantly evolving, so we didn’t want to hardcode these links and views in the GUI itself. Instead, following the integration-friendly ethos of Metaflow, we want to embed relevant information in the GUI as plugins.
  5. Finally, we wanted to minimize the operational overhead of the GUI. In particular, under no circumstances should the GUI impact Metaflow executions. The GUI backend should be a simple service, optionally sitting alongside the existing Metaflow metadata service, providing a read-only, real-time view to the stored state. The frontend side should be easily extensible and maintainable, suggesting that we wanted a modern React app.

Monitoring GUI for Metaflow

As our ML Platform team had limited frontend resources, we reached out to Codemate to help with the implementation. As often happens in software engineering projects, the project took longer than expected to finish, mostly because tracking and visualizing thousands of concurrent objects in real-time in a highly distributed environment is a surprisingly non-trivial problem (duh!). After countless iterations, we are finally very happy with the outcome, which we have now used in production for a few months.

When you open the GUI, you see an overview of all flows and runs, both current and historical, which you can group and filter in various ways:

Runs Grouped by flows

We can use this view for experiment tracking: Metaflow records every execution automatically, so data scientists can track all their work using this view. Naturally, the view can be grouped by user. They can also tag their runs and filter the view by tags, allowing them to focus on particular subsets of experiments.

After you click a specific run, you see all its tasks on a timeline:

Timeline view for a run

The timeline view is extremely useful in understanding performance bottlenecks, distribution of task runtimes, and finding failed tasks. At the top, you can see global attributes of the run, such as its status, start time, parameters etc. You can click a specific task to see more details:

Task view

This task view shows logs produced by a task, its results, and optionally links to other systems that are relevant to the task. For instance, if the task had deployed a model to a model serving platform, the view could include a link to a UI used for monitoring microservices.

As specified in our requirements, the GUI should work well with Metaflow CLI. To facilitate this, the top bar includes a navigation component where the user can copy-paste any pathspec, i.e., a path to any object in the Metaflow universe, which are prominently shown in the CLI output. This way, the user can easily move from the CLI to the GUI to observe runs and tasks in detail.

While the CLI is great, it is challenging to visualize flows. Each flow can be represented as a Directed Acyclic Graph (DAG), and so the GUI provides a much better way to visualize a flow. The DAG view presents all the steps of a flow and how they are related. Each step may have developer comments. They are colored to indicate the current state. Split steps are grouped by shaded boxes, while steps that participated in a foreach are grouped by a double shade box. Clicking on a step will take you to the Task view.

DAG View

Users at different organizations will likely have some special use cases that are not directly supported. The Metaflow GUI is extensible through its plugin API. For example, Netflix has its container orchestration platform called Titus. Users can configure tasks to utilize Titus to scale up or out. When failures happen, users will need to access their Titus containers for more information, and within the task view, a simple plugin provides a link for further troubleshooting.

Example task-level plugin

Try it at home!

We know that our user stories and requirements for a Metaflow GUI are not unique to Netflix. A number of companies in the Metaflow community have requested GUI for Metaflow in the past. To support the thriving community and invite 3rd party contributions to the GUI, we are open-sourcing our Monitoring GUI for Metaflow today!

You can find detailed instructions for how to deploy the GUI here. If you want to see the GUI in action before deploying it, Outerbounds, a new startup founded by our ex-colleagues, has deployed a public demo instance of the GUI. Outerbounds also hosts an active Slack community of Metaflow users where you can find support for GUI-related issues and share feedback and ideas for improvement.

With the new GUI, data scientists don’t have to fly blind anymore. Instead of reaching out to a platform team for support, they can easily see the state of their workflows on their own. We hope that Metaflow users outside Netflix will find the GUI equally beneficial, and companies will find creative ways to improve the GUI with new plugins.

For more context on the development process and motivation for the GUI, you can watch this recording of the GUI launch meetup.


Open-Sourcing a Monitoring GUI for Metaflow was originally published in Netflix TechBlog on Medium.

