Brandon Adkins’ Career Journey – Taking Chances and Tackling New Challenges

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/08/15/brandon-adkins-career-journey-taking-chances-and-tackling-new-challenges/

Brandon Adkins is the Manager of our Threat Intelligence & Detection Engineering (TIDE) team. His career journey spans a variety of roles and teams where he has been able to showcase his technical skills in security. Since joining Rapid7, he’s had experience as a Penetration Testing Consultant, working with both red and purple teams, and now as a leader with our TIDE team he supports engineers in writing effective detections for products like Insight IDR.

Adkins is no stranger to seeking out and taking on new technical challenges. Before joining Rapid7, he had built a long and successful career, achieving the role of Principal Information Security Analyst.

“I decided to come to Rapid7 because I was at a point in my career where in order to advance further, I was either going to be a people manager, or I would have to look elsewhere,” said Adkins. “At the time, I didn’t feel like I was ready to hang up my hat as an individual contributor. I still felt I had more to offer on the technical side, and didn’t want to be done yet.”

This drive led him to pursue his Offensive Security Certified Professional (OSCP) designation, enabling him to become a Penetration Tester. “I got my notification that I had passed my test the day I had my first interview with Rapid7. So the fact that I got the job really shows how they were willing to take a risk on someone brand new, and invest in my career by giving me that chance.”

When asked what the biggest shift was in coming to Rapid7, he praises the quality and caliber of talent he was exposed to. “In my past role, I was used to being one of the smartest guys in the room. Coming into Rapid7 and seeing the depth of knowledge that is here on the team, and the level of expertise everyone brings, I very quickly realized that there was so much more for me to learn.”

Adkins was inspired by those around him, and his curiosity and desire to keep growing didn’t stay quiet for long. During his time as a consultant, he moved from red teams to purple teams, and ultimately became curious about the detection engineering team, which is responsible for ensuring our products can effectively identify suspicious behavior.

“I really enjoyed my time as a pen tester. I loved purple teaming because I got to work alongside our customer security teams and help them identify ways to improve.” This collaborative experience and being able to blend his experience from blue and red teaming sparked further curiosity in detection engineering.

“I reached out to a few people and thought my next move might be to join the team as an engineer. When we actually got to talking, it turns out what the team really needed at the time was a manager. I was hesitant at first, but the more that I thought about it the more I thought, ‘I think I can really help make a difference here and do something good.’”

Adkins’ extensive technical background combined with his ability to work collaboratively in a customer-facing capacity ended up being the combination of talent that was needed to help the team work more efficiently. “They already have great people writing code. What they needed, and what I hope to bring, is someone who can speak to the business and advocate for the team, to smooth out any speed bumps, and ultimately clear the way so they can do what they are best at.”

Since taking on his new role in January 2024, the team has grown to be three times the size it was originally. “It’s an exciting time to be part of the team because we are getting the support and investment from the business to continue to iterate and make our products even better.”

As he continues to hire new people into the business, and support existing employees in growing their careers, Adkins says there are two key factors he looks for to spot high-caliber talent – communication skills, and the ability to collaborate.  “Technical ability is obviously important, you have to be able to do the work. But beyond that, if we’re looking for someone to step into a more senior role on the team, or evaluate if someone is ready for a promotion, I want to see examples of how they can communicate their ideas and challenges effectively, and how they use the partnerships we have across the business to collaborate and find solutions.”

For the TIDE team, Rapid7’s engineers sit at the intersection of customer feedback, product management, and our security operations center. “At a certain point, we can’t do our jobs well without having a partnership with other teams. We need to know from the SOC team if something isn’t working the way it should be. We want to know from our customers and Customer Advisors what’s working well and what more they’d like to see, and we need to work alongside our product teams and analysts to understand and synthesize data to get a full picture of the customer attack surface.”

For Adkins, his journey in cybersecurity is one that has opened a number of different doors as he’s explored new roles and teams. His expertise and experience have helped support customers around the world in understanding their attack surface and more efficiently protecting their business from bad actors.

When asked what advice he would share with others looking to grow their career, he said, “When you get an opportunity to try something new – especially at Rapid7 – jump at it. Rapid7 hired me as a pen tester with zero pentest experience. Four years later, they took a risk on me again as a people leader with zero previous people leader experience. This is a place where these moves and opportunities are not only available, but are supported by the leadership around you. If you have the fundamental skills necessary and it’s something you’re interested in, there’s a ton of room for you to expand your career.”

Email Journaling with SES Mail Manager

Post Syndicated from Zip Zieper original https://aws.amazon.com/blogs/messaging-and-targeting/email-journaling-with-ses-mail-manager/

Introduction to Journaling

Email journaling is the practice of preserving comprehensive records of all email communications within an organization. This approach stems from the need to maintain rigid, compliance-driven retention policies focused on auditing an entire organization’s email activities. Because journaled email messages are often required to satisfy on-demand audit and investigation requests, they must be readily searchable, making accessibility a key requirement. Reflecting legal and regulatory requirements, email journaling has historically required expensive, dedicated off-site storage and complex retrieval systems.

Amazon WorkMail is a managed business email service with flexible journaling capabilities that are configurable at both the individual mailbox and organization-wide level. With WorkMail, you can use custom rules to selectively preserve or redirect certain messages using granular journaling controls. This flexibility allows administrators to implement both traditional email journaling and configurations that you can customize to meet specific use cases.

Email journaling is used to capture and retain every email sent to and from an organization, primarily for compliance purposes. In contrast, email archiving is typically used to offload and store emails from an organization’s primary email system, often driven by inbox size limits and data backup or eDiscovery needs. While journaling focuses on preserving a consolidated record of communications separate from live mailboxes, archiving is a more selective process. Journaling is usually driven by regulatory, audit, and compliance requirements. As discussed in this blog post, you can use the Mail Manager archiving feature not only for selective email backup and optimization, but also to fulfill your email journaling requirements. You can learn more about email archiving with Mail Manager in this blog post.

Amazon Simple Email Service (SES) Mail Manager provides comprehensive tools that simplify managing large volumes of email communications within an organization. Mail Manager has a built-in archiving function which can be used as an inexpensive journaling solution for email systems like Amazon WorkMail. Mail Manager’s rules engine allows for the creation of rules that readily satisfy a wide range of email journaling requirements. Additionally, Mail Manager’s archiving capability supports multiple, concurrent archiving destinations that can be independently searched and exported on demand.

In this blog post, we discuss how Amazon WorkMail and Amazon Simple Email Service (SES) Mail Manager make email journaling easier to set up and use, more cost-effective, and more versatile. We’ll walk through setting up email journaling for an Amazon WorkMail organization using SES Mail Manager’s routing, processing, and archiving features.

SES Mail Manager as Journaling Destination for WorkMail

For our purposes, we’ll assume you’ve already set up WorkMail as your mailbox provider, but the process described below will work with the journaling features of most 3rd party email solutions. If you want to explore Amazon WorkMail, visit the getting started documentation here.

In the following sections, we’ll describe how to configure WorkMail journaling to send full email journals to SES Mail Manager’s archives. We’ll define different retention periods for each archive to demonstrate how this solution can be used to meet both short and long-term retention requirements. Finally, we’ll use the AWS SES Mail Manager console to search, export, and manage the email journals and archives.

In our examples, we’ll use Amazon Route 53 to create a new domain called ‘journaling.solutions’ which we’ll configure to send all ‘@journaling.solutions’ emails to an SES Mail Manager ingress endpoint. To begin, open the AWS Console, navigate to your WorkMail Organization’s settings, and click on the Journaling tab:

Organization settings Journaling tab

Click Edit, enable journaling, and provide a journaling email address (we’re using ‘[email protected]’) to receive journaled content. Provide a report email address, such as the admin email list, to receive journaling reports:

Provide a Journaling email address

Open the AWS SES console in a new browser window, and navigate to Mail Manager’s Rule sets. Create a new rule set called ‘journaling-rule-demo’. Click Edit and create a new rule called “journal-all”, with an Archive action. Click the create an archive button and create an archive called ‘journaling-archive-demo’:

Create a new Rule Set called ‘Journaling-rule-demo’

When creating Mail Manager archives, you have options to set the retention period from 3 months to permanent storage. You can also choose to encrypt your archived messages with your own KMS key. The configuration in our example is for permanent storage and shows the optional text field for using your own KMS key:

you can encrypt the archived messages with your own KMS key

Traditional journaling calls for recording every email message to the journal, so for our ‘journal-all’ rule, we will not define filtering behaviors in the rule set. This will instruct Mail Manager to send all emails for [email protected] to the journaling-archive-demo archive. It is worth noting that Mail Manager’s rule set can be configured to filter and independently process multiple recipient addresses. Consult the documentation to learn about other ways to customize Mail Manager for your use cases.

Next, create a new traffic policy, called journaling-traffic-demo, and configure it to reject any message not explicitly sent to the journaling destination address ([email protected]):

Create a new Traffic policy, called ‘Journaling-traffic-demo’
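
The allow/reject decision this traffic policy makes can be modeled in a few lines of Python. This is only a conceptual sketch, not Mail Manager's actual evaluation engine: the `Envelope` type, the `traffic_policy_allows` function, and the placeholder address `journal@example.com` are all assumptions introduced for illustration, and the sketch assumes "explicitly sent to the journaling address" means every recipient matches it.

```python
from dataclasses import dataclass

# Hypothetical placeholder; substitute the journaling address configured in WorkMail.
JOURNAL_ADDRESS = "journal@example.com"


@dataclass(frozen=True)
class Envelope:
    """Minimal stand-in for an inbound message's envelope (illustrative only)."""
    sender: str
    recipients: tuple[str, ...]


def traffic_policy_allows(envelope: Envelope,
                          journal_address: str = JOURNAL_ADDRESS) -> bool:
    """Return True only if every recipient is the journaling address.

    Anything else (no recipients, or any other recipient) is rejected,
    mirroring a default-deny policy with a single allow condition.
    """
    target = journal_address.strip().lower()
    recipients = [r.strip().lower() for r in envelope.recipients]
    return bool(recipients) and all(r == target for r in recipients)


if __name__ == "__main__":
    journaled = Envelope("workmail@example.org", ("journal@example.com",))
    stray = Envelope("someone@example.org", ("sales@example.com",))
    print(traffic_policy_allows(journaled))  # True
    print(traffic_policy_allows(stray))      # False
```

Note that the sketch places no condition on the sender, so mail sent directly to the journaling address by anyone would also be accepted.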

Create an open ingress endpoint called ‘journaling-demo-IG’, and select the ‘journaling-traffic-demo’ traffic policy and ‘journaling-rule-demo’ rule set:

Create an Open Ingress endpoint called ‘Journaling-demo-IG’,

After you press the create ingress endpoint button, Mail Manager will create an Ingress endpoint and assign it a DNS A Record to be used in your DNS configurations to route email to Mail Manager:

Mail Manager Ingress endpoint DNS A Record to be used in your DNS configurations

From the General details page of the Ingress endpoint, copy the Ingress endpoint’s DNS A Record to your clipboard. Open a new browser window to your DNS provider’s MX configuration page (in our example below, we’re using AWS Route53). Edit the MX record for ‘journaling.solutions’ by pasting the Ingress endpoint A record. This configuration will route email sent to any address ‘@journaling.solutions’ to the Mail Manager’s Ingress endpoint for processing by the Traffic policy and Rule set:

Using AWS Route53 to edit MX record for ‘journaling.solutions’ to Ingress endpoint A record
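
If you prefer to script the MX change rather than use the console, the record can be expressed as a Route 53 change batch. The sketch below only builds the request payload; the function name `build_mx_change_batch`, the priority of 10, the TTL of 300 seconds, and the hostname `ingress-placeholder.example.com` are assumptions for illustration. Use the A record shown on your own Ingress endpoint's General details page.

```python
def build_mx_change_batch(domain: str, ingress_a_record: str,
                          priority: int = 10, ttl: int = 300) -> dict:
    """Build a Route 53 ChangeBatch that points a domain's MX record
    at a Mail Manager ingress endpoint.

    An MX record value has the form "<priority> <mail host>".
    """
    host = ingress_a_record.strip().rstrip(".")  # tolerate a trailing dot
    return {
        "Comment": "Route mail for the journaling domain to Mail Manager",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or replace it if present
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "MX",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": f"{priority} {host}"}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical hostname; replace with your ingress endpoint's A record.
    batch = build_mx_change_batch("journaling.solutions",
                                  "ingress-placeholder.example.com")
    print(batch["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
```

The resulting dictionary is shaped for the `ChangeBatch` argument of the Route 53 `change_resource_record_sets` call in boto3, alongside the `HostedZoneId` of your hosted zone.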

To test your new journaling configuration, send several emails to several email addresses in your WorkMail organization (or the alternative inbox provider you configured in the first step). WorkMail (or your alternative inbox provider) will send a full record of all emails to the journaling destination address ([email protected]).

Wait a few minutes after sending the emails above, then open the AWS Mail Manager console’s archiving controls and search for messages sent in the last 12 hours:

AWS Mail Manager console’s archiving controls

The example above shows a search for all messages received in the “last 12 hours”, with no other filters specified. The results show every message inserted into the archive in this timeframe. You’ll see one entry where the from address is different (from toby@tegwj@…). This is an example of mail that was sent directly to the journaling destination address ([email protected]). This works because our traffic policy and rule set configurations don’t include any filters.

A cost effective solution at scale

Using Mail Manager as a journaling solution gives you more direct control over your costs than typical journaling services. While most journaling services on the market today charge a fixed rate per journaled mailbox, Mail Manager pricing consists of a fixed monthly fee per ingress endpoint plus consumption-based pricing for message processing and for the amount of data archived.

For example, imagine your organization has 250 mailboxes, each handling 50 messages per day. On a monthly basis this amounts to 375,000 messages. If we assume each message is 40 kilobytes in size, your organization is generating roughly 15 gigabytes of email per month. As you can see from the table below, the total cost in month 1 is about $140, or $0.56/mailbox.

|Item |Unit Price |Volume |Subtotal/Mo |
|--- |--- |--- |--- |
|Ingress endpoint |$50/mo |1 |$50.00 |
|Core message processing |$0.15/1,000 msgs |375 (thousand msgs) |$56.25 |
|Archive insertion/indexing |$2/GB (one-time) |15 GB |$30.00 |
|Archive storage |$0.19/GB/mo |15 GB |$2.85 |
|Total | | |$139.10 |
|Monthly price per mailbox | | |$0.56 |

If the email rate in our assumptions stays constant, the Mail Manager archive will grow by 15 gigabytes each month. By month 36, the archive holds 540 gigabytes, and the monthly storage cost increases to $102.60. This results in a total monthly spend in month 36 of $238.85, or $0.96/mailbox/month.
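
To adapt this estimate to your own mailbox count and message volume, the arithmetic can be reproduced in a short script. This is a minimal sketch using the unit prices quoted in this post and its assumptions of 30 days per month and decimal gigabytes (1 GB = 1,000,000 KB); the function name and parameters are introduced here for illustration, and current prices should be checked against the AWS pricing page.

```python
from decimal import Decimal

# Unit prices as quoted in this post (USD).
INGRESS_ENDPOINT_FEE = Decimal("50")      # per ingress endpoint per month
PROCESSING_PER_1000 = Decimal("0.15")     # per 1,000 messages processed
ARCHIVE_INSERT_PER_GB = Decimal("2")      # one-time, per GB inserted/indexed
ARCHIVE_STORAGE_PER_GB = Decimal("0.19")  # per GB stored per month


def monthly_cost(month: int, mailboxes: int = 250, msgs_per_day: int = 50,
                 msg_kb: int = 40, days_per_month: int = 30):
    """Return (total, per_mailbox) cost for the given month (1-based),
    assuming a constant email rate and no archive expiry."""
    messages = mailboxes * msgs_per_day * days_per_month
    new_gb = Decimal(messages * msg_kb) / Decimal(1_000_000)  # GB added this month
    stored_gb = new_gb * month                                # GB accumulated so far
    total = (INGRESS_ENDPOINT_FEE
             + PROCESSING_PER_1000 * Decimal(messages) / Decimal(1000)
             + ARCHIVE_INSERT_PER_GB * new_gb
             + ARCHIVE_STORAGE_PER_GB * stored_gb)
    per_mailbox = (total / mailboxes).quantize(Decimal("0.01"))
    return total, per_mailbox


if __name__ == "__main__":
    for m in (1, 36):
        total, per_mailbox = monthly_cost(m)
        print(f"month {m}: total ${total:.2f}, ${per_mailbox} per mailbox")
    # month 1: total $139.10, $0.56 per mailbox
    # month 36: total $238.85, $0.96 per mailbox
```

Only the storage term grows with `month`; the endpoint fee, processing, and insertion charges stay flat as long as the email rate is constant.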

Conclusion

In this blog post, we’ve explored how Amazon WorkMail and Amazon SES Mail Manager can provide a cost-effective and accessible solution for email journaling. By leveraging the flexible journaling capabilities of WorkMail and the archiving features of SES Mail Manager, organizations can easily satisfy rigorous compliance requirements around email retention and accessibility.

The combination of WorkMail’s journaling controls and SES Mail Manager’s rule-based archiving allows you to tailor your journaling solution to your specific needs. Whether you require short-term retention for audits or long-term preservation for legal and regulatory purposes, SES Mail Manager’s flexible archiving options have you covered with predictable and transparent costs that scale with your organization’s email volume.

If you’re looking for a modern, scalable, and cost-effective solution for your email journaling needs, we encourage you to explore the capabilities of Amazon SES Mail Manager. Get started today by visiting the AWS documentation and begin streamlining your email compliance and retention processes.

About the Authors

Toby Weir-Jones

Toby is a Principal Product Manager for Amazon SES and WorkMail. He joined AWS in January 2021 and has significant experience in both business and consumer information security products and services. His focus on email solutions at SES is all about tackling a product that everyone uses and finding ways to bring innovation and improved performance to one of the most ubiquitous IT tools.

Zip

Zip is a Sr. Specialist Solutions Architect at AWS, working with Amazon Pinpoint and Simple Email Service and WorkMail. Outside of work he enjoys time with his family, cooking, mountain biking, boating, learning and beach plogging.

Andy Wong

Andy Wong is a Sr. Product Manager with the Amazon WorkMail team. He has 10 years of diverse experience in supporting enterprise customers and scaling start-up companies across different industries. Andy’s favorite activities outside of technology are soccer, tennis and free-diving.

Bruno Giorgini

Bruno Giorgini is a Senior Solutions Architect specializing in Pinpoint and SES. With over two decades of experience in the IT industry, Bruno has been dedicated to assisting customers of all sizes in achieving their objectives. When he is not crafting innovative solutions for clients, Bruno enjoys spending quality time with his wife and son, exploring the scenic hiking trails around the SF Bay Area.

Achieving Frugal Architecture using the AWS Well-Architected Framework guidance

Post Syndicated from Ashley DeLoach original https://aws.amazon.com/blogs/architecture/achieving-frugal-architecture-using-the-aws-well-architected-framework-guidance/

As part of the re:Invent 2023 keynote, Dr. Werner Vogels introduced the Frugal Architect mindset. This mindset emphasizes the importance of continuous learning, curiosity, and regular revision of architectural choices with a focus on cost and sustainability. Cost and sustainability should be treated as critical non-functional requirements, alongside factors like security, compliance, and performance. The Frugal Architect approach involves measuring and optimizing cost at every stage of the development process, which allows for innovation in parallel with promoting responsible resource usage. In the rapidly-evolving technology landscape, builders should adopt the Frugal Architect mindset to balance innovation with cost efficiency and environmental sustainability.

This blog discusses how the six pillars of the AWS Well-Architected Framework (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) align with the seven Frugal Architect laws. It demonstrates how adhering to the principles and best practices outlined in these pillars can help architects and builders effectively implement the Frugal Architect laws in their projects. The Well-Architected Framework provides a comprehensive set of guidelines that embed the concepts of frugality, efficiency, and cost effectiveness, which are the core tenets of the Frugal Architect laws. By following the Framework’s pillars, architects can build secure, reliable, efficient, and cost-optimized systems and promote sustainability.

Make Cost a Non-functional Requirement (Law 1)

Non-functional requirements are criteria that evaluate a system’s operation instead of its specific features or functionality. This includes aspects like accessibility, availability, scalability, security, portability, maintainability, and compliance. However, one crucial non-functional requirement that is often overlooked is cost. Consider cost implications early on and throughout the design, development, and operation of your systems. Organizations can strike a balance between desired features, time-to-market, and operational efficiency through early prioritization of cost considerations. The Frugal Architect argues that you should treat cost as a fundamental non-functional requirement that should be given upfront consideration when planning and initiating system development projects.

The Cost Optimization Pillar of the AWS Well-Architected Framework provides guidance on how to optimize costs when using AWS Cloud services. It emphasizes treating cost as a key requirement, not an afterthought. The main principles focus on the importance of robust financial management processes, adoption of a cloud consumption model that allows for flexible scaling and pay-per-use billing, continual measurement of outputs against costs to optimize efficiency, use of managed services to minimize operational overhead, and implementation of transparent cost attribution to tie cloud spending to revenue sources and workloads. Organizations that follow these practices can effectively manage and optimize their costs and benefit from the scalability and agility of cloud computing.

These cost optimization principles can help organizations maximize the financial benefits of using the AWS Cloud and avoid wasteful spending. Cost optimization is an ongoing process that includes rightsizing, higher output for the same cost, and use of the most cost-effective AWS services. The pillar promotes a disciplined approach to evaluate trade-offs between cost and other optimization areas like performance or reliability. Overall, you can use this pillar to make informed decisions to provision and operate AWS services cost-effectively.

Systems that Last Align Cost to Business (Law 2)

The durability and longevity of a system are closely tied to how well its costs align with the underlying business model. During the creation of a system, consider revenue sources and profit drivers. The key is to identify the primary dimension or aspect that generates revenue, and then verify that the system architecture supports and optimizes for that revenue-generating dimension. Essentially, revenue and profitability considerations should be the primary forces behind cost decisions in system design.

The AWS Well-Architected Cost Optimization Pillar provides practices and guidance for organizations to accurately monitor their AWS costs and usage. This visibility helps users understand the profitability of different business units and products, which facilitates informed decisions on resource allocation across the organization. Organizations can implement these practices to gain insights into their AWS spending patterns, which aids in development of effective cost optimization strategies. Overall, accurate expenditure analysis and attribution are crucial for organizations to optimize cloud costs, measure ROI, and make data-driven resource allocation decisions.

It’s important to accurately identify and attribute cloud costs to specific workloads. The cloud allows for transparent cost attribution, which helps organizations link costs to individual revenue streams and workload owners. This granular cost attribution data empowers workload owners to measure return on investment (ROI) for their workloads. With detailed cost information, workload owners can optimize resource utilization and reduce costs by rightsizing resources, eliminating waste, and making informed decisions. Organizations must use accurate cost attribution to understand where their cloud spending is going and verify that resources are being used efficiently across different workloads and revenue streams.
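
As a small illustration of cost attribution, the sketch below groups billing line items by a workload tag and computes a simple ROI per workload. The record layout, the `workload` tag key, the function names, and all figures are made up for this example; real billing exports have their own schema.

```python
from collections import defaultdict


def attribute_costs(line_items, tag_key="workload"):
    """Sum cost per workload tag; untagged items go to 'unattributed'."""
    totals = defaultdict(int)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "unattributed")
        totals[owner] += item["cost"]
    return dict(totals)


def roi(revenue, cost):
    """Return on investment as a ratio: (revenue - cost) / cost."""
    return (revenue - cost) / cost


if __name__ == "__main__":
    # Illustrative, made-up line items (whole dollars).
    items = [
        {"service": "compute", "cost": 100, "tags": {"workload": "checkout"}},
        {"service": "storage", "cost": 20, "tags": {"workload": "checkout"}},
        {"service": "compute", "cost": 80, "tags": {"workload": "search"}},
        {"service": "network", "cost": 30, "tags": {}},
    ]
    costs = attribute_costs(items)
    print(costs)  # {'checkout': 120, 'search': 80, 'unattributed': 30}
    print(roi(revenue=300, cost=costs["checkout"]))  # 1.5
```

The size of the `unattributed` bucket is itself a useful signal: the larger it is, the less of your spend can be linked to a revenue stream or workload owner.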

Architecting is a Series of Trade-Offs (Law 3)

Architectural decisions involve trade-offs, particularly between cost, resilience, and performance. Systems will inevitably fail, so investment in resilience is important but may impact performance. It’s important to find the right balance between technical requirements and business needs and align with risk tolerance and budget constraints. Frugality is about maximizing value, not just minimizing spend. Frugality means that you determine what you’re willing to pay for based on your priorities and make informed trade-off decisions. Ultimately, architectural choices require careful consideration of the tensions between different non-functional requirements.

The AWS Well-Architected Framework helps you make architectural trade-offs through its design principles and practices across its six pillars with your business requirements in mind. As you architect workloads, you make trade-offs between pillars based on your business context. In development environments, you might optimize to improve the sustainability impact and reduce cost at the expense of reliability. For mission-critical solutions, you might optimize reliability at the expense of increased costs and sustainability impact. In ecommerce solutions, performance can affect revenue and customer propensity to buy. Security generally is not a viable trade-off against the other pillars.

Rather than optimizing for any single pillar, the Framework guides a holistic evaluation across all pillars to determine the right architectural approach. Organizations can use AWS best practices while they find the optimal balance that aligns with their unique requirements. The key is making intentional trade-off decisions instead of following any uniform approach.

Unobserved Systems Lead to Unknown Costs (Law 4)

Without proper observation and measurement, the true operational costs of a system remain hidden, and wasteful practices can persist unnoticed. Just as exposing a utility meter prompts more mindful usage, increased visibility into costs can drive more sustainable behaviors. While implementing comprehensive monitoring requires upfront investment, the long-term benefits of conserving resources and optimizing efficiency make it a worthwhile endeavor. Ultimately, you should maintain cost awareness to foster a culture of responsible, sustainable practices.

The Operational Excellence Pillar of the AWS Well-Architected Framework emphasizes the importance of observability to gain actionable insights into workloads. This involves creation of key performance indicators (KPIs) and use of observability data telemetry to comprehensively understand workload behavior, performance, reliability, cost, and health. Organizations can implement observability best practices to make informed decisions and take prompt action when business outcomes are at risk due to issues with workload operation. Observability data provides visibility into the current state and helps identify areas for improvement. This means that organizations can be proactive in performance optimization, reliability enhancement, and cost reduction based on the actionable insights derived from observability telemetry data. Overall, observability is crucial for maintenance of operational excellence through the use of data-driven decision-making and continuous improvement of workloads.

Overall, monitoring guidance is a core component across multiple pillars of the Well-Architected Framework, as it helps organizations effectively manage and optimize their cloud workloads. For more detail on the monitoring principles of the AWS Well-Architected Framework, see Cost-Aware Architectures Implement Cost Controls (Law 5).

Cost-Aware Architectures Implement Cost Controls (Law 5)

The key aspects of frugal architecture combine granular controls with robust monitoring to identify areas for optimization. This helps you optimize costs and maintain a good user experience. With a robust monitoring system, you can take action where improvements are needed.

The AWS Well-Architected Framework aligns with the concept of frugality, which focuses on maximizing value rather than just minimizing spending. The Framework helps businesses achieve maximum value by making architectural choices that meet their specific requirements.

The Cost Optimization Pillar emphasizes the continual monitoring of usage and costs to identify opportunities for efficiency improvements and cost savings. This includes expenditure analysis, adoption of consumption-based models, and implementation of cloud financial management practices.

The Security Pillar, Reliability Pillar, and Performance Efficiency Pillar reinforce the importance of monitoring systems, workloads, and costs in real-time to maintain security, automatically recover from failures, and optimize performance relative to cost.

The Sustainability Pillar focuses on measurement of a workload’s current and forecasted environmental impact. It recommends continual evaluation of new hardware and software offerings that can reduce the environmental footprint.

Overall, monitoring guidance spans multiple Well-Architected pillars to maximize value through optimization of cost, performance, security, reliability, and sustainability.

Cost Optimization is Incremental (Law 6)

Cost efficiency is a continuous process, not a one-time goal. Regularly monitor your systems to identify inefficient patterns and areas for optimization. Revisit and refine systems periodically to find additional opportunities for improvement and further reduce costs over time.

The Cost Optimization Pillar covers principles like analysis and attribution of expenditure, measurement of overall efficiency, adoption of a consumption model, and implementation of cloud financial management practices.

Additionally, the Operational Excellence Pillar provides principles that apply not just to cost optimization but all pillars. These include observability for actionable insights, safe automation where possible, frequent small reversible changes, frequent refinement of operations procedures, anticipation of failure, and documentation and distribution of learning from operational events and metrics.

Organizations can follow these AWS Well-Architected Framework principles and their practices to continuously improve their cloud architectures and operations and optimize costs effectively.

Unchallenged Success Leads to Assumptions (Law 7)

We should continue to reevaluate past approaches, even those that were previously successful. Just because something worked before does not mean that it is still the best method. Grace Hopper, a computer scientist, mathematician, and United States Navy rear admiral, cautioned against blind adherence to tradition, saying that “we’ve always done it this way” is a dangerous mindset. We must be willing to question the old ways and explore new and potentially better methods.

The AWS Well-Architected Framework advocates for an evolutionary architecture approach to system design. Traditional architectures are often designed as static, with only a few major version updates during the system’s lifetime. However, as businesses and requirements change over time, initial architectural decisions can limit the ability to adapt and evolve the system. Cloud computing enables capabilities like automated testing and lower-risk design changes, which allows systems to evolve continually rather than being constrained by the original design. An evolutionary architecture positions businesses to take advantage of new innovations and changes as part of standard practice. Rather than being locked into original architectural choices, an evolutionary approach fosters ongoing adaptation and modernization as requirements shift. This contrasts with traditional fixed architectures, which are difficult to change once in place.

The Operational Excellence Pillar includes implementation of observability to understand system behavior, safe automation of processes, frequent but reversible changes, regular refinement of operations procedures, proactive anticipation of potential failures, and distribution of learnings from operational events and metrics to drive continuous improvement.

Overall, the Well-Architected Framework provides guidance on evolutionary architecture and operations processes to effectively manage increasing software complexity over time.

Conclusion

Frugality is about maximizing value, rather than just minimizing costs. Following AWS Well-Architected Framework best practices regarding security, reliability, and operational excellence can help realize frugal yet robust architectures. True frugality involves optimizing costs by aligning spending with areas that deliver the highest business value and impact. The Well-Architected Framework provides guidance for making architectural decisions that increase efficiency, lower risks, and maximize return on cloud investments. This involves determining priorities, understanding sources of value, and making informed trade-off decisions based on those priorities. It’s important to avoid indiscriminate cost-cutting and instead focus resources on what matters most to drive value for the organization. By following Well-Architected best practices, companies can practice frugality in a strategic way that balances optimization with business goals.

Start your Frugal Architecture journey with AWS Well-Architected today by reading the documentation or visiting the AWS Well-Architected Tool in the console.

How to centrally manage secrets with AWS Secrets Manager

Post Syndicated from Shagun Beniwal original https://aws.amazon.com/blogs/security/how-to-centrally-manage-secrets-with-aws-secrets-manager/

In today’s digital landscape, managing secrets, such as passwords, API keys, tokens, and other credentials, has become a critical task for organizations. For some Amazon Web Services (AWS) customers, centralized management of secrets can be a robust and efficient solution to address this challenge. In this post, we delve into using AWS data protection services such as AWS Secrets Manager and AWS Key Management Service (AWS KMS) to help make secrets management easier in your environment by centrally managing them from a designated AWS account.

Centralized secrets management involves the consolidation of sensitive information into a single, secure repository. This repository acts as a centralized vault where secrets are stored, accessed, and managed with strict security controls. Centralizing secrets can help organizations enforce uniform security policies, streamline access control, and mitigate the risk of unauthorized access or leakage.

This approach offers several key benefits. First, it can enhance security by reducing the threat surface and providing a single point of control for managing access to sensitive information. Additionally, centralized secrets management can facilitate compliance with regulatory requirements by enforcing strict access controls and audit trails.

Furthermore, centralization promotes efficiency and scalability by enabling automated workflows for secret rotation, provisioning, and revocation. This automation reduces administrative tasks and minimizes the risk of human error, enhancing overall operational excellence.

Overview

In this post, we’ll walk you through how to set up a centralized account for managing your secrets and their lifecycle by using AWS Lambda rotation functions. Furthermore, to facilitate efficient access and management across multiple member accounts, we’ll discuss how to establish tunnelling through VPC peering to enable seamless communication between the Centralized Security Account in this architecture and the associated member accounts.

Notably, applications within the member accounts will directly access the secrets stored in the Centralized Security Account through the use of resource policies, streamlining the retrieval process. Additionally, using AWS provided DNS within the Centralized Security Account’s virtual private cloud (VPC) will automate the resolution of database host addresses to their respective control plane IP addresses. This functionality allows AWS Lambda function traffic to efficiently traverse the peering connection, enhancing overall system performance and reliability.

Figure 1 shows the solution architecture. The architecture has four accounts that are managed through AWS Organizations. Out of these four accounts, there are three workload accounts designated as Account A, Account B, and Account C that host the application and database for serving user requests, and a Centralized Security Account from which the secrets will be maintained and managed. VPC 1 from every workload account (Account A, Account B, and Account C) is peered with VPC 1 (part of the Centralized Security Account) to allow communication between workload accounts and the secrets management account. For high availability, secrets are also replicated to a different AWS Region.

Figure 1: Sample solution architecture for centrally managing secrets

Deploy the solution

Follow the steps in this section to deploy the solution.

Step 1: Create secrets, including database secrets, in your Centralized Security Account

First, create the secrets you want to use for this walkthrough. For example, the database secrets will have the following parameters:

{
    "engine": "sql",
    "username": "admin",
    "password": "EXAMPLE-PASSWORD",
    "host": "<cross account DB host URL>",
    "dbInstanceIdentifier": "<cross account DB instance identifier>",
    "port": "3306"
}

To create a database secret (console)

  1. Open the AWS Secrets Manager console in the Centralized Security Account.
  2. Choose Store a new secret.
  3. Choose Credentials for other database and provide the user name and password.

    Figure 2: Create and store a new secret using Secrets Manager

  4. For Encryption key, use the instructions in the AWS KMS documentation to create and choose the AWS KMS key that you want Secrets Manager to use to encrypt the secret value. Because you need to access the secret from another AWS account, make sure you are using an AWS KMS customer managed key (CMK).

    Important: Make sure that you do NOT use aws/secretsmanager, because it is an AWS managed key for Secrets Manager and you cannot modify the key policy.

    Figure 3: Select the encryption key to encrypt the secret created

    AWS Secrets Manager makes it possible for you to replicate secrets across multiple AWS Regions to support regional access and low-latency requirements. If you turn on rotation for your primary secret, Secrets Manager rotates the secret in the primary Region, and the new secret value propagates to the associated Regions. You do not have to manage rotation of replicated secrets individually.

    Note: When replicating a secret in Secrets Manager, you have the option to choose between using a multi-Region key (MRK) or an independent KMS key in the Region where the secrets are replicated. Your choice depends on your specific requirements such as operational preferences, regulatory compliance, and ease of management.

  5. For Database, select the database from the list of supported database types displayed and provide the host URL in the server address field, the database name, and the port number. Choose Next.

    Figure 4: Selecting the database and providing the database details

  6. For Configure secret, provide a secret name (for example, PostgresAppUser) and optionally add a description and tags. The resource permissions required to access the secret from across accounts will be explained later in this post.

    (Optional) Under Replicate secret, select other Regions and customer managed KMS keys from respective Regions to replicate this secret for high availability purposes, and then choose Next.

  7. The next screen will ask you to configure automatic rotation, but you can skip this step for now because you will create the rotation Lambda function in Step 2. Choose Next and then Store to finish saving the secret.

    Note: Secrets Manager rotation uses a Lambda function to update the secret and the database or service. After the secret is created, you must create a rotation Lambda function separately and attach it to the secret for rotating it. This detailed process is covered in the following steps.
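
The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials for the Centralized Security Account are configured. The helper names (build_db_secret, build_create_secret_request, create_secret) are hypothetical and not part of the original walkthrough. It builds the secret JSON structure shown earlier and a CreateSecret request that uses a customer managed key in the primary Region and in each replica Region.

```python
import json


def build_db_secret(engine, username, password, host, db_instance_identifier, port):
    """Return the secret value in the JSON structure used for database secrets."""
    return {
        "engine": engine,
        "username": username,
        "password": password,
        "host": host,
        "dbInstanceIdentifier": db_instance_identifier,
        "port": port,
    }


def build_create_secret_request(name, secret, kms_key_arn, replica_keys=None):
    """Build keyword arguments for the Secrets Manager CreateSecret API.

    replica_keys maps a replica Region to the KMS key ARN to use in that Region.
    """
    request = {
        "Name": name,
        "SecretString": json.dumps(secret),
        # A customer managed key, so that its key policy can grant cross-account access.
        "KmsKeyId": kms_key_arn,
    }
    if replica_keys:
        request["AddReplicaRegions"] = [
            {"Region": region, "KmsKeyId": key_arn}
            for region, key_arn in replica_keys.items()
        ]
    return request


def create_secret(request, region):
    """Send the request. boto3 is imported lazily so the builders work without it."""
    import boto3

    client = boto3.client("secretsmanager", region_name=region)
    return client.create_secret(**request)
```

Keeping the request builders separate from the API call makes it possible to review or unit test the request before anything is created in the account.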

Step 2: Deploy the rotation Lambda function where needed

For secrets that require automatic rotation to be turned on, deploy the rotation Lambda function from the serverless application list.

To deploy the rotation Lambda function

  1. In the Centralized Security Account, open the AWS Lambda console.
  2. In the left navigation menu, choose Applications, and then choose Create application.
  3. Choose Serverless Application and then choose the Public Applications tab.
  4. Make sure you have selected the checkbox for Show apps that create custom IAM roles or resource policies.

    Figure 5: Create a rotation Lambda function in the centralized security account for secret rotation

  5. In the search field under Serverless application, search for SecretsManager, and the available functions for rotation will be displayed. Choose the Lambda function based on your DB engine type. For example, if the DB engine type is PostgreSQL, select SecretsManagerRDSPostgreSQLRotationSingleUser from the list by choosing the application name.

    Figure 6: Choosing the AWS provided PostgreSQL rotation function (optionally you may choose a different rotation Lambda function)

  6. On the next page, under Application settings, provide the requested details for the following settings:
    1. functionName (for example, PostgresDBUserRotationLambda)
    2. endpoint – For the SecretsManagerRDSPostgreSQLRotationSingleUser option, in the endpoint field, add https://secretsmanager.us-east-1.amazonaws.com. (Choose the Secrets Manager service endpoint based on the Region where the rotation Lambda is created.)
    3. kmsKeyArn – Used by the secret for encryption.
    4. vpcSecurityGroupIds – Provide the security group ID for the rotation Lambda function. Under the outbound rules tab of the security group attached to the rotation Lambda, add the required rules for the Lambda function to communicate with the Secrets Manager service endpoint and database. Also, make sure that the security groups attached to your database or service allow inbound connections from the Lambda rotation function.
    5. vpcSubnetIds – When you provide vpcSubnetIds, provide subnets of a VPC from the Centralized Security Account where you are planning to deploy your rotation Lambda functions.

    Figure 7: Set up rotation Lambda configuration

  7. Select the checkbox next to I acknowledge that this app creates custom IAM roles and resource policies, and then choose Deploy. This will create the required Lambda function to rotate your secret.
  8. Navigate to the Secrets Manager console and edit the secret to turn on automatic rotation (for instructions, see the Secrets Manager documentation).

    Figure 8: Editing the rotation in the Secrets Manager console

    Set a rotation schedule according to your organization’s data security strategy.

  9. For Lambda rotation function, select the new Lambda function PostgresDBUserRotationLambda that you deployed earlier in this procedure to associate it with the secret.

    Figure 9: The rotation configuration settings in the Secrets Manager console
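
Turning on rotation can also be done through the API. The following is a minimal sketch, assuming boto3 and credentials for the Centralized Security Account. The function names and the idea of passing the schedule as a number of days are illustrative assumptions, not the post's own tooling. It uses the Secrets Manager RotateSecret API to associate the rotation Lambda function with the secret.

```python
def build_rotation_request(secret_id, rotation_lambda_arn, rotate_every_days):
    """Build keyword arguments for the Secrets Manager RotateSecret API."""
    if rotate_every_days < 1:
        raise ValueError("rotate_every_days must be at least 1")
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": rotation_lambda_arn,
        # Pick the interval according to your organization's data security strategy.
        "RotationRules": {"AutomaticallyAfterDays": rotate_every_days},
    }


def turn_on_rotation(request, region):
    """Send the request. By default, RotateSecret also starts a rotation immediately."""
    import boto3  # imported lazily so the builder can be used without boto3

    client = boto3.client("secretsmanager", region_name=region)
    return client.rotate_secret(**request)
```

Because the first rotation starts right away by default, it is worth confirming network connectivity to the database before calling turn_on_rotation.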

Step 3: Set up networking for Lambda to reach the Secrets Manager service endpoint

To provide connectivity to the Lambda function, you can either deploy a VPC endpoint with Private DNS enabled or a NAT gateway.

Deploy a VPC endpoint with Private DNS enabled

To create an Amazon VPC endpoint for AWS Secrets Manager (recommended)

  1. Open the Amazon VPC console, choose Endpoints, and then choose Create endpoint.
  2. For Service category, select AWS services. In the Service Name list, select the Secrets Manager endpoint service named com.amazonaws.<Region>.secretsmanager.

    Figure 10: Create a VPC endpoint for Secrets Manager

  3. For VPC, specify the VPC you want to create the endpoint in. This should be the VPC that you selected for hosting centralized secret rotation using the AWS Lambda function.
  4. To create a VPC endpoint, you need to specify the private IP address range in which the endpoint will be accessible. To do this, select the subnet for each Availability Zone (AZ). This restricts the VPC endpoint to the private IP address range specific to each AZ and also creates an AZ-specific VPC endpoint. Specifying more than one subnet-AZ combination helps improve fault tolerance and make the endpoint accessible from a different AZ in case of an AZ failure.
  5. Select the Enable DNS name checkbox for the VPC endpoint. Private DNS resolves the standard Secrets Manager DNS hostname https://secretsmanager.<Region>.amazonaws.com to the private IP addresses associated with the VPC endpoint specific DNS hostname.

    Figure 11: Set up VPC endpoint configurations

  6. Associate a security group with this endpoint (for instructions, see the AWS PrivateLink documentation). The security group enables you to control the traffic to the endpoint from resources in your VPC. The attached security group should accept inbound connections from the Lambda function for rotation on port 443.

    Figure 12: Attaching the security group to the VPC endpoint
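
The endpoint settings described in this procedure map onto the EC2 CreateVpcEndpoint API. The following is a minimal sketch, assuming boto3 and placeholder resource IDs. The function names are hypothetical. It creates an interface endpoint for Secrets Manager with Private DNS enabled across the subnets you pass in.

```python
def build_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for the EC2 CreateVpcEndpoint API."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.secretsmanager",
        # One subnet per Availability Zone improves fault tolerance.
        "SubnetIds": list(subnet_ids),
        # The security group must accept inbound port 443 from the rotation Lambda function.
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }


def create_endpoint(request, region):
    """Send the request. boto3 is imported lazily so the builder works without it."""
    import boto3

    return boto3.client("ec2", region_name=region).create_vpc_endpoint(**request)
```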

Create a NAT gateway

Alternatively, you can give your function internet access. Place the function in private subnets and route the outbound traffic to a NAT gateway in a public subnet. The NAT gateway has a public IP address and connects to the internet through the VPC’s internet gateway. To create a NAT gateway, follow the steps described in this AWS re:Post article.

Step 4: Deploy VPC peering

Next, deploy VPC peering between the Centralized Security Account and the member accounts that hold the database.

To deploy VPC peering

  1. Open the Amazon VPC console in the Centralized Security Account.
  2. In the left navigation pane, choose Peering connections, and then choose Create peering connection.
  3. Configure the following information, and choose Create peering connection when you are done:
    1. Name – You can optionally name your VPC peering connection, for example central_secret_management_vpc_peer.
    2. VPC ID (Requester) – Select the centralized secret management AWS Lambda VPC in your account with which you want to create the VPC peering connection.
    3. Account – Choose Another account.
    4. Account ID – Enter the ID of the AWS account that owns the database.

      Figure 13: Create VPC peering connection

    5. VPC ID (Accepter) – Enter the ID of the database VPC with which to create the VPC peering connection.

      Figure 14: Create VPC peering connection – Entering the VPC ID

  4. From the database account, navigate to the Amazon VPC console. Choose Peering connections and then choose Accept request.

    Figure 15: Accepting the VPC peering request from the database account (Accounts A, B, and C)

  5. Add a route to the route tables in both VPCs so that you can send and receive traffic across the peering connection. Each table has a local route and a route that sends traffic for the peer VPC to the VPC peering connection.

    Figure 16: Sample table to show VPC peering connections between the Centralized Security Account and application/database accounts

  6. Perform the following steps in the Centralized Security Account:
    1. Open the Amazon VPC console in the Centralized Security Account.
    2. Select the Centralized Security Account Lambda VPC. Under Details, choose Main route table.
    3. Choose Edit routes, and then choose Add routes. Under Destination, add the database VPC CIDR (172.31.0.0/16) in an empty field. Under Target, select the peering connection you created in step 3 of this procedure.
  7. Perform the following steps in Account 2, where the application/database is hosted:
    1. Open the VPC console in the database account.
    2. Select the database VPC and then, under Details, choose Main route table.
    3. Choose Edit routes, and then choose Add routes. Under Destination, add the rotation Lambda VPC CIDR (10.0.0.0/16) in an empty field. Under Target, select the peering connection you created in step 3 of this procedure.
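
The two route entries above mirror each other: each VPC routes the other VPC's CIDR to the peering connection. The following is a minimal sketch that builds both CreateRoute requests and first checks that the two CIDR blocks do not overlap, because VPC peering requires non-overlapping CIDRs. The function name and the resource IDs used in the usage note are hypothetical placeholders.

```python
import ipaddress


def build_peering_routes(central_cidr, database_cidr, peering_connection_id,
                         central_route_table_id, database_route_table_id):
    """Return CreateRoute keyword arguments for both sides of a VPC peering connection."""
    central = ipaddress.ip_network(central_cidr)
    database = ipaddress.ip_network(database_cidr)
    if central.overlaps(database):
        raise ValueError(f"{central} and {database} overlap, so the VPCs cannot be peered")
    return [
        {   # Centralized Security Account: send database traffic over the peering connection
            "RouteTableId": central_route_table_id,
            "DestinationCidrBlock": str(database),
            "VpcPeeringConnectionId": peering_connection_id,
        },
        {   # Database account: send return traffic to the rotation Lambda VPC
            "RouteTableId": database_route_table_id,
            "DestinationCidrBlock": str(central),
            "VpcPeeringConnectionId": peering_connection_id,
        },
    ]
```

Each request would be passed to the EC2 create_route call using credentials for the account that owns the route table, for example build_peering_routes("10.0.0.0/16", "172.31.0.0/16", "pcx-EXAMPLE", "rtb-central", "rtb-database").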

Step 5: Set up resource-based policies on each secret

After the secrets are deployed into the Centralized Security Account, to allow application roles or users in other accounts to access the secrets (known as cross-account access), you must allow access in both a resource policy and an identity policy. This is different from granting access to identities in the same account as the secret.

To set up resource-based policies on each secret

  1. Attach a resource policy to the secret in the Centralized Security Account by using the following steps:
    1. Open the Secrets Manager console. Remember to choose the Region that is appropriate for your setup.
    2. From the list of secrets, choose your secret.
    3. On the Secret details page, choose the Overview tab.
    4. Under Resource permissions, choose Edit permissions.
    5. In the Code field, attach or append the following resource policy statement, and then choose Save:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<account2-id>:role/ApplicationRole"
          },
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "<ARN of secret to which this policy is attached>"
        }
      ]
    }

  2. Add the following resource policy statement to the key policy for the KMS key in the Centralized Security Account.
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account2-id>:role/ApplicationRole"
      },
      "Action": [
        "kms:Decrypt",
        "kms:DescribeKey"
      ],
      "Resource": "<kms-key-resource-arn>"
    }

    If the key does not have a policy yet, add the following policy to the key.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::<account2-id>:role/ApplicationRole"
          },
          "Action": [
            "kms:Decrypt",
            "kms:DescribeKey"
          ],
          "Resource": "<kms-key-resource-arn>"
        }
      ]
    }

  3. Attach an identity policy to the identity in the accounts where you hosted your applications to provide access to the secret and the KMS key used to encrypt the secret.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "secretsmanager:GetSecretValue",
          "Resource": "arn:aws:secretsmanager:<your-region>:<centralized-security-account-id>:secret:<secret-id>"
        },
        {
          "Effect": "Allow",
          "Action": "kms:Decrypt",
          "Resource": "arn:aws:kms:<your-region>:<centralized-security-account-id>:key/<key-id>"
        }
      ]
    }

The access policies mentioned here are just for the example in this post. In a production environment, only provide the needed granular permissions by exercising least privilege principles.
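
To show how the policies fit together from the application side, the following is a minimal sketch, assuming boto3 and an application role that carries the identity policy above. The function names are hypothetical. The first helper generates the secret's resource policy for one or more application roles, and the second retrieves the secret from a member account. Note that cross-account retrieval needs the full secret ARN as the SecretId, not just the secret name.

```python
import json


def build_secret_resource_policy(secret_arn, principal_arns):
    """Return a resource policy that lets the given roles read one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": sorted(principal_arns)},
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }


def get_cross_account_secret(secret_arn, region):
    """Fetch and decode a secret that lives in the Centralized Security Account."""
    import boto3  # imported lazily so the policy builder works without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_arn)
    return json.loads(response["SecretString"])
```

The policy returned by build_secret_resource_policy can be serialized with json.dumps and attached with the Secrets Manager put_resource_policy call in the Centralized Security Account.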

What challenges does this solution present, and how can you overcome them?

Along with the advantages discussed in this post, there are a few challenges you should anticipate while deploying this solution:

  1. Currently there is a maximum of 20,480 characters allowed in a resource-based permissions policy attached to a secret. For organizations where a large number of external accounts need to be given access to a secret, you will need to keep this quota in mind.
  2. There is also a limit on the total number of active VPC peering connections per VPC. By default, the limit is 50 connections, but this is adjustable up to 125. If you require more connections across VPCs, you can use other solutions, like a transit gateway, as an alternative.
  3. As the number of applications that require access to secrets from the Centralized Security Account increases, the number of external accesses will also increase, and access control might become difficult over time. To identify external access to the Centralized Security Account so that you can review and reduce it, you may choose to use AWS IAM Access Analyzer.
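
The resource policy quota in the first item can be checked before a policy is attached. The following is a rough sketch with hypothetical helper names. It measures the policy as compact JSON; how Secrets Manager counts whitespace is not stated in this post, so treat the result as an approximation rather than an exact quota check.

```python
import json

MAX_RESOURCE_POLICY_CHARS = 20480  # quota quoted in this post


def policy_length(policy):
    """Length of the policy document serialized as compact JSON."""
    return len(json.dumps(policy, separators=(",", ":")))


def fits_quota(policy, limit=MAX_RESOURCE_POLICY_CHARS):
    """Return True if the serialized policy is within the character limit."""
    return policy_length(policy) <= limit
```

If a policy no longer fits, one option is to grant access to fewer, broader principals, while keeping least privilege in mind.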

Conclusion

In this post, we provided you with a step-by-step solution to establish a Centralized Security Account that uses the AWS Secrets Manager service for securely storing your secrets in a central place. The post outlined the process of deploying AWS Lambda functions to facilitate automatic rotation of necessary secrets. Furthermore, we delved into the implementation of VPC peering to provide uninterrupted connectivity between the rotation function and your databases or applications housed in different AWS accounts, helping to ensure smooth rotation.

Finally, we discussed the essential policies that are needed to enable applications to use these secrets through resource-based policies. This implementation provides a way for you to conveniently monitor and audit your secrets.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Shagun Beniwal
Shagun is a Technical Account Manager at AWS. He manages Global System Integrators (GSIs) and Partners operating on AWS Enterprise Support. He is a member of the internal security community with focus areas in threat detection & incident response, infrastructure security, and IAM. Shagun helps customers achieve strategic business outcomes in security, resilience, cost optimization, and operations. You can follow Shagun on LinkedIn.

Navaneeth Krishnan Venugopal
Navaneeth is a Cloud Support – Security Engineer II at AWS and an AWS Secrets Manager subject matter expert (SME). He is passionate about cybersecurity and helps provide tailored, secure solutions for a broad spectrum of technical issues faced by customers. Navaneeth has a focus on security and compliance and enjoys helping customers architect secure solutions on AWS.

Texas Sues GM for Collecting Driving Data without Consent

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/08/texas-sues-gm-for-collecting-driving-data-without-consent.html

Texas is suing General Motors for collecting driver data without consent and then selling it to insurance companies:

From CNN:

In car models from 2015 and later, the Detroit-based car manufacturer allegedly used technology to “collect, record, analyze, and transmit highly detailed driving data about each time a driver used their vehicle,” according to the AG’s statement.

General Motors sold this information to several other companies, including to at least two companies for the purpose of generating “Driving Scores” about GM’s customers, the AG alleged. The suit said those two companies then sold these scores to insurance companies.

Insurance companies can use data to see how many times people exceeded a speed limit or obeyed other traffic laws. Some insurance firms ask customers if they want to voluntarily opt-in to such programs, promising lower rates for safer drivers.

But the attorney general’s office claimed GM “deceived” its Texan customers by encouraging them to enroll in programs such as OnStar Smart Driver. But by agreeing to join these programs, customers also unknowingly agreed to the collection and sale of their data, the attorney general’s office said.

Press release. Court filing. Slashdot thread.

Build a serverless data quality pipeline using Deequ on AWS Lambda

Post Syndicated from Vivek Mittal original https://aws.amazon.com/blogs/big-data/build-a-serverless-data-quality-pipeline-using-deequ-on-aws-lambda/

Poor data quality can lead to a variety of problems, including pipeline failures, incorrect reporting, and poor business decisions. For example, if data ingested from one of the systems contains a high number of duplicates, it can result in skewed data in the reporting system. To prevent such issues, data quality checks are integrated into data pipelines, which assess the accuracy and reliability of the data. These checks in the data pipelines send alerts if the data quality standards are not met, enabling data engineers and data stewards to take appropriate actions. Examples of these checks include counting records, detecting duplicate data, and checking for null values.

To address these issues, Amazon built an open source framework called Deequ, which performs data quality checks at scale. In 2023, AWS launched AWS Glue Data Quality, which offers a complete solution to measure and monitor data quality. AWS Glue uses the power of Deequ to run data quality checks, identify records that are bad, provide a data quality score, and detect anomalies using machine learning (ML). However, you may have very small datasets and require faster startup times. In such instances, an effective solution is running Deequ on AWS Lambda.

In this post, we show how to run Deequ on Lambda. Using a sample application as reference, we demonstrate how to build a data pipeline to check and improve the quality of data using AWS Step Functions. The pipeline uses PyDeequ, a Python API for Deequ and a library built on top of Apache Spark to perform data quality checks. We show how to implement data quality checks using the PyDeequ library, deploy an example that showcases how to run PyDeequ in Lambda, and discuss the considerations for using Lambda to run PyDeequ.

To help you get started, we’ve set up a GitHub repository with a sample application that you can use to practice running and deploying the application.

Solution overview

In this use case, the data pipeline checks the quality of Airbnb accommodation data, which includes ratings, reviews, and prices, by neighborhood. Your objective is to perform the data quality check of the input file. If the data quality check passes, then you aggregate the price and reviews by neighborhood. If the data quality check fails, then you fail the pipeline and send a notification to the user. The pipeline is built using Step Functions and comprises three primary steps:

  • Data quality check – This step uses a Lambda function to verify the accuracy and reliability of the data. The Lambda function uses PyDeequ, a library for data quality checks. As PyDeequ runs on Spark, the example employs the Spark Runtime for AWS Lambda (SoAL) framework, which makes it straightforward to run a standalone installation of Spark in Lambda. The Lambda function performs data quality checks and stores the results in an Amazon Simple Storage Service (Amazon S3) bucket.
  • Data aggregation – If the data quality check passes, the pipeline moves to the data aggregation step. This step performs some calculations on the data using a Lambda function that uses Polars, a DataFrames library. The aggregated results are stored in Amazon S3 for further processing.
  • Notification – After the data quality check or data aggregation, the pipeline sends a notification to the user using Amazon Simple Notification Service (Amazon SNS). The notification includes a link to the data quality validation results or the aggregated data.
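
The branching between these three steps can be sketched in plain Python. This is only an illustration of the control flow, not the Step Functions definition from the sample application. The function names are hypothetical, the aggregation is a plain-Python stand-in for the Polars step, and the choice of mean price and total reviews as the aggregates is an assumption.

```python
from collections import defaultdict


def aggregate_by_neighbourhood(rows):
    """Stand-in for the aggregation step: mean price and total reviews per neighbourhood."""
    prices = defaultdict(list)
    reviews = defaultdict(int)
    for row in rows:
        key = row["neighbourhood"]
        prices[key].append(float(row["price"]))
        reviews[key] += int(row["number_of_reviews"])
    return {
        key: {"mean_price": sum(values) / len(values), "total_reviews": reviews[key]}
        for key, values in prices.items()
    }


def run_pipeline(rows, quality_check, aggregate, notify):
    """Run the quality check, then either aggregate or fail, notifying in both cases."""
    passed, report = quality_check(rows)
    if not passed:
        notify(f"Data quality check failed: {report}")
        raise RuntimeError("Data quality check failed")
    result = aggregate(rows)
    notify("Data quality check passed and aggregation completed")
    return result
```

Here quality_check is any callable that returns a (passed, report) pair, and notify is any callable that accepts a message, which keeps the sketch independent of Lambda and Amazon SNS.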

The following diagram illustrates the solution architecture.

Implement quality checks

The following is an example of data from the sample accommodations CSV file.

id | name | host_name | neighbourhood_group | neighbourhood | room_type | price | minimum_nights | number_of_reviews
7071 | BrightRoom with sunny greenview! | Bright | Pankow | Helmholtzplatz | Private room | 42 | 2 | 197
28268 | Cozy Berlin Friedrichshain for1/6 p | Elena | Friedrichshain-Kreuzberg | Frankfurter Allee Sued FK | Entire home/apt | 90 | 5 | 30
42742 | Spacious 35m2 in Central Apartment | Desiree | Friedrichshain-Kreuzberg | suedliche Luisenstadt | Private room | 36 | 1 | 25
57792 | Bungalow mit Garten in Berlin Zehlendorf | Jo | Steglitz – Zehlendorf | Ostpreußendamm | Entire home/apt | 49 | 2 | 3
81081 | Beautiful Prenzlauer Berg Apt | Bernd+Katja 🙂 | Pankow | Prenzlauer Berg Nord | Entire home/apt | 66 | 3 | 238
114763 | In the heart of Berlin! | Julia | Tempelhof – Schoeneberg | Schoeneberg-Sued | Entire home/apt | 130 | 3 | 53
153015 | Central Artist Appartement Prenzlauer Berg | Marc | Pankow | Helmholtzplatz | Private room | 52 | 3 | 127

In a semi-structured data format such as CSV, there are no inherent data validation or integrity checks. You need to verify the data against accuracy, completeness, consistency, uniqueness, timeliness, and validity, which are commonly referred to as the six data quality dimensions. For instance, if you want to display the name of the host for a particular property on a dashboard, but the host’s name is missing in the CSV file, this would be an issue of incomplete data. Completeness checks can include looking for missing records, missing attributes, or truncated data, among other things.

As part of the GitHub repository sample application, we provide a PyDeequ script that will perform the quality validation checks on the input file.

The following code is an example of performing the completeness check from the validation script:

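from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite
# Constraints such as isComplete are defined on a Check, which is added to the suite
check = Check(spark, CheckLevel.Error, "Accomodations")
checkCompleteness = VerificationSuite(spark).onData(dataset) \
    .addCheck(check.isComplete("host_name")).run()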

The following is an example of checking for uniqueness of data:

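# Check and CheckLevel come from pydeequ.checks
check = Check(spark, CheckLevel.Error, "Accomodations")
checkUniqueness = VerificationSuite(spark) \
    .onData(dataset) \
    .addCheck(check.isUnique("id")) \
    .run()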

You can also chain multiple validation checks as follows:

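# Check and CheckLevel come from pydeequ.checks
check = Check(spark, CheckLevel.Error, "Accomodations")
checkResult = VerificationSuite(spark) \
    .onData(dataset) \
    .addCheck(
        check.isComplete("name") \
        .isUnique("id") \
        .isComplete("host_name") \
        .isComplete("neighbourhood") \
        .isComplete("price") \
        .isNonNegative("price")) \
    .run()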

The following is an example of making sure 99% or more of the records in the file include host_name:

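# The assertion lambda receives the completeness ratio (0.0 to 1.0) for the column
check = Check(spark, CheckLevel.Error, "Accomodations")
checkCompleteness = VerificationSuite(spark) \
    .onData(dataset) \
    .addCheck(check.hasCompleteness("host_name", lambda x: x >= 0.99)) \
    .run()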
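
To make the meaning of these checks concrete without a Spark session, the following plain-Python sketch computes the same two ideas on rows read from a CSV file: the fraction of rows with a value in a column, and whether a column has duplicates. The function names are hypothetical, and treating empty strings as missing values is an assumption made for this illustration. It is not how Deequ is implemented.

```python
def completeness(rows, column):
    """Fraction of rows whose value in `column` is present (not None or empty)."""
    if not rows:
        raise ValueError("completeness is undefined for an empty dataset")
    present = sum(1 for row in rows if row.get(column) not in (None, ""))
    return present / len(rows)


def is_unique(rows, column):
    """True if no value in `column` appears more than once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def has_completeness(rows, column, assertion):
    """Apply a threshold assertion, such as lambda x: x >= 0.99, to the ratio."""
    return assertion(completeness(rows, column))
```

For example, with rows loaded through csv.DictReader, has_completeness(rows, "host_name", lambda x: x >= 0.99) passes only if at least 99% of the rows include a host name.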

Prerequisites

Before you get started, make sure you complete the following prerequisites:

  1. You should have an AWS account.
  2. Install and configure the AWS Command Line Interface (AWS CLI).
  3. Install the AWS SAM CLI.
  4. Install Docker community edition.
  5. You should have Python 3 installed.

Run Deequ on Lambda

To deploy the sample application, complete the following steps:

  1. Clone the GitHub repository.
  2. Use the provided AWS CloudFormation template to create the Amazon Elastic Container Registry (Amazon ECR) image that will be used to run Deequ on Lambda.
  3. Use the AWS SAM CLI to build and deploy the rest of the data pipeline to your AWS account.

For detailed deployment steps, refer to the GitHub repository Readme.md.

When you deploy the sample application, you’ll find that the DataQuality function is in a container packaging format. This is because the SoAL library required for this function is larger than the 250 MB limit for zip archive packaging. During the AWS Serverless Application Model (AWS SAM) deployment process, a Step Functions workflow is also created, along with the necessary data required to run the pipeline.

Run the workflow

After the application has been successfully deployed to your AWS account, complete the following steps to run the workflow:

  1. Go to the S3 bucket that was created earlier.

You will notice a new bucket with your stack name as the prefix.

  2. Follow the instructions in the GitHub repository to upload the Spark script to this S3 bucket. This script is used to perform data quality checks.
  3. Subscribe to the SNS topic created to receive success or failure email notifications, as explained in the GitHub repository.
  4. Open the Step Functions console and run the workflow whose name is prefixed with DataQualityUsingLambdaStateMachine, using the default inputs.
  5. You can test both success and failure scenarios as explained in the instructions in the GitHub repository.
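
As an alternative to the console, the workflow could also be started from a script. The following is a minimal sketch using boto3, not part of the sample application: the function names are hypothetical, the ARNs in the offline example are placeholders, and passing an empty JSON object as input is an assumption about what the default inputs are.

```python
def find_state_machine_arn(state_machines, prefix):
    """Return the ARN of the first state machine whose name starts with `prefix`, else None."""
    for machine in state_machines:
        if machine["name"].startswith(prefix):
            return machine["stateMachineArn"]
    return None


def start_data_quality_workflow(prefix="DataQualityUsingLambdaStateMachine"):
    """Look up the deployed state machine by name prefix and start one execution."""
    import boto3  # imported here so the lookup helper works without boto3 installed

    client = boto3.client("stepfunctions")
    machines = []
    for page in client.get_paginator("list_state_machines").paginate():
        machines.extend(page["stateMachines"])

    arn = find_state_machine_arn(machines, prefix)
    if arn is None:
        raise RuntimeError(f"No state machine found with prefix {prefix!r}")
    # An empty JSON object is passed as input; whether this matches the
    # sample's default inputs is an assumption.
    return client.start_execution(stateMachineArn=arn, input="{}")["executionArn"]


# Offline check of the lookup logic with a made-up listing (placeholder ARNs).
example_listing = [
    {"name": "OtherWorkflow",
     "stateMachineArn": "arn:aws:states:us-east-1:111122223333:stateMachine:OtherWorkflow"},
    {"name": "DataQualityUsingLambdaStateMachine-abc123",
     "stateMachineArn": "arn:aws:states:us-east-1:111122223333:stateMachine:DataQualityUsingLambdaStateMachine-abc123"},
]
found = find_state_machine_arn(example_listing, "DataQualityUsingLambdaStateMachine")
```

Calling start_data_quality_workflow() requires AWS credentials with permission to list state machines and start executions.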

The following figure illustrates the workflow of the Step Functions state machine.

Review the quality check results and metrics

To review the quality check results, navigate to the same S3 bucket and open the OUTPUT/verification-results folder. Open the file whose name starts with the prefix part. The following table is a snapshot of the file.

check | check_level | check_status | constraint | constraint_status
Accomodations | Error | Success | SizeConstraint(Size(None)) | Success
Accomodations | Error | Success | CompletenessConstraint(Completeness(name,None)) | Success
Accomodations | Error | Success | UniquenessConstraint(Uniqueness(List(id),None)) | Success
Accomodations | Error | Success | CompletenessConstraint(Completeness(host_name,None)) | Success
Accomodations | Error | Success | CompletenessConstraint(Completeness(neighbourhood,None)) | Success
Accomodations | Error | Success | CompletenessConstraint(Completeness(price,None)) | Success

The check_status column indicates whether the overall quality check succeeded or failed. The constraint column lists the individual quality checks that the Deequ engine performed. The constraint_status column indicates the success or failure of each constraint.

You can also review the quality check metrics generated by Deequ by navigating to the folder OUTPUT/verification-results-metrics. Open the file whose name starts with the prefix part. The following table is a snapshot of the file.

entity | instance | name | value
Column | price is non-negative | Compliance | 1
Column | neighbourhood | Completeness | 1
Column | price | Completeness | 1
Column | id | Uniqueness | 1
Column | host_name | Completeness | 0.998831356
Column | name | Completeness | 0.997348076

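For the columns with a value of 1, all the records of the input file satisfy the specific constraint. For the columns with a value below 1, that fraction of the records satisfies the constraint; for example, a completeness of 0.998831356 for host_name means that about 99.9% of the records include a host name.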
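
To act on these metrics programmatically, you could flag every metric that falls below a threshold. The following sketch uses the values from the snapshot; the function name and the stricter 0.998 cutoff are hypothetical and only illustrate the idea.

```python
# Rows copied from the metrics snapshot: (entity, instance, name, value).
METRICS = [
    ("Column", "price is non-negative", "Compliance", 1.0),
    ("Column", "neighbourhood", "Completeness", 1.0),
    ("Column", "price", "Completeness", 1.0),
    ("Column", "id", "Uniqueness", 1.0),
    ("Column", "host_name", "Completeness", 0.998831356),
    ("Column", "name", "Completeness", 0.997348076),
]


def below_threshold(metrics, threshold=1.0):
    """Return (instance, name, value) for every metric whose value is under `threshold`."""
    return [
        (instance, name, value)
        for _entity, instance, name, value in metrics
        if value < threshold
    ]


# Metrics that at least one record fails.
partial = below_threshold(METRICS)
# Hypothetical stricter cutoff: only metrics under 99.8% are reported.
strict_failures = below_threshold(METRICS, threshold=0.998)

for instance, name, value in partial:
    print(f"{instance}: {name} = {value:.2%}")
```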

Considerations for running PyDeequ in Lambda

Consider the following when deploying this solution:

  • Running SoAL on Lambda is a single-node deployment, but it is not limited to a single core; a node can have multiple cores in Lambda, which allows for distributed data processing. Adding more memory in Lambda proportionally increases the amount of CPU, increasing the overall computational power available. Multiple CPUs in a single-node deployment, combined with the quick startup time of Lambda, result in faster processing of Spark jobs. Additionally, the consolidation of cores within a single node enables faster shuffle operations, enhanced communication between cores, and improved I/O performance.
  • For Spark jobs that run longer than 15 minutes, process larger files (more than 1 GB), or involve complex joins that require more memory and compute resources, we recommend AWS Glue Data Quality. SoAL can also be deployed in Amazon ECS.
  • Choosing the right memory setting for Lambda functions can help balance the speed and cost. You can automate the process of selecting different memory allocations and measuring the time taken using Lambda power tuning.
  • Workloads using multi-threading and multi-processing can benefit from Lambda functions powered by an AWS Graviton processor, which offers better price-performance. You can use Lambda power tuning to run with both x86 and ARM architecture and compare results to choose the optimal architecture for your workload.
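
The trade-off between memory, speed, and cost can be estimated with simple arithmetic, because Lambda compute charges scale with memory multiplied by duration. The sketch below is an illustration under stated assumptions: the per-GB-second prices are assumed values that you should check against current Lambda pricing for your Region, the measurements are hypothetical, and per-request charges are ignored.

```python
# Assumed on-demand prices per GB-second; verify against current Lambda pricing.
PRICE_PER_GB_SECOND = {"x86_64": 0.0000166667, "arm64": 0.0000133334}


def invocation_cost(memory_mb, duration_seconds, architecture="x86_64"):
    """Compute cost of one invocation: memory (GB) x duration (s) x price per GB-second."""
    gb_seconds = (memory_mb / 1024) * duration_seconds
    return gb_seconds * PRICE_PER_GB_SECOND[architecture]


# Hypothetical measurements: (memory in MB, observed duration in seconds).
measurements = [(2048, 240.0), (4096, 130.0), (8192, 90.0)]

costs = {memory: invocation_cost(memory, duration) for memory, duration in measurements}
cheapest_memory = min(costs, key=costs.get)
fastest_memory = min(measurements, key=lambda item: item[1])[0]

for memory, duration in measurements:
    print(f"{memory} MB: {duration:.0f} s, ${costs[memory]:.6f}")
```

In this made-up data set the cheapest and the fastest settings differ, which is the kind of result Lambda power tuning helps you find with real measurements.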

Clean up

Complete the following steps to clean up the solution resources:

  1. On the Amazon S3 console, empty the contents of your S3 bucket.

Because this S3 bucket was created as part of the AWS SAM deployment, the next step will delete the S3 bucket.

  2. To delete the sample application that you created, use the AWS SAM CLI. Assuming you used your project name for the stack name, you can run the following code:
sam delete --stack-name "<your stack name>"
  3. To delete the ECR image you created using CloudFormation, delete the stack from the AWS CloudFormation console.

For detailed instructions, refer to the GitHub repository Readme.md file.

Conclusion

Data is crucial for modern enterprises, influencing decision-making, demand forecasting, delivery scheduling, and overall business processes. Poor quality data can negatively impact business decisions and efficiency of the organization.

In this post, we demonstrated how to implement data quality checks and incorporate them in the data pipeline. In the process, we discussed how to use the PyDeequ library, how to deploy it in Lambda, and considerations when running it in Lambda.

To learn about best practices for implementing data quality checks, refer to the Data quality prescriptive guidance. To learn about running analytics workloads using AWS Lambda, refer to the Spark on AWS Lambda blog post.


About the Authors

Vivek Mittal is a Solution Architect at Amazon Web Services. He is passionate about serverless and machine learning technologies. Vivek takes great joy in assisting customers with building innovative solutions on the AWS cloud platform.

John Cherian is a Senior Solutions Architect at Amazon Web Services. He helps customers with strategy and architecture for building solutions on AWS.

Uma Ramadoss is a Principal Solutions Architect at Amazon Web Services, focused on Serverless and Integration Services. She is responsible for helping customers design and operate event-driven cloud-native applications using services like Lambda, API Gateway, EventBridge, Step Functions, and SQS. Uma has hands-on experience leading enterprise-scale serverless delivery projects and possesses strong working knowledge of event-driven, microservice, and cloud architecture.

Improve the resilience of Amazon Managed Service for Apache Flink application with system-rollback feature

Post Syndicated from Subham Rakshit original https://aws.amazon.com/blogs/big-data/improve-the-resilience-of-amazon-managed-service-for-apache-flink-application-with-system-rollback-feature/

“Everything fails all the time” – Werner Vogels, CTO Amazon

Although customers always take precautionary measures when they build applications, application code and configuration errors can still happen, causing application downtime. To mitigate this, Amazon Managed Service for Apache Flink has built a new layer of resilience by allowing customers to opt for the system-rollback feature that will seamlessly revert the application to a previous running version, thereby improving application stability and high availability.

Apache Flink is an open source distributed processing engine that offers powerful programming interfaces for stream and batch processing. It also offers first-class support for stateful processing and event time semantics. Apache Flink supports multiple programming languages, including Java, Python, Scala, SQL, and multiple APIs with different levels of abstraction. These APIs can be used interchangeably in the same application.

Managed Service for Apache Flink is a fully managed, serverless experience in running Apache Flink applications, and it now supports Apache Flink 1.19.1, the latest released version of Apache Flink at the time of this writing.

This post explores how to use the system-rollback feature in Managed Service for Apache Flink. We discuss how this functionality improves your application’s resilience by providing a highly available Flink application. Through an example, you will also learn how to use the APIs to gain more visibility into the application’s operations, which helps in troubleshooting application and configuration issues.

Error scenarios for system-rollback

Managed Service for Apache Flink operates under a shared responsibility model. This means the service owns the infrastructure to run Flink applications that are secure, durable, and highly available. Customers are responsible for making sure application code and configurations are correct. There have been cases where updating the Flink application failed due to code bugs, incorrect configuration, or insufficient permissions. Here are a few examples of common error scenarios:

  1. Code bugs, including any runtime errors encountered. For example, null values are not appropriately handled in the code, resulting in a NullPointerException.
  2. The Flink application is updated with parallelism higher than the max parallelism configured for the application.
  3. The application is updated to run with incorrect subnets for a virtual private cloud (VPC) application, which results in failure at Flink job startup.

As of this writing, the Managed Service for Apache Flink application still shows a RUNNING status when such errors occur, despite the fact that the underlying Flink application cannot process the incoming events and recover from the errors.

Errors can also happen during application auto scaling. For example, the application scales up but runs into issues restoring from a savepoint due to an operator mismatch between the snapshot and the Flink job graph. This can happen if you did not set the operator ID using the uid method, or if you changed it in a new version of the application.

You may also receive a snapshot compatibility error when upgrading to a new Apache Flink version. Stateful version upgrades of the Apache Flink runtime are generally compatible, with very few exceptions; for details, refer to the Apache Flink state compatibility table and the Managed Service for Apache Flink documentation.

In such scenarios, you can either perform a force-stop operation, which stops the application without taking a snapshot, or you can roll back the application to the previous version using the RollbackApplication API. Both processes need customer intervention to recover from the issue.

Automatic rollback to the previous application version

With the system-rollback feature, Managed Service for Apache Flink will perform an automatic RollbackApplication operation to restore the application to the previous version when an update operation or a scaling operation fails and you encounter the error scenarios discussed previously.

If the rollback is successful, the Flink application is restored to the previous application version with the latest snapshot. The Flink application is put into a RUNNING state and continues processing events. This process results in high availability of the Flink application and improved resilience with minimal downtime. If the system-rollback fails, the Flink application will be in a READY state. If this is the case, you need to fix the error and restart the application.

However, if a Managed Service for Apache Flink application is started with application or configuration issues, the service will not start the application. Instead, the application returns to the READY state. This is the default behavior regardless of whether system-rollback is enabled.

System-rollback is performed before the application transitions to RUNNING status. Automatic rollback will not be performed if a Managed Service for Apache Flink application has already successfully transitioned to RUNNING status and later faces runtime issues such as checkpoint failures or job failures. However, customers can trigger the RollbackApplication API themselves if they want to roll back on runtime errors.

Here is the state transition flowchart of system-rollback.

Amazon Managed Service for Apache Flink State Transition
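
The transitions described in this section can be summarized as a small decision function. This is only a reading aid that mirrors the prose, not service code; the function name, its parameters, and the returned tuple are hypothetical.

```python
def status_after_operation(operation, succeeded, rollback_enabled=False, rollback_succeeded=False):
    """Return (application status, processing events?) after a start, update, or scale operation."""
    if succeeded:
        return ("RUNNING", True)
    if operation == "start":
        # A failed start returns the application to READY whether or not rollback is enabled.
        return ("READY", False)
    if not rollback_enabled:
        # Without system-rollback the status still shows RUNNING although the job cannot
        # process events; a force-stop or a manual RollbackApplication call is needed.
        return ("RUNNING", False)
    if rollback_succeeded:
        # The previous application version is restored from the latest snapshot.
        return ("RUNNING", True)
    # The automatic rollback itself failed: fix the error and restart the application.
    return ("READY", False)


failed_update_without_rollback = status_after_operation("update", succeeded=False)
failed_update_with_rollback = status_after_operation(
    "update", succeeded=False, rollback_enabled=True, rollback_succeeded=True
)
```

The difference between the two example calls is the point of the feature: the same failed update leaves the application either stuck or processing events again on the previous version.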

System-rollback is an opt-in feature that you must enable using the console or the API. To enable it using the API, invoke the UpdateApplication API with the following configuration. This feature is available for all Apache Flink versions supported by Managed Service for Apache Flink.

Each Managed Service for Apache Flink application has a version ID, which tracks the application code and configuration for that specific version. You can get the current application version ID from the AWS console of the Managed Service for Apache Flink application.

aws kinesisanalyticsv2 update-application \
	--application-name sample-app-system-rollback-test \
	--current-application-version-id 5 \
	--application-configuration-update "{\"ApplicationSystemRollbackConfigurationUpdate\": {\"RollbackEnabledUpdate\": true}}" \
	--region us-west-1
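
The same update can be scripted with boto3. The following is a sketch rather than an official example: the function names are hypothetical, it assumes a boto3 release recent enough to include the system-rollback configuration, and it reads the current version ID with DescribeApplication instead of taking it from the console.

```python
def build_enable_rollback_request(application_name, current_version_id):
    """Build keyword arguments for UpdateApplication that turn system-rollback on."""
    return {
        "ApplicationName": application_name,
        "CurrentApplicationVersionId": current_version_id,
        "ApplicationConfigurationUpdate": {
            "ApplicationSystemRollbackConfigurationUpdate": {"RollbackEnabledUpdate": True}
        },
    }


def enable_system_rollback(application_name, region_name="us-west-1"):
    """Read the current version ID, then enable system-rollback for the application."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    client = boto3.client("kinesisanalyticsv2", region_name=region_name)
    detail = client.describe_application(ApplicationName=application_name)["ApplicationDetail"]
    request = build_enable_rollback_request(application_name, detail["ApplicationVersionId"])
    return client.update_application(**request)


# The request that corresponds to the CLI command above.
request = build_enable_rollback_request("sample-app-system-rollback-test", 5)
```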

Application operations observability

Observability of application version changes is of utmost importance because Flink applications can be rolled back seamlessly from newly upgraded versions to previous versions in the event of application and configuration errors. First, visibility of the version history provides chronological information about the operations performed on the application. Second, it helps with debugging because it shows the underlying error and why the application was rolled back, so that you can fix the issue and retry the operation.

For this, you have two additional APIs to invoke from the AWS Command Line Interface (AWS CLI):

  1. ListApplicationOperations – This API will list all the operations, such as UpdateApplication, ApplicationMaintenance, and RollbackApplication, performed on the application in a reverse chronological order.
  2. DescribeApplicationOperation – This API will provide details of a specific operation listed by the ListApplicationOperations API including the failure details.

Although these two new APIs can help you understand the error, you should also refer to the Amazon CloudWatch logs for your Flink application for troubleshooting help. In the logs, you can find additional details, including the stack trace. Once you identify the issue, fix it and update the Flink application.

For troubleshooting information, refer to the documentation.

System-rollback process flow

The following image shows a Managed Service for Apache Flink application in RUNNING state with Version ID: 3. The application is consuming data successfully from the Amazon Kinesis Data Stream source, processing it, and writing it into another Kinesis Data Stream sink.

Also, from the Apache Flink Dashboard, you can see the Status of the Flink application is RUNNING.

To demonstrate the system-rollback, we updated the application code to intentionally introduce an error. From the application main method, an exception is thrown, as shown in the following code.

throw new Exception("Exception thrown to demonstrate system-rollback");

While updating the application with the latest jar, the Version ID is incremented to 4, and the application Status shows it is UPDATING, as shown in the following screenshot.

After some time, the application rolls back to the previous version, Version ID: 3, as shown in the following screenshot.

The application now has successfully gone back to version 3 and continues to process events, as shown by Status RUNNING in the following screenshot.

To troubleshoot what went wrong in version 4, list all the operations performed on the Managed Service for Apache Flink application sample-app-system-rollback-test.

aws kinesisanalyticsv2 list-application-operations \
    --application-name sample-app-system-rollback-test \
    --region us-west-1

This shows the list of operations performed on the Flink application sample-app-system-rollback-test.

{
  "ApplicationOperationInfoList": [
    {
      "Operation": "SystemRollbackApplication",
      "OperationId": "Z4mg9iXiXXXX",
      "StartTime": "2024-06-20T16:52:13+01:00",
      "EndTime": "2024-06-20T16:54:49+01:00",
      "OperationStatus": "SUCCESSFUL"
    },
    {
      "Operation": "UpdateApplication",
      "OperationId": "zIxXBZfQXXXX",
      "StartTime": "2024-06-20T16:50:04+01:00",
      "EndTime": "2024-06-20T16:52:13+01:00",
      "OperationStatus": "FAILED"
    },
    {
      "Operation": "StartApplication",
      "OperationId": "BPyrMrrlXXXX",
      "StartTime": "2024-06-20T15:26:03+01:00",
      "EndTime": "2024-06-20T15:28:05+01:00",
      "OperationStatus": "SUCCESSFUL"
    }
  ]
}

Review the details of the UpdateApplication operation and note the OperationId. If you use the AWS CLI and APIs to update the application, then the OperationId can be obtained from the UpdateApplication API response. To investigate what went wrong, you can use OperationId to invoke describe-application-operation.
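
If you script this step, the OperationId of the failed operation can be picked out of the JSON output. The sketch below works on an abbreviated copy of the output shown earlier; the function name is hypothetical, and it relies on the list being in reverse chronological order.

```python
import json

# Abbreviated copy of the list-application-operations output (timestamps omitted).
LIST_OUTPUT = """
{
  "ApplicationOperationInfoList": [
    {"Operation": "SystemRollbackApplication", "OperationId": "Z4mg9iXiXXXX", "OperationStatus": "SUCCESSFUL"},
    {"Operation": "UpdateApplication", "OperationId": "zIxXBZfQXXXX", "OperationStatus": "FAILED"},
    {"Operation": "StartApplication", "OperationId": "BPyrMrrlXXXX", "OperationStatus": "SUCCESSFUL"}
  ]
}
"""


def latest_failed_operation_id(list_output):
    """Return the OperationId of the most recent FAILED operation, or None if there is none."""
    for operation in json.loads(list_output)["ApplicationOperationInfoList"]:
        if operation["OperationStatus"] == "FAILED":
            return operation["OperationId"]
    return None


failed_id = latest_failed_operation_id(LIST_OUTPUT)
print(failed_id)
```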

Use the following command to invoke describe-application-operation.

aws kinesisanalyticsv2 describe-application-operation \
    --application-name sample-app-system-rollback-test \
    --operation-id zIxXBZfQXXXX \
    --region us-west-1

This will show the details of the operation, including the error.

{
    "ApplicationOperationInfoDetails": {
        "Operation": "UpdateApplication",
        "StartTime": "2024-06-20T16:50:04+01:00",
        "EndTime": "2024-06-20T16:52:13+01:00",
        "OperationStatus": "FAILED",
        "ApplicationVersionChangeDetails": {
            "ApplicationVersionUpdatedFrom": 3,
            "ApplicationVersionUpdatedTo": 4
        },
        "OperationFailureDetails": {
            "RollbackOperationId": "Z4mg9iXiXXXX",
            "ErrorInfo": {
                "ErrorString": "org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute application.\n\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$4(JarRunOverrideHandler.java:248)\n\tat java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)\n\tat java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)\n\tat java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)\n\tat java.ba"
            }
        }
    }
}
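
For scripted troubleshooting, the fields of interest can be pulled out of this response. The following sketch parses a shortened copy of the output, with the ErrorString cut after its first stack frame; the function name and the keys of the returned summary are hypothetical.

```python
import json

# Shortened copy of the describe-application-operation output. A raw string keeps
# the \n and \t sequences as JSON escapes rather than literal control characters.
DESCRIBE_OUTPUT = r"""
{
  "ApplicationOperationInfoDetails": {
    "Operation": "UpdateApplication",
    "OperationStatus": "FAILED",
    "ApplicationVersionChangeDetails": {
      "ApplicationVersionUpdatedFrom": 3,
      "ApplicationVersionUpdatedTo": 4
    },
    "OperationFailureDetails": {
      "RollbackOperationId": "Z4mg9iXiXXXX",
      "ErrorInfo": {
        "ErrorString": "org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute application.\n\tat org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$4(JarRunOverrideHandler.java:248)"
      }
    }
  }
}
"""


def summarize_failure(describe_output):
    """Collect the version change, rollback operation ID, and first error line."""
    details = json.loads(describe_output)["ApplicationOperationInfoDetails"]
    failure = details.get("OperationFailureDetails", {})
    versions = details.get("ApplicationVersionChangeDetails", {})
    error_string = failure.get("ErrorInfo", {}).get("ErrorString", "")
    return {
        "operation": details["Operation"],
        "from_version": versions.get("ApplicationVersionUpdatedFrom"),
        "to_version": versions.get("ApplicationVersionUpdatedTo"),
        "rollback_operation_id": failure.get("RollbackOperationId"),
        "first_error_line": error_string.splitlines()[0] if error_string else "",
    }


summary = summarize_failure(DESCRIBE_OUTPUT)
```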

Review the CloudWatch logs for the actual error information. The following code shows the same error with the complete stack trace, which demonstrates the underlying problem.

Amazon Managed Service for Apache Flink failed to transition the application to the desired state. The application is being rolled-back to the previous state. Please investigate the following error. org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute application.
at org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$4(JarRunOverrideHandler.java:248)
at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
...
...
...
Caused by: java.lang.Exception: Exception thrown to demonstrate system-rollback
at com.amazonaws.services.msf.StreamingJob.main(StreamingJob.java:101)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
... 12 more
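
In a long stack trace, the last "Caused by:" entry usually points at the root cause. As a rough sketch, the following helper extracts that entry and its first stack frame from an excerpt of the log above; the function name is hypothetical, and real CloudWatch log events may need extra handling.

```python
# Excerpt of the log message shown above.
LOG_TEXT = """org.apache.flink.runtime.rest.handler.RestHandlerException: Could not execute application.
at org.apache.flink.runtime.webmonitor.handlers.JarRunOverrideHandler.lambda$handleRequest$4(JarRunOverrideHandler.java:248)
...
Caused by: java.lang.Exception: Exception thrown to demonstrate system-rollback
at com.amazonaws.services.msf.StreamingJob.main(StreamingJob.java:101)
... 12 more
"""


def root_cause(log_text):
    """Return (exception text, first stack frame) for the last 'Caused by:' entry, or None."""
    lines = [line.strip() for line in log_text.splitlines()]
    for index in range(len(lines) - 1, -1, -1):
        if lines[index].startswith("Caused by:"):
            frame = next(
                (line for line in lines[index + 1:] if line.startswith("at ")), None
            )
            return lines[index][len("Caused by:"):].strip(), frame
    return None


cause, frame = root_cause(LOG_TEXT)
print(cause)
print(frame)
```

Here the frame points to StreamingJob.java line 101, the location of the exception that was added to the application's main method for the demonstration.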

Finally, you need to fix the issue and redeploy the Flink application.

Conclusion

This post has explained how to enable the system-rollback feature and how it helps to minimize application downtime in bad deployment scenarios. Moreover, we have explained how this feature will work, as well as how to troubleshoot underlying problems. We hope you found this post helpful and that it provided insight into how to improve the resilience and availability of your Flink application. We encourage you to enable the feature to improve resilience of your Managed Service for Apache Flink application.

To learn more about system-rollback, refer to the AWS documentation.


About the author

Subham Rakshit is a Senior Streaming Solutions Architect for Analytics at AWS based in the UK. He works with customers to design and build streaming architectures so they can get value from analyzing their streaming data. His two little daughters keep him occupied most of the time outside work, and he loves solving jigsaw puzzles with them. Connect with him on LinkedIn.

[$] Standards for use of unsafe Rust in the kernel

Post Syndicated from daroc original https://lwn.net/Articles/982868/

Rust is intended to let programmers write safer code. But compilers are not omniscient, and writing Rust code that interfaces with hardware (or that works with memory outside of Rust’s lifetime paradigm) requires, at some point, the programmer’s assurance that some operations are permissible. Benno Lossin suggested adding some more documentation to the Rust-for-Linux project clarifying the standards for commenting uses of unsafe in kernel code. There’s general agreement that such standards are necessary, but less agreement on exactly when it is appropriate to use unsafe.

ASRock Rack GNRD8-2L2T Intel Xeon 6 Granite Rapids Motherboard Shown

Post Syndicated from John Lee original https://www.servethehome.com/asrock-rack-gnrd8-2l2t-intel-xeon-6-granite-rapids-motherboard-shown/

We spotted the ASRock Rack GNRD8-2L2T. This motherboard will support the Intel Xeon 6 R1S CPUs with 136 PCIe Gen5/CXL 2.0 lanes.

The post ASRock Rack GNRD8-2L2T Intel Xeon 6 Granite Rapids Motherboard Shown appeared first on ServeTheHome.

Black Hat 2024: Key Takeaways and Industry Trends

Post Syndicated from Ryan Blanchard original https://blog.rapid7.com/2024/08/14/black-hat-2024-key-takeaways-and-industry-trends/

What a week! As Hacker Summer Camp shifts into the rearview, it’s time to take a moment to reflect on the week, what we learned, and the people we had the pleasure of meeting while out in Las Vegas. As is always the case at Black Hat, the cybersecurity community was buzzing with the latest innovations and insights from their favorite vendors, industry speakers, and training sessions. There was no shortage of information covered throughout the week, and with the sheer volume of it, it can be hard to catch everything going on. In this post I am going to do my part by attempting to summarize some of the key themes and takeaways from the event. So, with that, let’s get right to it.

  1. The rise of advanced threats: AI and machine learning at the forefront. One of the most striking themes at Black Hat 2024 was the sophistication of modern cyber threats. This year, sessions highlighted how attackers are leveraging artificial intelligence (AI) and machine learning (ML) to lower the barrier to entry, increase the scale and impact of attacks and circumvent traditional controls. From deepfake technology used in phishing schemes to AI-driven automated attacks, the industry is witnessing a new era of cyber threats that require equally advanced defensive strategies and continuous learning to ensure security teams keep pace with emerging trends and threat vectors.
  2. Zero trust and identity: the gradual shift towards never trust, always verify. Zero Trust was a major focal point at this year’s event. Experts and vendors alike emphasized the importance of adopting a Zero Trust approach to cybersecurity. This model, which operates on the principle of “never trust, always verify,” aims to minimize trust within and outside the network. The shift towards Zero Trust reflects the growing need for more robust security frameworks that can handle today’s complex threat environment.
  3. Software supply chain security: extending your defense beyond the perimeter. Software supply chain attacks were a hot topic, underscoring the need for organizations to extend their security measures beyond their immediate environment. Black Hat 2024 reinforced the importance of securing not just your own systems but also those of your vendors, partners and the software dependencies that modern applications consist of. Discussions centered on strategies for improving supply chain resilience, shifting security visibility and gates earlier on in the development lifecycle and the role of continuous monitoring in mitigating these risks over time.
  4. Emerging technologies: navigating the new cybersecurity landscape. Black Hat 2024 showcased numerous emerging technologies and their implications for cybersecurity. Sessions explored the security challenges associated with Generative AI, blockchain, the Internet of Things (IoT) and Quantum Computing. As these technologies evolve, they bring both new opportunities and new risks, making it crucial for security professionals to stay informed and prepared.
  5. Training and awareness: building a culture of security. Many sessions emphasized the critical role of security training and awareness programs. With human error often cited as a leading cause of security incidents, organizations are increasingly focusing on educating their employees and fostering a culture of security awareness. Training programs that address current threats and promote best practices are becoming integral to comprehensive security strategies.

Keynote sessions did not disappoint

The keynote sessions at Black Hat are always one of my personal favorite parts, and this year was no exception. While there were a number of sessions I found insightful and well worth the watch, one in particular that stood out was Thursday’s Fireside chat with Moxie Marlinspike, the Founder of Signal, and Jeff Moss, the Founder of Black Hat and member of the U.S. Department of Homeland Security Advisory Council. During the session they covered a range of topics, but chief among them was the future of privacy and the balance between privacy and security.

Product launches: Surface Command and Exposure Command unveiled

Beyond rich discussions and cutting-edge presentations, we made some significant waves with the launch of Surface Command and Exposure Command, two exciting new product offerings designed to unify your attack surface and deliver effective hybrid risk management. We covered these new products a little more in-depth here, but to recap:

Surface Command: unifying your attack surface

Surface Command offers a unified view of both internal and external attack surfaces, breaking down data silos and providing a comprehensive picture of your environment. This tool helps organizations identify and address vulnerabilities more effectively.

Exposure Command: prioritizing critical threats with precision

Exposure Command extends these capabilities by enriching asset data with high-fidelity risk context, enabling teams to prioritize and address the most critical threats with greater precision.

These launches are a testament to Rapid7’s commitment to advancing cybersecurity and providing our customers with the tools they need to stay ahead of potential threats, and represent the next chapter in our mission to enable security teams to take command of their attack surface.

What’s Next for Rapid7?

Black Hat 2024 was a microcosm of the dynamic and rapidly evolving nature of the cybersecurity landscape. The insights gained and the innovations showcased will undoubtedly influence the industry’s approach to security in the coming years. As we move forward, the lessons from Black Hat and the invaluable direct feedback will inform our strategy and drive the development of new capabilities to meet the ever-changing demands of our customers and the industry at large.

As we wrap up our experiences from Black Hat 2024, it’s clear that the cybersecurity landscape is evolving rapidly, with new threats and technologies shaping the way we approach security. The insights gained from the event, along with the direct feedback from industry peers, will be instrumental in guiding our strategy at Rapid7. We’re excited to continue innovating and leading the charge in helping organizations take command of their attack surfaces. Stay tuned as we build on these insights to deliver even more powerful solutions in the coming months.

Security updates for Wednesday

Post Syndicated from jake original https://lwn.net/Articles/985654/

Security updates have been issued by AlmaLinux (389-ds-base), Debian (ffmpeg), Fedora (chromium), Red Hat (.NET 8.0, container-tools:rhel8, edk2, firefox, gnome-shell, grafana, jose, kernel, kernel-rt, krb5, open-vm-tools, orc, pcs, poppler, python-urllib3, and wget), SUSE (gtk2, gtk3, kernel, python-setuptools, python310-setuptools, python312-setuptools, python39-setuptools, and webkit2gtk3), and Ubuntu (dotnet8, libcroco, linux-azure, linux-lowlatency, linux-raspi, and linux-oracle).