Post Syndicated from Explosm.net original https://explosm.net/comics/pandas
New Cyanide and Happiness Comic
Post Syndicated from Explosm.net original https://explosm.net/comics/pandas
New Cyanide and Happiness Comic
Post Syndicated from Naranjan Goklani original https://aws.amazon.com/blogs/security/support-canadas-cccs-pbhva-overlay-compliance-with-the-landing-zone-accelerator-on-aws/
Organizations seeking to adhere to the Canadian Centre for Cyber Security (CCCS) Protected B High Value Assets (PBHVA) overlay requirements can use the Landing Zone Accelerator (LZA) on AWS solution with the CCCS Medium configuration to accelerate their compliance journey. To further support customers, AWS recently collaborated with Coalfire to assess and verify the LZA solution’s ability to support CCCS PBHVA overlay controls.
By implementing the PBHVA control overlay over a CCCS Medium baseline, you can better protect your organization’s most critical assets from potential threats and vulnerabilities, providing continuity of essential government operations and safeguarding sensitive information.
The CCCS PBHVA overlay consists of 137 controls designed to protect high-value assets, including 69 new controls and 68 controls from CCCS Medium. These controls provide enhanced data protection, particularly for integrity and availability, and are based on NIST SP 800-53 Revision 5.
Coalfire’s assessment found that the LZA on AWS solution significantly supports CCCS PBHVA overlay compliance requirements:
The 29 percent of controls not addressed by the LZA are on the customer side of the shared responsibility model. They are addressed in the customer’s application stack or as non-technical controls such as policies and procedures.
The LZA solution implements several critical security features:
While the LZA solution provides significant compliance support, organizations should note:
The AWS Landing Zone Accelerator Verified Reference Architecture documentation is available for customer download in AWS Artifact. This resource can help organizations reduce the time and effort required to deploy an environment that aligns with CCCS PBHVA overlay requirements.
The Coalfire assessment confirms that the LZA on AWS solution provides effective support for CCCS PBHVA overlay compliance objectives. However, organizations should remember that compliance is an ongoing process that requires active management and cannot be achieved through technology alone.
For more information about implementing the Landing Zone Accelerator for CCCS PBHVA overlay requirements, contact your AWS account team or the AWS Public Sector team directly.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/anthropics-claude-3-7-sonnet-the-first-hybrid-reasoning-model-is-now-available-in-amazon-bedrock/
Amazon Bedrock is expanding its foundation model (FM) offerings as the generative AI field evolves. Today, we’re excited to announce the availability of Anthropic’s Claude 3.7 Sonnet foundation model in Amazon Bedrock. As Anthropic’s most intelligent model to date, Claude 3.7 Sonnet stands out as their first hybrid reasoning model capable of producing quick responses or extended thinking, meaning it can work through difficult problems using careful, step-by-step reasoning. Additionally, today we are adding Claude 3.7 Sonnet to the list of models used by Amazon Q Developer. Amazon Q is built on Bedrock, and with Amazon Q you can use the most appropriate model for a specific task such as Claude 3.7 Sonnet, for more advanced coding workflows that enable developers to accelerate building across the entire software development lifecycle.
Key highlights of Claude 3.7 Sonnet
Here are several notable features and capabilities of Claude 3.7 Sonnet in Amazon Bedrock.
The first Claude model with hybrid reasoning – Claude 3.7 Sonnet takes a different approach to how models think. Instead of using separate models—one for quick answers and another for solving complex problems—Claude 3.7 Sonnet integrates reasoning as a core capability within a single model. This combination is more similar to how the human brains works. After all, we use the same brain whether we’re answering a simple question or solving a difficult puzzle.
The model has two modes—standard and extended thinking mode—which can be toggled in Amazon Bedrock. In standard mode, Claude 3.7 Sonnet is an improved version of Claude 3.5 Sonnet. In extended thinking mode, Claude 3.7 Sonnet takes additional time to analyze problems in detail, plan solutions, and consider multiple perspectives before providing a response, allowing it to make further gains in performance. You can control speed and cost by choosing when to use reasoning capabilities. Extended thinking tokens count towards the context window and are billed as output tokens.
Anthropic’s most powerful model for coding – Claude 3.7 Sonnet is state-of-the art for coding, excelling in understanding context and creative problem solving, and according to Anthropic, achieves an industry-leading 70.3% for standard mode on SWE-bench Verified. Claude 3.7 Sonnet also performs better than Claude 3.5 Sonnet across the majority of benchmarks. These enhanced capabilities make Claude 3.7 Sonnet ideal for powering AI agents and complex workflows.
Source: https://www.anthropic.com/news/claude-3-7-sonnet
Over 15x longer output capacity than its predecessor – Compared to Claude 3.5 Sonnet, this model offers significantly expanded output length. This enhanced capacity is particularly useful when you explicitly request more detail, ask for multiple examples, or request additional context or background information. To achieve long outputs, try asking for a detailed outline (for writing use cases, you can specify outline detail down to the paragraph level and include word count targets). Then, ask for the response to index its paragraphs to the outline and reiterate the word counts. Claude 3.7 Sonnet supports outputs up to 128K tokens long (up to 64K as generally available and up to 128K as a beta).
Adjustable reasoning budget – You can control the budget for thinking when you use Claude 3.7 Sonnet in Amazon Bedrock. This flexibility helps you weigh the trade-offs between speed, cost, and performance. By allocating more tokens to reasoning for complex problems or limiting tokens for faster responses, you can optimize performance for your specific use case.
Claude 3.7 Sonnet in action
As for any new model, I have to request access in the Amazon Bedrock console. In the navigation pane, I choose Model access under Bedrock configurations. Then, I choose Modify model access to request access for Claude 3.7 Sonnet.
To try Claude 3.7 Sonnet, I choose Chat / Text under Playgrounds in the navigation pane. Then I choose Select model and choose Anthropic under the Categories and Claude 3.7 Sonnet under the Models. To enable the extended thinking mode, I toggle Model reasoning under Configurations. I type the following prompt, and choose Run:
You're the manager of a small restaurant facing these challenges:
Three staff members called in sick for tonight's dinner service
You're expecting a full house (80 seats)
There's a large party of 20 coming at 7 PM
Your main chef is available but two kitchen helpers are among those who called in sick
You have 2 regular servers and 1 trainee available
How would you:
Reorganize the available staff to handle the situation
Prioritize tasks and service
Determine if you need to make any adjustments to reservations
Handle the large party while maintaining service quality
Minimize negative impact on customer experience
Explain your reasoning for each decision and discuss potential trade-offs
Here’s the result with an animated image showing the reasoning process of the model.

To test image-to-text vision capabilities, I upload an image of a detailed architectural site plan created using Amazon Bedrock. I receive a detailed analysis and reasoned insights of this site plan.
Claude 3.7 Sonnet can also be accessed through AWS SDK by using Amazon Bedrock API. To learn more about Claude 3.7 Sonnet’s features and capabilities, visit the Anthropic’s Claude in Amazon Bedrock product detail page.
Get started with Claude 3.7 Sonnet today
Claude 3.7 Sonnet’s enhanced capabilities can benefit multiple industry use cases. Businesses can create advanced AI assistants and agents that interact directly with customers. In fields such as healthcare, it can assist in medical imaging analysis and research summarization, and financial services can benefit from its abilities to solve complex financial modeling problems. For developers, it serves as a coding companion that can review code, explain technical concepts, and suggest improvements across different languages.
Anthropic’s Claude 3.7 Sonnet is available today in the US East (N. Virginia), US East (Ohio), and US West (Oregon) Regions. Check the full Region list for future updates.
Claude 3.7 Sonnet is priced competitively and matches the price of Claude 3.5 Sonnet. For pricing details, refer to the Amazon Bedrock pricing page.
To get started with Claude 3.7 Sonnet in Amazon Bedrock, visit the Amazon Bedrock console and Amazon Bedrock documentation.
Post Syndicated from Anshu Bathla original https://aws.amazon.com/blogs/security/four-ways-to-grant-cross-account-access-in-aws/
As your Amazon Web Services (AWS) environment grows, you might develop a need to grant cross-account access to resources. This could be for various reasons, such as enabling centralized operations across multiple AWS accounts, sharing resources across teams or projects within your organization, or integrating with third-party services. However, granting cross-account access requires careful consideration of your security, availability, and manageability requirements.
In this blog post, we explore four different ways to grant cross-account access using resource-based policies. Each method has its own unique tradeoffs, and the best choice depends on your specific requirements and use case.
Cross-account access is granted by identity-based policies and resource-based policies in AWS Identity and Access Management (IAM). Identity-based policies attach to an IAM role, while resource-based polices attach to resources like Amazon Simple Storage Service (Amazon S3) buckets and AWS Key Management Service (AWS KMS) keys. Resource-based policies require you to specify one or more principals (IAM users or roles) that are allowed to access the resource.
Your choice of how to specify the principal in a resource-based policy impacts some aspects of both the confidentiality and the availability of your solution. Understanding this impact and making the right tradeoffs for your use case is the focus of this post.
Imagine that you have an S3 bucket in your AWS account (Account A) that needs to be accessed by different principals in another AWS account (Account B). For this scenario, we assume that the principals in Account B have the necessary access to S3 in their identity-based policies, and we will focus on authoring the resource-based policies in Account A. While the methods explained here use Amazon S3, the concepts discussed apply to all AWS services that support resource-based policies. In the following sections, we walk through four different ways to grant cross-account access in this scenario and discuss the tradeoffs of each.
In this example, you use an S3 bucket policy to grant access to a specific IAM role (RoleFromAccountB) in Account B by specifying the IAM role’s Amazon Resource Name (ARN) in the Principal element of the policy in Account A.
Using this bucket policy, if someone in Account B deletes or recreates the role (RoleFromAccountB), then that role can no longer access the amzn-s3-demo-bucket-account-a bucket, even if that role is recreated with the same name. The reason is that when you save this policy, the role ARN is mapped to the unique ID of the role, which looks something like this: AROADBQP57FF2AEXAMPLE. You will see a role identifier in the Principal element of your resource-based policies if you view them after you delete the role that they referenced.
This behavior is intentional. The resource-based policy only allows the specific instance of the role that you set as principal at the time of policy creation. This helps prevent unintended access to your resources if you delete a role, but forget to update your resource-based policy to remove that role. This behavior can also cause an availability risk because the role (RoleFromAccountB) will have a new unique ID when it is recreated and will no longer have access to the bucket. Roles can be recreated for a number of reasons, including accidentally when you use tools such as infrastructure as code.
You might consider choosing this method if:
RoleFromAccountB) is deleted.RoleFromAccountB) is deleted.In this example, you grant access to a specific account in the Principal element of the resource-based policy. This resource-based policy of Account A allows any user or role from Account B that also has an identity-based policy that grants them access to read the objects.
Note: You can use either
"Principal": {"AWS": "111122223333"}or"Principal": {"AWS": "arn:aws:iam::111122223333:root"}in thePrincipalelement. They are equivalent, and the long-form ARN does not represent the root user.
This resource-based policy helps avoid the potential availability issue discussed for Method 1. If a role in Account B that needs to have access to the bucket is recreated, it will still have access after the recreation of that role. This is because you don’t specify a role in the Principal element—instead, you specify an account. If you use Method 2, you must be comfortable delegating access control decisions to the owner of that account.
This approach explicitly delegates access control decisions to IAM in the other account (Account B). Principals in Account B have access to this bucket if allowed by their identity-based policies.
You might consider choosing this method if:
This method expands on Method 2 and adds a condition that grants access only to a specific IAM role. Similar to Method 2, you use the account number as the value of the Principal element, but also use the aws:PrincipalArn condition key to limit access to a specific principal in Account B.
The aws:PrincipalArn condition key is a global condition key that compares the ARN of the principal that made the request with the ARN that you specify in the policy. For IAM roles, the request context returns the ARN of the role, not the ARN of the user that assumed the role.
This policy comes with the same availability benefits as the policy in Method 2: access to this resource will survive role recreation. This is because the role is translated to its unique identifier only when it is used in the Principal element. It is not translated to a unique identifier when it is used in a condition. If the role (RoleFromAccountB) in Account B is recreated, accidentally or intentionally, the policy will continue to grant access because the role matches the role ARN specified in the condition key of the resource-based policy in Account A. As a result, Method 3 provides a balanced approach to availability and security.
You might consider choosing this method if:
aws:PrincipalArn condition key if that role (RoleFromAccountB) is recreated.This method is focused on a different use case and is not an alternative to the methods listed earlier. Use this method if you have a resource (an S3 bucket, in this example) that you want to share with your entire organization, but not share with anyone outside of it.
There is no way to specify an organization by using the Principal element of a resource-based policy, so you must use the aws:PrincipalOrgId condition key to restrict access to a specific organization. In this policy, you specify a wildcard in the Principal element, which says that anyone can access the bucket. Then the condition reduces “anyone” to just those AWS account principals that belong to the specified organization and have an identity-based policy that allows them access.
You then add an additional conditional block that compares the aws:PrincipalAccount condition key to the aws:ResourceAccount condition key by using a policy variable. This extra conditional block is optional and excludes the account that owns the bucket (Account A) from the allow statement. The reason for using this extra conditional block is so that principals in Account A still require an allow statement in their identity-based policy to access this bucket. If you choose to exclude this aws:PrincipalAccount comparison, principals in Account A are granted access to the bucket without an explicit allow statement in their identity-based policy. Policy evaluation logic only requires either the identity-based policy or the resource-based policy (but not both) to allow a request when the principal and resource are in the same account.
You might consider choosing this method if:
Choosing a method to grant cross-account access requires careful consideration of your requirements and use case. Each of the four methods discussed in this blog post has its own advantages and tradeoffs. By understanding these methods and their implications, you can decide on the most appropriate approach to grant cross-account access to your AWS resources. Remember to regularly review and audit your resource-based policies to verify that they align with your security and access requirements.
To learn how resource-based policies work with Amazon S3, see the blog post IAM Policies and Bucket Policies and ACLs! Oh My! Controlling Access to S3 Resources.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Post Syndicated from jzb original https://lwn.net/Articles/1011671/
Version
2.0 of the Aqualung
gapless music player has been released. Aqualung supports playback of
a wide range of audio formats, ripping CDs to WAV, FLAC, Ogg Vorbis,
or MP3, and subscribing to podcasts via RSS or Atom feeds. The primary
change in this release is the migration
from GTK2 to GTK3, and dropping support for custom skins as a
result.
Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-cloud-club-captain-applications-formula-1-amazon-nova-prompt-engineering-and-more-feb-24-2025/
AWS Developer Day 2025, held on February 20th, showcased how to integrate responsible generative AI into development workflows. The event featured keynotes from AWS leaders including Srini Iragavarapu, Director Generative AI Applications and Developer Experiences, Jeff Barr, Vice President of AWS Evangelism, David Nalley, Director Open Source Marketing of AWS, along with AWS Heroes and technical community members. Watch the full event recording on Developer Day 2025.
Applications are now open through March 6th for the 2025 AWS Cloud Clubs Captains program. AWS Cloud Clubs are student-led groups for post-secondary and independent students, 18 years old and over. Find a club near you on our Meetup page.
Last week’s launches
Here are some launches that got my attention:
Amplify Hosting announces support for IAM roles for server-side rendered (SSR) applications – AWS Amplify Hosting now supports AWS Identity and Access Management (IAM) roles for SSR applications, enabling secure access to AWS services without managing credentials manually. Learn more in the IAM Compute Roles for Server-Side Rendering with AWS Amplify Hosting blog.
AWS WAF enhances Data Protection and logging experience – AWS WAF expands its Data Protection capabilities allowing sensitive data in logs to be replaced with cryptographic hashes (e.g. ‘ade099751d2ea9f3393f0f’) or a predefined static string (‘REDACTED’) before logs are sent to WAF Sample Logs, Amazon Security Lake, Amazon CloudWatch, or other logging destinations.
Announcing AWS DMS Serverless comprehensive premigration assessments – AWS Database Migration Service Serverless (AWS DMS Serverless) now supports premigration assessments for replications to identify potential issues before database migrations begin. The tool analyzes source and target databases, providing recommendations for optimal DMS settings and best practices.
Amazon ECS increases the CPU limit for ECS tasks to 192 vCPUs – Amazon Elastic Container Service (Amazon ECS) now supports CPU limits of up to 192 vCPU for ECS tasks deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances, an increase from the previous 10 vCPU limit. This enhancement allows customers to more effectively manage resource allocation on larger Amazon EC2 instances.
AWS Network Firewall introduces automated domain lists and insights – AWS Network Firewall now provides automated domain lists and insights by analyzing 30 days of HTTP/S traffic. This helps create and maintain allow-list policies more efficiently, at no extra cost.
AWS announces Backup Payment Methods for invoices – AWS now enables you to set up backup payment methods that automatically activate if primary payment fails. This helps prevent service interruptions and reduces manual intervention for invoice payments.
Get updated with all the announcements of AWS announcements on the What’s New with AWS? page.
Other AWS news
Here are additional noteworthy items:
AWS Partner Network: Essential training resources for ISV partners – To help scale solutions effectively, AWS provides essential training resources for Software Vendors (ISVs) partners in four key areas: AWS Marketplace fundamentals, Foundational Technical Review (FTR), APN Customer Engagement (ACE) program and co-selling, and Partner funding opportunities.
How Formula 1® uses generative AI to accelerate race-day issue resolution – Formula 1® (F1) uses Amazon Bedrock to speed up race-day issue resolution, reducing troubleshooting time from weeks to minutes through a chatbot that analyzes root causes and suggests fixes.

Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases – This blog introduces a solution using Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents to reduce Large language models (LLMs) hallucinations by implementing a verified semantic cache that checks queries against curated answers before generating new responses, improving accuracy and response times.

Orchestrate an intelligent document processing workflow using tools in Amazon Bedrock – This blog demonstrates an intelligent document processing workflow using Amazon Bedrock tools that combines Anthropic’s Claude 3 Haiku for orchestration and Anthropic’s Claude 3.5 Sonnet (v2) for analysis to handle structured, semi-structured, and unstructured healthcare documents efficiently.
From community.aws
Here are my personal favorites posts from community.aws:
Tracing Amazon Bedrock Agents – Learn how to track and analyze Amazon Bedrock Agents workflows using AWS X-Ray for better observability, by Randy D.
Testing Amazon ECS Network Resilience with AWS FIS – This article demonstrates how to test network resilience in Amazon ECS using AWS FIS with guidance from Amazon Q Developer, by Sunil Govindankutty
Stop Using Default Arguments in AWS Lambda Functions – Discover why your AWS Lambda costs might be spiralling out of control due to a common Python programming practice, by Stuart Clark.
Amazon Nova Prompt Engineering on AWS: A Field Guide by Brooke – A field guide for using Amazon Nova models, covering prompt engineering patterns and best practices on AWS, by Brooke Jamieson.
Creating Deployment Configurations for EKS with Amazon Q – Amazon Q Developer helps create EKS deployments by providing templates and best practices for Kubernetes configs, by Ricardo Tasso.
Processing WhatsApp Multimedia with Amazon Bedrock Agents: Images, Video, and Documents – I invite you to read my latest blog, which explains how to create a WhatsApp AI assistant using Amazon Bedrock and Amazon Nova models to process multimedia content such as images, videos, documents, and audio.
Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:
AWS GenAI Lofts – GenAI Lofts offer collaborative spaces and immersive experiences for startups and developers. You can join in-person GenAI Loft San Francisco events such as Hands-on with Agentic Graph RAG Workshop (February 25), Unstructured Data Meetup SF (February 26 – 27) and AI Tinkerers – San Francisco – February 2025 Demos + Science Fair (February 27 – 28). GenAI Loft Berlin has events and workshops on February 24 to March 7 that you can’t miss!
AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Milan, Italy (April 2), Bay Area – Security Edition (April 4), Timișoara, Romania (April 10), and Prague, Czeh Republic (April 29).
AWS Innovate: Generative AI + Data – Join a free online conference focusing on generative AI and data innovations. Available in multiple geographic regions: APJC and EMEA (March 6), North America (March 13), Greater China Region (March 14), and Latin America (April 8).
AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Paris (April 9), Amsterdam (April 16), London (April 30), and Poland (May 5).
AWS re:Inforce – AWS re:Inforce (June 16–18) in Philadelphia, PA our annual learning event devoted to all things AWS cloud security. Registration opens in March, and be ready to join more than 5,000 security builders and leaders.
Create your AWS Builder ID and reserve your alias. Builder ID is a universal login credential that gives you access–beyond the AWS Management Console–to AWS tools and resources, including over 600 free training courses, community features, and developer tools such as Amazon Q Developer.
You can browse all upcoming in-person and virtual events.
That’s all for this week. Stay tuned for next week’s Weekly Roundup!
— Eli
This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!
Post Syndicated from Emma Burdett original https://blog.rapid7.com/2025/02/24/under-the-hoodie-the-pen-test-diaries/

Each year, Rapid7 penetration testers conduct over 1,000 security assessments, pushing boundaries to expose vulnerabilities before the bad guys do. The mission? Get in, escalate privileges, and own the environment—physically, digitally, or sometimes just by sweet-talking an unsuspecting employee.
Names? Redacted. Companies? Anonymized. But the hacks? Real.
Welcome to Under the Hoodie, where we share stories straight from the frontlines of ethical hacking. Below are real accounts from our testers, revealing just how easy it can be to break into supposedly secure environments. Click through to hear each story unfold.
A law firm’s file storage system was sitting on the internet, just begging for a break-in. Using a mix of open-source intelligence (OSINT) and Burp Suite, our pen tester enumerated users, guessed a couple of predictable passwords (think “Winter2024!”), and walked right into confidential legal documents. Verdict? Guilty of weak security.
Ever wondered how much damage someone could do by simply plugging into an open network jack on a college campus? Turns out, a lot. Our tester started with network poisoning attacks, cracked some hashes, and before long, had access to criminal records, police databases, PhD research, and even student grade records. Could’ve handed out straight A’s if they wanted.
Check out the full infiltration.
A misconfigured Microsoft SQL server turned out to be the golden ticket for total network compromise. After gaining basic user access via weak credentials, our tester found a juicy SQL cluster, enabled some stored procedures, and pulled off process injection to gain domain admin privileges. Translation? They owned the company’s entire network from the inside out.
Sometimes, hacking isn’t about code—it’s about confidence. Armed with a fake badge and a box of popular local donuts, our tester waltzed into a corporate office by leveraging good ol’ human kindness. A security guard even held the door open. The lesson? Free food lowers defenses faster than any zero-day exploit.
Hear about the sugar-powered social engineering.
A single phone call is sometimes all it takes. Our tester posed as an employee needing a password reset. After some casual chit-chat, an IT admin happily provided a fresh login. No brute force, no malware—just old-school social engineering at its finest.
Find out just how easy it was.
High-security target? Challenge accepted. Our testers, posing as IT consultants, walked right into a police department, escorted through all secure areas, and even got their hands on a set of keys to a patrol car. No alarms. No suspicion. Just a dangerously believable pretext.
A fake email, a cloned login page, and a hundred unsuspecting employees. Eight of them entered their credentials, and just like that, our tester had access to financial data, payroll systems, and even proxy rights to other accounts. MFA saved the day—barely.
Find out just how this phishing attack unfolded.
A health transcription company left its web app vulnerable to SQL injection. The result? Full access to sensitive medical records within minutes. The tester reported it immediately, and the company had to shut down its entire system for emergency remediation. All before their hot cocoa had a chance to cool down.
No cracked passwords? No worries. Our tester leveraged network sniffing, NTLM relay attacks, and Active Directory Certificate Services to escalate privileges. By the time it was over, they had full control over the company’s systems—without ever knowing a single password.
Every system has weak points—some technical, some human. The goal of penetration testing isn’t just to break in; it’s to make sure real attackers can’t.
Hear more stories from the trenches.
Post Syndicated from corbet original https://lwn.net/Articles/1010667/
The kernel’s slab allocator is responsible for the allocation of small
(usually sub-page) chunks of memory. For many workloads, the speed of
object allocation and freeing is one of the key factors in overall
performance, so it is not surprising that a lot of effort has gone into
optimizing the slab allocator over time. Now that the kernel is down to a single slab allocator, the
memory-management developers have free rein to add complexity to it; the
latest move in that direction is the per-CPU
sheaves patch set from slab maintainer Vlastimil Babka.
Post Syndicated from Varun Sharma original https://aws.amazon.com/blogs/security/connect-your-on-premises-kubernetes-cluster-to-aws-apis-using-iam-roles-anywhere/
Many customers want to seamlessly integrate their on-premises Kubernetes workloads with AWS services, implement hybrid workloads, or migrate to AWS. Previously, a common approach involved creating long-term access keys, which posed security risks and is no longer recommended. While solutions such as Kubernetes secrets vault and third-party options exist, they fail to address the underlying issue effectively.
One option to connect your on-premises Kubernetes workloads to AWS APIs is to use the service account issuer discovery feature. This allows the Kubernetes API server to act as an OpenID Connect (OIDC) identity provider and be federated with AWS Identity and Access Management (IAM). However, this approach requires public internet access to the Kubernetes API server, which might not be desirable for some customers.
To help eliminate the need for long-term access keys or exposing the Kubernetes API server to the public internet, AWS has introduced AWS IAM Roles Anywhere. This feature enables secure, seamless integration of on-premises Kubernetes workloads with AWS services, promoting robust security practices and minimizing potential risks associated with long-term credentials or public exposure.
IAM Roles Anywhere enables workloads outside of AWS to access AWS resources by exchanging X.509 bound identities for temporary AWS credentials. With IAM Roles Anywhere, you can use the same IAM roles and policies as your AWS workloads to access AWS resources, promoting consistency.
IAM Roles Anywhere can be combined with a standard public key infrastructure solution. In this blog post, we use AWS Private Certificate Authority, which has several advantages over using a self-signed certificate authority (CA). First, it reduces operational and management overhead, because AWS manages the CA for you. Second, the cryptographic key material can be stored in hardware security modules or at least vaulted, which helps you protect your private CA against key compromises. Additionally, certificates can be short-lived, which aligns with dynamic Kubernetes environments where pod lifetimes are typically shorter than traditional servers.
We also demonstrate how to integrate IAM Roles Anywhere without modifying your existing workload Docker files, and how to automate the X.509 certificate lifecycle with cert-manager and an AWS Private CA backend in short-lived certificate mode. By using these capabilities, you can seamlessly integrate your on-premises Kubernetes workloads with AWS services, promoting robust security practices, minimizing risks associated with long-term credentials, and helping to ensure a streamlined, consistent access management experience.
This post is for customers who run their own Kubernetes cluster outside of AWS without using Amazon EKS Anywhere. If you’re using Amazon Elastic Kubernetes Service (Amazon EKS), use IAM roles for service accounts or Amazon EKS Pod Identity instead.
“Why should I prefer X.509 certificates over IAM access keys?” Access keys are long-term credentials that must be rotated regularly to minimize the risk of unauthorized access. They need to be securely deployed onto servers hosting applications that use them, requiring procedures for secure transfer and deletion of transient copies. As the number of applications and access keys grows, tracking and managing them becomes operationally challenging.
In contrast, X.509 certificates use public key infrastructure (PKI). The private key is generated directly on the application server and doesn’t leave it. Only a certificate signing request, which doesn’t contain secrets, is sent to the CA for signing and returning the certificate. This alleviates the need for securely transmitting secret keys.
However, you can argue that X.509 certificates are also long-lived credentials. This concern is valid, but not necessarily true. As demonstrated by projects such as Let’s Encrypt, it’s possible to reduce certificate lifetimes from years to months by implementing automation for certificate renewal. After such a mechanism is in place, certificate lifetimes can be further limited to days or even hours.
In this post, we introduce mutually authenticated Transport Layer Security (mTLS), which uses certificates for high-assurance bidirectional authentication. Certificates are used to establish trust between the client and server, making sure that both parties are authenticated and authorized to communicate securely. By implementing mTLS, you can achieve a higher level of security and trust in your communication channels, mitigating potential risks associated with unauthorized access or man-in-the-middle attacks. Here, we implement ephemeral certificates that are tied to the lifecycle of pods. When a pod is started, a certificate is automatically created, and it expires after a short period of time unless it’s actively in use by the pod, in which case it’s automatically renewed by the cert-manager. This approach verifies that certificates are only valid for the duration of the pod’s lifetime, minimizing the potential risk associated with long-lived credentials. Additionally, IAM Roles Anywhere supports certificate revocation list (CRL) checks, allowing you to perform explicit revocation of certificates if required. This feature provides an additional layer of security, enabling you to revoke access promptly in case of compromised credentials or other security concerns.
Throughout this post, we assume that you have a basic understanding of IAM Roles Anywhere. For more information you can see this blog post. Furthermore, we assume that you are familiar with Kubernetes, kubectl, Helm, and cert-manager.
This solution assumes that you have an existing Kubernetes cluster running outside of AWS.
Figure 1 shows the high-level architecture of our solution. An on-premises Kubernetes cluster accessing AWS APIs using IAM Roles Anywhere with X.509 certificates issued by AWS Private CA in short-lived-certificate mode.
Figure 1: High level architecture of on-premises Kubernetes accessing AWS APIs
Here’s how the solution works, as shown in Figure 1:
Let’s explore the different parts of the architecture in more detail.
AWS Private CA offers a short-lived certificate, where the validity period is limited to 7 days or fewer. You can see this AWS Blog to learn how to use AWS Private CA short-lived certificates. This new mode can be used to issue certificates for your Kubernetes pods and benefit from lower costs of operations. By synchronizing the certificate lifecycle with the lifecycle of the pod, you can minimize the operational overhead for this solution. To help meet requirements for auditability and transparency, you can use the audit report feature to list the issued certificates in a machine readable format.
Figure 2 shows a detailed overview of the components involved in authentication with IAM Roles Anywhere.
Figure 2: Components of IAM Roles Anywhere
IAM Roles Anywhere allows you to obtain temporary security credentials for workloads that run outside of AWS. Your workloads must use a certificate issued by a trusted PKI CA to authenticate with IAM Roles Anywhere. You establish trust between IAM Roles Anywhere and your CA by creating a trust anchor that points to the root of the CA.
Figure 3 shows a detailed overview of the cert-manager setup used in this post, including the aws-privateca-issuer add-on for the integration of AWS Private CA.
Figure 3: Detailed overview of cert-manager setup
cert-manager is a tool for managing X.509 certificates in Kubernetes. As shown in Figure 3, cert-manager will make sure that certificates are valid and up-to-date and attempt to renew them before they expire. By using add-ons, you can configure different backends for issuing X.509 certificates. In this post, we explore how to integrate cert-manager with AWS Private CA using the aws-privateca-issuer add-on. The aws-privateca-issuer add-on defines two custom resources, AWSPCAIssuer and AWSPCAClusterIssuer, which are used to configure the link to AWS Private CA. They are similar to the Issuer and ClusterIssuer resources that come with cert-manager, but specific to aws-privateca-issuer.
After the AWSPCAIssuer or AWSPCAClusterIssuer is available, aws-privateca-issuer authenticates towards AWS APIs using temporary security credentials obtained from IAM Roles Anywhere. cert-manager watches for the certificate resource, which references to an AWSPCAIssuer, which in turn references to AWS Private CA. aws-privatca-issuer requests a certificate from AWS Private CA. The auto-generated private key and the signed certificate are stored in Kubernetes secrets.
cert-manager supports multiple ways of integrating into your Kubernetes workloads. You can use certificate resources, which represent a human-readable definition of a certificate signing request (CSR) and contain information on certificate lifespan and renewal time. When using a certificate, the auto-generated private key and the signed certificate are stored in Kubernetes secrets.
With this option, an X.509 certificate is issued manually and saved as a secret. After a PKI is configured as an issuer, a certificate resource is created to automate the renewal of the certificate. With the certificate resource, the lifecycle of certificates is decoupled from the lifecycle of the pods that use them. This allows you to bootstrap the X.509 certificate even before the trusted PKI is deployed.
Another way of integrating cert-manager is by using a CSI driver. In this case, the certificate lifecycle is bound to the lifecycle of the pod. An X.509 certificate and private key are mounted into a predefined folder where your workloads can read them. On pod creation, cert-manager automatically creates a private key and requests a certificate for the configured trusted PKI. When the pod is deleted, the private key and certificate are also deleted and become invalid because they aren’t renewed by cert-manager.
In this post, we use the CSI driver approach for workloads to create ephemeral certificates for IAM Roles Anywhere.
Figure 4 shows a detailed view of how pods can be configured to use IAM Roles Anywhere without needing to change the underlying Docker images by using a sidecar that provides an IMDSv2 endpoint that mimics the behavior in the Amazon Elastic Compute Cloud (Amazon EC2) instance metadata endpoint.
Figure 4: Pod configuration using a sidecar
As shown in Figure 4, when using a certificate resource, the auto-generated private key and the signed certificate are stored in Kubernetes secrets and mounted into the pod. When using the CSI driver, a private key is generated locally (for the pod), a certificate is requested from cert-manager based on the given attributes and is issued by AWSPCAIssuer, and the certificates are mounted directly into the pod with no intermediate secret being created.
IAM Roles Anywhere uses the CreateSession API to authenticate requests with a SigV4a signature using the private key and its associated X.509 certificate. This exchange provides a IAM role session credential, as if you had assumed the IAM role. The aws_signing_helper binary is provided to call the CreateSession API from the command line. In this post, a sidecar container that provides an IMDSv2 endpoint to the workload container is used. This container uses the aws_signing_helper binary and uses its serve command.
This way, applications using AWS SDKs can use the AWS_EC2_METADATA_SERVICE_ENDPOINT environment variable to set the instance metadata endpoint to the correct port on the localhost interface. The X.509 certificate and private key are provided as files to the sidecar container.
In this section, we show the steps needed to deploy the solution in your AWS account.
To deploy the solution in this post, make sure that you have the following in place:
Note: As an alternative to using the AWS CLI, you can use the AWS Controllers for Kubernetes (ACK) service controller for AWS Private CA for creating and managing
CertificateAuthority,Certificate, andCertificateAuthorityActivationresources directly within your Kubernetes cluster. After establishing your CA hierarchy using the ACK controller, you can proceed with the subsequent steps involving IAM Roles Anywhere integration,aws-privateca-issuer, and cert-manager as described in this post.
CertificateAuthorityArn, which you will need for further commands, so export it for later use. Replace <region> with your AWS Region.
CertificateArn which you will need later. Export it.
ACTIVE.
At this point your root CA is set up and ready to use. The next step is to configure IAM Roles Anywhere.
trustAnchorArn. Replace <value-of-trustAnchorArn> with the Amazon Resource Name (ARN) value of your IAM Roles Anywhere trust anchor.
aws-privateca-issuer cert-manager plugin. This role needs to include the actions sts:AssumeRole, sts:SetSourceIdentity and sts:TagSession, which are required by IAMRA. Replace <TA_ID> with your trust anchor.Note: You should specify a PrincipalTag with the CN. Furthermore, it should be scoped to the IAMRA service principal. This further restricts authorization based on attributes that are extracted from the X.509 certificate and provides an additional layer of security by helping to ensure that even if an unauthorized party gains access to a valid certificate, they cannot assume the role unless the certificate’s CN matches the specified value.
iamra-issuer role:
EndEntityCertificate.
iamra-issuer role.
The created role iamra-issuer will only be used by the aws-privateca-issuer to integrate with AWS Private CA. You should repeat the process of creating IAM roles and IAMRA profiles for your workloads. it’s recommended to create a separate IAM role for each workload and limit its use with condition statements in the trust policy, checking for the workload identity and trust anchor (for example, matching the common name). Furthermore, it’s important that you add IAMRA to the trust policy and allow the aforementioned actions. Best practice with IAM roles is to apply least-privilege permissions.
To integrate IAM Roles Anywhere within your Kubernetes environment, you need to provide an IMDSv2 endpoint to your application containers by running the aws_signing_helper binary as a sidecar. You also need to configure your applications using an environment variable to use the new instance metadata endpoint. To do so, build a Docker image that works as a sidecar.
In this step, create a basic image that fulfills the preceding requirements. In your environment, you might want to adapt this example to use your own base image and implement your image hardening processes.
Copy the following script and save it as init.sh.
This script is the entry point of the sidecar container. It expects the environment variables TRUST_ANCHOR_ARN, PROFILE_ARN, and ROLE_ARN, which are required by aws_signing_helper. It also expects an X.509 certificate and its private key in the folder /iamra, which will be mounted in a later stage during pod initialization. Finally, it invokes the aws_signing_helper with the serve directive which creates an IMDSv2 endpoint listening on 9911 by default. This can be customized using the --port parameter.
Now let’s inspect the Docker file.
Note: At the time of writing, we used the alpine3.17.0 image. Use a hardened base image that’s designed to be secure and aligns with the requirements of your environment.
This Docker file copies the init.sh and downloads the aws_signing_helper binary. The init.sh script is defined as an entry point to the container. Dynamic libraries required by aws_signing_helper are installed using Alpine Linux package manager (Apk).
Now build the docker image, sign in to it, and push it for later use. For the following commands replace <my-docker-registry> with the hostname of your local registry or use an ECR Repository.
In this step, install cert-manager into your cluster and configure aws-privateca-issuer using a manually bootstrapped certificate. cert-manager-approver-policy is used to control which certificates can be requested by the workloads. Then, set up the cert-manager CSI driver to automatically provision X.509 certificates for your workload pods.
Start with the cert-manager setup:
Note: At the time of writing, we used cert-manager version 1.16.2. Check for the latest stable version.
Now, install the cert-manager aws-privateca-issuer plugin. This integration connects cert-manager with AWS Private CA and lets you issue short-lived certificates automatically. Currently, aws-privateca-issuer Helm chart doesn’t support IAMRA natively. So, you’re going to use the same init-container to set up IAMRA as for the workload pods.
You need to issue the first X.509 certificate for aws-privateca-issuer IAMRA manually. Later, cert-manager will renew it automatically.
iamra-issuer.
The previous command will create an RSA private key named iamra.key and a certificate signing request name iamra.csr. Now you need to call AWS Private CA to issue the bootstrap certificate.
CertificateArn for your iamra-issuer certificate. Export it and save the certificate to a file.
You’re ready to install the aws-privateca-issuer. You need to modify the Helm chart because it doesn’t currently support IAMRA. You will render the Helm chart into YAML manifests, which are then adapted for IAMRA.
aws-privateca-issuer and verify the deployment you have modified. It should show that one pod is ready and available.
AWSPCAIssuer, which will be used for renewal of the manually bootstrapped certificate for the aws-privateca-issuer add-on.Note: At the time of writing, we used
awspca cert-managerAPI version v1beta1. Check for the latest stable version.
AWSPCAIssuer or AWSPCAClusterIssuer is available, aws-privateca-issuer is going to authenticate towards AWS APIs by calling sts.get-caller-identity and verify the authentication method. You can verify this using its log files. It should print the assumed role.
Now, you can create a cert-manager Certificate resource that represents a desired certificate that should be issued by the referenced cert-manager Issuer. It combines information of a CSR with details on the validity period and renewal.
In Step 4, sub-step 9, you created an AWSPCAIssuer named iamra-cm-issuer. You then used this AWSPCAIssuer to renew the manually bootstrapped certificate for the aws-privateca-issuer.
In Step 4, sub-step 11, you created the certificate iamra-privateca-issuer-cert, which is used by the aws-privateca-issuer.
In this step, you will deploy the sample workload. When deploying the sample workload, make sure to repeat the process of creating IAM roles and IAMRA profiles (from Step 2), the AWSPCAIssuer (Step 4, sub-step 9), and the CertificateRequestPolicy (Step 4, sub-step 11) for the certificate request.
For more information on certificate request policies, see the cert-manager documentation on approval policies.
Use the following code to deploy the workload.
To test the deployment, you can use kubectl exec to access the iamra-sidecar container. Navigate to the iamra directory and check if the certificate and key are mounted.
Command:
kubectl exec -it acmpca-csi-test – sh
ls | grep iamra
Output: iamra
Command:
cd iamra
/iamra# ls
Output: ca.crt tls.crt tls.key
You can also exec into the aws-cli container and verify the caller identity and make API calls to Amazon Simple Storage Service (Amazon S3):
Command:
kubectl exec -it acmpca-csi-test -c aws-cli – sh
$aws sts get-caller-identity
Output: You should see iam-roles-anywhere-s3-full-access in caller-identity.
Command:
$aws s3 ls
Output: You should be able to list the S3 bucket based on the permissions associated with the assumed role.
In this post, you learned about a solution for securely connecting on-premises Kubernetes workloads to AWS services using IAM Roles Anywhere. The approach alleviates the need for long-term access keys or public internet exposure of the Kubernetes API server. By using this solution for containerized and full stack applications, you can benefit from:
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Post Syndicated from John Lee original https://aws.amazon.com/blogs/architecture/wellright-modernizes-to-an-event-driven-architecture-to-manage-bursty-and-unpredictable-traffic/
WellRight is a leading comprehensive corporate wellness platform provider that helps organizations and employees drive meaningful outcomes through personalized wellness programs. The platform increases engagement and benefit utilization by delivering engaging challenges across multiple dimensions of wellness, from physical activities like step tracking to mental health initiatives and team-building exercises.
In this post, we share how WellRight optimized the cost and performance of their application through a ground-up modernization to an event-driven architecture.
WellRight’s infrastructure often experiences bursty and unpredictable traffic patterns. For instance, clients can upload bulk user data at any time, which can impact tens of thousands of users, which then cascade into millions of changes. WellRight’s legacy monolithic infrastructure had several challenges when faced with such traffic:
The following figure shows the Number of Messages Received metric from a sample Amazon Simple Queue Service (Amazon SQS) queue. WellRight would often receive burst of events at an unpredictable time.

To address the challenges, WellRight made the strategic decision to transition to an event-driven architecture using fully managed AWS services. WellRight’s platform is driven by asynchronous state changes that propagate through multiple wellness programs, which is well suited for an event-driven architecture and can be broken down into microservices. Managed services such as AWS Lambda, Amazon SQS, and Amazon DynamoDB were appealing because they would eliminate the need to manage servers and allow WellRight to focus on core business logic and reduce the operational burden to their engineering team. It also has the added benefit of avoiding overprovisioning of infrastructure or continuously right-sizing resources. Each microservice would scale automatically as needed with no manual efforts, minimizing costs. The loosely coupled architecture would allow the WellRight team to be flexible, being able to add or make modifications to existing programs without affecting existing workflows.
WellRight’s initial event-driven architecture was centered around using serverless and fully managed services. DynamoDB was used as a primary data store for user information. For instance, when a user makes progress on their step challenge, the update in the DynamoDB table would propagate through DynamoDB Streams to Amazon EventBridge. Then, the event would be routed to the appropriate SQS queue, which functions as a buffer and provides fault tolerance to the events. A Lambda function would then process individual user metrics and update the Programs table. The Programs table uses DynamoDB Streams to send out updates using Amazon Simple Notification Service (Amazon SNS), keeping users informed about their progress and rankings.
The following diagram illustrates the flow of an event after a user update.

The first iteration of the event-driven architecture fared better than the monolithic legacy application, but the bursty nature of the traffic was still an issue. Lambda functions triggered by SQS queues scaled rapidly, handling requests in under 15 minutes that previously required 30 servers and took hours to process. Lambda provided WellRight the scalability that they needed, but the rapid scaling introduced a new challenge. This resulted in the throttling of DynamoDB and reaching Lambda concurrency limits during times of extremely high load, which led to many unprocessed messages in the dead-letter queue (DLQ).
In January 2023, AWS introduced the maximum concurrency feature for Lambda functions using Amazon SQS as an event source. This new feature allowed WellRight to control the concurrency of their Lambda functions for each SQS queue. Prior to this launch, Lambda functions would continue to scale as long as there were messages in the SQS queue. At times, Lambda functions would scale to its concurrency limits, resulting in it throttling itself. However, with this feature in place, the scaling Lambda functions would not exceed the set maximum concurrency value. This provided WellRight fine-grained control over the overall throughput of the system. WellRight would adjust the maximum concurrency value as needed to protect downstream processes from being overwhelmed, while responding to customer requests in a timely manner.
The following screenshot of the Lambda console shows the maximum concurrency for the function is set to 100 for an SQS trigger.

WellRight converted all Amazon SQS to Lambda integrations to use this feature. This provided WellRight with full control over the throughput of customer requests while preventing overloading the system. With the maximum concurrency feature, WellRight reduced failed processed messages by 99%, and eliminated DynamoDB throttling events. The feature was enabled for all Amazon SQS and Lambda integrations, including those without scaling issues, as a safeguard for potential future scaling demands.
WellRight’s event-driven architecture significantly improved their ability to handle bursty and unpredictable traffic patterns. The managed serverless services can scale instantaneously to handle these traffic spikes, providing a seamless experience for their clients. With their previous legacy architecture, clients experienced lags in challenge progress, leaderboards, and reward processing.
Now, clients continue to upload updates with over 1 million entries at any time, and WellRight can maintain up-to-the-minute leaderboards and reward processing. The transition to the new architecture has also yielded significant cost savings for WellRight. Prior to the serverless architecture, their baseline architecture required several large Amazon Elastic Compute Cloud (Amazon EC2) instances to handle the initial burst of traffic. After implementing the event-driven architecture, WellRight reduced their costs by 70% on the progress calculation service.
WellRight is currently in the process of rolling out the new event-driven architecture to the remaining clients. By the end of 2024, WellRight plans to retire the majority of their remaining servers, further reducing their infrastructure costs.
WellRight’s transition to an event-driven architecture on AWS has been a successful endeavor. By using fully managed services such as Lambda, Amazon SQS, and DynamoDB, they have been able to handle bursty and unpredictable traffic patterns efficiently, while providing a seamless experience for their clients. The introduction of maximum concurrency for Lambda functions has been a game changer, allowing WellRight to control the throughput of their Lambda functions and avoid overwhelming downstream resources.
Overall, the event-driven architecture has enabled WellRight to scale efficiently, improve performance, and reduce costs of their progress calculation service by over 70%. As they continue to optimize their serverless architecture and migrate remaining clients, WellRight is well-positioned to further enhance their platform and provide an exceptional experience to their customers.
To learn more about building event-driven architectures, including key concepts, best practices, AWS services, and getting started resources, visit Serverless Land.
Post Syndicated from Jon Handler original https://aws.amazon.com/blogs/big-data/supercharge-your-rag-applications-with-amazon-opensearch-service-and-aryn-docparse/
The old adage “garbage in, garbage out” applies to all search systems. Whether you are building for ecommerce, document retrieval, or Retrieval Augmented Generation (RAG), the quality of your search results depends on the quality of your search documents. Downstream, RAG systems improve the quality of generated answers by adding relevant data from other systems to the generative prompt. Most RAG solutions use a search engine to search for this relevant data. To get great responses, you need great search results, and to get great search results, you need great data. If you don’t properly partition, extract, enrich, and clean your data before loading it, your search results will reflect the poor quality of your search documents.
Aryn DocParse segments and labels PDF documents, runs OCR, extracts tables and images, and more. It turns your messy documents into beautiful, structured JSON, which is the first step of document extract, transform, and load (ETL). DocParse runs the open source Aryn Partitioner and its state-of-the-art, open source deep learning DETR AI model trained on over 80,000 enterprise documents. This leads to up to 6 times more accurate data chunking and 2 times improved recall on vector search or RAG when compared to off-the-shelf systems. The following screenshot is an example of how DocParse would segment a page in an ETL pipeline. You can visualize labeled bounding boxes for each document segment using the Aryn Playground.

In this post, we demonstrate how to use Amazon OpenSearch Service with purpose-built document ETL tools, Aryn DocParse and Sycamore, to quickly build a RAG application that relies on complex documents. We use over 75 PDF reports from the National Transportation Safety Board (NTSB) about aircraft incidents. You can refer to the following example document from the collection. As you can see, these documents are complex, containing tables, images, section headings, and complicated layouts.
Let’s get started!
Complete the following prerequisite steps:
Although you can generate an ETL pipeline to load your OpenSearch Service domain using the Aryn DocPrep UI, we will instead focus on the underlying Sycamore document ETL library and write a pipeline from scratch.
Sycamore was designed to make it straightforward for developers and data engineers to define complex data transformations over large collections of documents. Borrowing some ideas from popular dataflow frameworks like Apache Spark, Sycamore has a core abstraction called the DocSet. Each DocSet represents a collection of unstructured documents, and is scalable from a single document to many thousands. Each document in a DocSet has an arbitrary set of key-value properties as metadata, as well as an ordered list of elements. An Element corresponds to a chunk of the document that can be processed and embedded separately, such as a table, headline, text passage, or image. Like documents, Elements can also contain arbitrary key-value properties to encode domain- or application-specific metadata.
We’ve created a Jupyter notebook that uses Sycamore to orchestrate data preparation and loading. This notebook uses Sycamore to create a data processing pipeline that sends documents to DocParse for initial document segmentation and data extraction, then runs entity extraction and data transforms, and finally loads data into OpenSearch Service using a connector.
Copy the notebook into your Amazon SageMaker JupyterLab space, launch it using a Python kernel, then walk through the cells along with the following procedures.
To install Sycamore with the OpenSearch Service connector and local inference features necessary to create vector embeddings, run the first cell of the notebook:
In the second cell of the notebook, fill in your ARYN_API_KEY. You should be able to complete the example in the notebook for less than $1.
Cell 3 does the initial work of reading the source data and preparing a DocSet for that data. After initializing the Sycamore context and setting paths, this code calls out to DocParse to create a partitioned_docset:
The previous code uses materialize to create and save a checkpoint. In future runs, the code will use the materialized view to save a few minutes of time. partitioned_docset.execute() forces the pipeline to execute. Sycamore uses lazy execution to create efficient query plans, and would otherwise execute the pipeline at a much later step.
After this step, each document in the DocSet now includes the partitioned output from DocParse, including bounding boxes, text content, and images from that document, stored as elements.
Part of the key to building good retrieval for RAG is adding structured information that enables accurate filtering for the search query. Sycamore provides LLM-powered transforms that can extract this information and store it as structured properties, enriching the document. Sycamore can do unsupervised or supervised schema extraction, where it pulls out fields based on a JSON schema you provide. When executing these types of transforms, Sycamore will take a specified number of elements from each document, use an LLM to extract the specified fields, and include them as properties in the document.
Cell 4 uses supervised schema extraction, setting the schema as the fields you want to extract. You can add additional information that is passed to the LLM performing the entity extraction. The location property is an example of this:
The LLMPropertyExtractor uses the schema you provided to add additional properties to the document. Next, summarize the images to add additional information to improve retrieval.
There’s more information in your documents than just text—as the saying goes, a picture is worth 1,000 words! When your documents contain images, you can capture the information in those images using Sycamore’s SummarizeImages transform. SummarizeImages uses an LLM to compute a text summary for the image, then adds the summary to that element. Sycamore will also send related information about the image, like a caption, to the LLM to aid with summarization. The following code (in cell 4) takes advantage of DocParse type labeling to automatically apply SummarizeImages to image elements:
This cell can take up to 20 minutes to complete.
Now that your image elements contain additional retrieval information, it’s time to clean and normalize the text in the elements and extracted entities.
Unless you are in direct control of the creation of the documents you are processing, you will likely need to normalize that data and make it ready for search. Sycamore makes it straightforward for you to clean messy data and bring it to a regular form, fixing data quality issues.
For example, in the NTSB data, dates in the incident report are not all formatted the same way, and some US state names are shown as abbreviations. Sycamore makes it straightforward to write custom transformations in Python, and also provides several useful cleaning and formatting transforms. Cell 4 uses two functions in Sycamore to format the state names and dates:
The elements are now in normal form, with extracted entities and image descriptions. The next step is to merge together semantically related elements to create chunks.
When you prepare for RAG, you create chunks—parts of the full document that are related information. You design your chunks so that as a search result they can be added to the prompt to provide a unit of meaning and information. There are many ways to approach chunking. If you have small documents, sometimes the whole document is a chunk. If you have larger documents, sentences, paragraphs, or even sections can be a chunk. As you iterate on your end application, it’s common to adjust the chunking strategy to fine-tune the accuracy of retrieval. Sycamore automates the process of building chunks by merging together the elements of the DocSet.
At this stage of the processing in cell 4, each document in our DocSet has a set of elements. The following code merges elements together using a chunking strategy to create larger elements that will improve query results. For instance, the DocSet might have an element that is a table and an element that is a caption for that table. Merging those elements together creates a chunk that’s a better search result.
We will use Sycamore’s Merge transform with the GreedySectionMerger merging strategy to add elements in the same document section together into larger chunks:
With chunks created, it’s time to add vector embeddings for the chunks.
Use vector embeddings to enable semantic search in OpenSearch Service. With semantic search, retrieve documents that are close to a query in a multidimensional space, rather than by matching words exactly. In RAG systems, it’s common to use semantic search along with lexical search for a hybrid search. Using hybrid search, you get best-of-all-worlds retrieval.
The code in cell 4 creates vector embeddings for each chunk. You can use a variety of different AI models with Sycamore’s embed transform to create vector embeddings. You can run these locally or use a service like Amazon Bedrock or OpenAI. The embedding model you choose has a huge impact on your search quality, and it’s common to experiment with this variable as well. In this example, you create embeddings locally using a model called GTE:
You use materialize again here, so you can checkpoint the processed DocSet before loading. If there is an error when loading the indexes, you can retry without running the last few steps of the pipeline again.
The final ETL step is loading the prepared data into OpenSearch Service vector and keyword indexes to power hybrid search for the RAG application. Sycamore makes loading indexes straightforward with its set of connectors. Cell 5 adds configuration, specifying the OpenSearch Service domain endpoint and what indexes to create. If you’re following along, be sure to replace YOUR-DOMAIN-ENDPOINT, YOUR-OPENSEARCH-USERNAME, and YOUR-OPENSEARCH-PASSWORD in cell 5 with the actual values.
If you copied your domain endpoint from the console, it will start with the https:// URL scheme. When you replace YOUR-DOMAIN-ENDPOINT, be sure to remove https://.
In cell 6, Sycamore’s OpenSearch connector loads the data into an OpenSearch index:
Congratulations! You’ve completed some of the core processing steps to take raw PDFs and prepare them as a source for retrieval in a RAG application. In the next cells, you will run a couple of RAG queries.
In cell 7, Sycamore’s query and summarize functions create a RAG pipeline on the data. The query step uses OpenSearch’s vector search to retrieve the relevant passages for RAG. Then, cell 8 runs a second RAG query that filters on metadata that Sycamore extracted in the ETL pipeline, yielding even better results. You could also use an OpenSearch hybrid search pipeline to perform hybrid vector and lexical retrieval.
Cell 7 asks “What was common with incidents in Texas, and how does that differ from incidents in California?” Sycamore’s summarize_data transform runs the RAG query, and uses the LLM specified for generation (in this case, it’s Anthropic’s Claude):
Cell 8 makes a small adjustment to the code to add a filter to the vector search, filtering for documents from incidents with the location of California. Filters increase the accuracy of chatbot responses by removing irrelevant data from the result the RAG pipeline passes to the LLM in the prompt.
To add a filter, cell 8 adds a filter clause to the k-nearest neighbors (k-NN) query:
The output from the RAG query is as follows:
Be sure to clean up the resources you deployed for this walkthrough:
In this post, you used Aryn DocParse and Sycamore to parse, extract, enrich, clean, embed, and load data into vector and keyword indexes in OpenSearch Service. You then used Sycamore to run RAG queries on this data. Your second RAG query used an OpenSearch filter on metadata to get a more accurate result.
The way in which your documents are parsed, enriched, and processed has a significant impact on the quality of your RAG queries. You can use the examples in this post to build your own RAG systems with Aryn and OpenSearch Service, and iterate on the processing and retrieval strategies as you build your generative AI application.
Jon Handler is Director of Solutions Architecture for Search Services at Amazon Web Services, based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads for OpenSearch. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master’s of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.
Jon is the founding Chief Product Officer at Aryn. Prior to that, he was the SVP of Product Management at Dremio, a data lake company. Earlier, Jon was a Director at AWS, and led product management for in-memory database services (Amazon ElastiCache and Amazon MemoryDB for Redis), Amazon EMR (Apache Spark and Hadoop), and founded and was GM of the blockchain division. Jon has an MBA from Stanford Graduate School of Business and a BA in Chemistry from Washington University in St. Louis.
Post Syndicated from Patrick Kennedy original https://www.servethehome.com/intel-xeon-6700p-and-6500p-granite-rapids-sp-for-the-masses-initial-benchmarks-and-first-look/
We take a look at the new Intel Xeon 6700P and 6500P processors codenamed “Granite Rapids-SP” to see what they offer
The post Intel Xeon 6700P and 6500P Granite Rapids-SP for the Masses Initial Benchmarks and First Look appeared first on ServeTheHome.
Post Syndicated from Patrick Kennedy original https://www.servethehome.com/intel-xeon-6-soc-is-here-granite-rapids-d-is-huge/
The new Intel Xeon 6 SoC has up to 72 cores, 200Gbps of networking, along with various built-in accelerators
The post Intel Xeon 6 SoC is Here Granite Rapids-D is HUGE appeared first on ServeTheHome.
Post Syndicated from Patrick Kennedy original https://www.servethehome.com/intel-xeon-6300-launched-for-entry-servers-with-2019-core-counts/
The Intel Xeon 6300 series is the new Xeon E. With a maximum of 8 cores it only manages to keep pace with 2019 Xeon E-2200 core counts
The post Intel Xeon 6300 Launched for Entry Servers with 2019 Core Counts appeared first on ServeTheHome.
Post Syndicated from jzb original https://lwn.net/Articles/1010868/
The AlmaLinux project has published
a request for comments (RFC) on rebuilding Fedora’s Extra Packages for
Enterprise Linux (EPEL), which provides additional software for
Red Hat Enterprise Linux (RHEL) and its derivatives, to support older
x86_64 hardware that is not supported by EPEL 10. While this may
sound simple on the surface, the proposed rebuild carries a few
potential risks that the AlmaLinux and EPEL contributors would like to
avoid. The AlmaLinux
Engineering Steering Committee (ALESCo) is currently considering
feedback and will vote on the RFC in March.
Post Syndicated from jake original https://lwn.net/Articles/1011611/
The Emacs extensible text
editor (among other things) has made a security release to address two
vulnerabilities. Emacs 30.1 has fixes for CVE-2025-1244,
which is a shell-command-injection flaw in the man.el man page browser and
for CVE-2024-53920,
which is a code-execution vulnerability in the flymake
syntax-checking mode. LWN covered the
flymake problems back in December.
Post Syndicated from jake original https://lwn.net/Articles/1011610/
Security updates have been issued by AlmaLinux (bind, bind9.18, libpq, mysql, postgresql, postgresql:15, and postgresql:16), Debian (fort-validator, gnutls28, krb5, libxml2, and python-werkzeug), Fedora (chromium, openssh, proftpd, python3.8, vaultwarden, and vim), Oracle (bind, bind9.16, bind9.18, libpq, libsoup, mysql, mysql:8.0, nodejs:18, nodejs:22, postgresql, postgresql:13, postgresql:15, and postgresql:16), Red Hat (mysql, mysql:8.0, and python3), SUSE (chromedriver, dcmtk, grub2, java-1_8_0-ibm, java-23-openjdk, luanti, openssh, postgresql14, postgresql15, postgresql16, postgresql17, proftpd, radare2, and webkit2gtk3), and Ubuntu (intel-microcode, netty, and nginx).
Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=XBLmOGwhSAs
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/02/more-research-showing-ai-breaking-the-rules.html
These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.
Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.
In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.
Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the timemaking them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.
Here’s the paper.
Post Syndicated from LastWeekTonight original https://www.youtube.com/watch?v=nf7XHR3EVHo