Amazon DataZone enhances data discovery with advanced search filtering

2024-07-01 Chaitanya Vejendla

Post Syndicated from Chaitanya Vejendla original https://aws.amazon.com/blogs/big-data/amazon-datazone-enhances-data-discovery-with-advanced-search-filtering/

Amazon DataZone, a fully managed data management service, helps organizations catalog, discover, analyze, share, and govern data between data producers and consumers. We are excited to announce the introduction of advanced search filtering capabilities in the Amazon DataZone business data catalog.

With the improved rendering of glossary terms, you can now navigate large sets of terms with ease in an expandable and collapsible hierarchy, reducing the time and effort required to locate specific data assets. The introduction of logical operators (AND and OR) for filtering allows for more precise searches, enabling you to combine multiple criteria in a way that best suits your needs. The descriptive summary of search criteria helps users keep track of their applied filters, making it simple to adjust search parameters on the fly.

In this post, we discuss how these new search filtering capabilities enhance the user experience and boost the accuracy of search results, so you can find the data you need more quickly.

Challenges

Many of our customers manage vast numbers of data assets within the Amazon DataZone catalog for discoverability. Data producers tag these assets with business glossary terms to classify and enhance discovery. For example, data assets owned by a particular department can be tagged with the glossary term for that department, like “Marketing.”

Data consumers searching for the right data assets use faceted search with various criteria, including business glossary terms, and apply filters to refine their search results. However, finding the right data assets can be challenging, especially when it involves combining multiple filters. Customers wanted more flexibility and precision in their search capabilities, such as:

A more intuitive way to navigate through extensive lists of glossary terms
The ability to apply more nuanced search logic to refine search results with greater precision
A summary of applied filters to effortlessly review and adjust search criteria

New features in Amazon DataZone

With the latest release, Amazon DataZone now supports features that enhance search flexibility and accuracy:

Improved rendering of glossary terms – Glossary terms are now displayed in a hierarchical view, providing a more organized structure. You can navigate and select from long lists of glossary terms presented in an expandable and collapsible hierarchy within the search facets. For instance, a data scientist can quickly find specific customer demographic data without sifting through an overwhelming flat list.
Logical operators for refined search – You can now choose logical operators to refine your search results, offering greater control and precision. For example, a financial analyst preparing a report on investment performance can use AND logic to combine criteria like investment type and region to pinpoint the exact data needed, or use OR logic to broaden the search to include any investments that meet either criterion.
Summary of search criteria – A descriptive summary of applied search filters is now provided, allowing you to review and manage your search criteria with ease. For example, a project manager can quickly adjust filters to find project-related assets matching specific phases or statuses.

These enhancements help you better understand the relationships between different search facets, improving the overall search experience and making it easier to find the right data assets.

Use case overview

To demonstrate these search enhancements, we set up a new Amazon DataZone domain with two projects:

Marketing project – Publishes campaign-related data assets from the Marketing department. These data assets have been tagged with relevant business glossary terms corresponding to marketing.
Sales project – Publishes sales-related datasets from the Sales department. These data assets have been tagged with relevant business glossary terms corresponding to sales.

The following screenshots show examples of the different tagged assets.

In the following sections, we demonstrate the improvements in the user search experience for this use case.

Improved rendering of glossary terms

As a data consumer, you want to discover data assets using the faceted search capability within Amazon DataZone.

The search result panel has been enhanced to display glossaries and glossary terms in a hierarchical fashion. This allows you to expand and collapse sections for a more intuitive search experience.
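One way to picture the hierarchical facet is as a tree of glossaries and terms, where each node can be expanded or collapsed. The following is a minimal sketch under a hypothetical data model (flat `(term, parent)` pairs, with `None` marking a top-level glossary). It is not how Amazon DataZone stores or renders glossaries, and the glossary and term names are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


def build_tree(pairs: List[Tuple[str, Optional[str]]]) -> Dict[Optional[str], List[str]]:
    """Group (node, parent) pairs into a parent -> children map; None is the root."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for name, parent in pairs:
        children[parent].append(name)
    return children


def render(children: Dict[Optional[str], List[str]], expanded: Set[str],
           node: Optional[str] = None, depth: int = 0) -> List[str]:
    """Render the tree; '+' marks a collapsed node, '-' an expanded one."""
    lines: List[str] = []
    for child in children.get(node, []):
        has_children = bool(children.get(child))
        marker = ("-" if child in expanded else "+") if has_children else " "
        lines.append("  " * depth + f"{marker} {child}")
        # Only descend into nodes the user has expanded.
        if has_children and child in expanded:
            lines.extend(render(children, expanded, child, depth + 1))
    return lines


# Hypothetical glossaries and terms.
PAIRS = [
    ("Departments", None),
    ("Corporate Sales", "Departments"),
    ("Marketing", "Departments"),
    ("Product Category", None),
    ("Smartphones", "Product Category"),
    ("Laptops", "Product Category"),
]

if __name__ == "__main__":
    print("\n".join(render(build_tree(PAIRS), expanded={"Departments"})))
```

With only "Departments" expanded, the user sees its two terms while "Product Category" stays collapsed, which is the point of the hierarchy: long term lists stay out of the way until needed.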

For example, if you want to find product sales data assets from the Corporate Sales department, you can select the appropriate term within the glossary. The selection criteria and the corresponding result list show a total of 18 data assets, as shown in the following screenshot.

Next, suppose you want to focus on the Smartphones product category as well, so you also select the Smartphones term.

Because OR is the default logical operator within glossary terms, the search lists all the assets that are either part of Corporate Sales or tagged with Smartphones.

Logical operators for refined search

You now have the flexibility to change the default operator to AND to list only those data assets that are part of Corporate Sales and tagged with Smartphones, narrowing down the result set.
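The difference between the two operators is set union versus intersection over each asset's terms. A minimal sketch using a hypothetical in-memory catalog (asset names and tags are invented; this models the semantics described above, not the DataZone search API):

```python
from typing import Dict, List, Set

# Hypothetical catalog: asset name -> glossary terms attached to it.
ASSETS: Dict[str, Set[str]] = {
    "corp_sales_2023": {"Corporate Sales"},
    "corp_phone_sales": {"Corporate Sales", "Smartphones"},
    "retail_phone_sales": {"Retail Sales", "Smartphones"},
    "campaign_clicks": {"Marketing"},
}


def filter_by_terms(assets: Dict[str, Set[str]], selected: Set[str],
                    operator: str = "OR") -> List[str]:
    """Return asset names matching the selected glossary terms.

    OR:  the asset carries at least one selected term (broader).
    AND: the asset carries every selected term (narrower).
    """
    if operator not in ("OR", "AND"):
        raise ValueError(f"unsupported operator: {operator}")

    def matches(terms: Set[str]) -> bool:
        if operator == "OR":
            return not terms.isdisjoint(selected)
        return selected <= terms

    return sorted(name for name, terms in assets.items() if matches(terms))


if __name__ == "__main__":
    chosen = {"Corporate Sales", "Smartphones"}
    print(filter_by_terms(ASSETS, chosen, "OR"))
    # ['corp_phone_sales', 'corp_sales_2023', 'retail_phone_sales']
    print(filter_by_terms(ASSETS, chosen, "AND"))
    # ['corp_phone_sales']
```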

Additionally, you can further filter based on the asset type by selecting the available options. When you select Glue Table as your asset type, it defaults to the AND condition across the glossary terms and the asset type filter, thereby showing the data assets that satisfy all the filter conditions.

You also have the flexibility to change the operator to OR across these filters, yielding a more exhaustive list of data assets.
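There are two levels of logic here: an operator within a facet (between glossary terms) and an operator across facets (glossary terms versus asset type). A minimal sketch of that two-level evaluation, assuming a hypothetical asset shape with a multi-valued `terms` field and a single-valued `type` field; names and types are illustrative only:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical assets with glossary terms and an asset type.
ASSETS: List[Dict] = [
    {"name": "corp_phone_sales", "terms": {"Corporate Sales", "Smartphones"}, "type": "Glue Table"},
    {"name": "corp_phone_dashboard", "terms": {"Corporate Sales", "Smartphones"}, "type": "Redshift Table"},
    {"name": "campaign_clicks", "terms": {"Marketing"}, "type": "Glue Table"},
    {"name": "store_visits", "terms": {"Retail Sales"}, "type": "Redshift Table"},
]

# (field, selected values, operator applied within that facet)
Facet = Tuple[str, Set[str], str]


def facet_matches(asset: Dict, field: str, selected: Set[str], within: str) -> bool:
    value = asset[field]
    if isinstance(value, set):  # multi-valued facet, e.g. glossary terms
        return selected <= value if within == "AND" else not value.isdisjoint(selected)
    return value in selected  # single-valued facet, e.g. asset type


def search(assets: List[Dict], facets: List[Facet], across: str = "AND") -> List[str]:
    """Combine per-facet results with AND (all facets) or OR (any facet)."""
    if across not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {across}")
    combine = all if across == "AND" else any
    return sorted(a["name"] for a in assets
                  if combine(facet_matches(a, f, sel, op) for f, sel, op in facets))


if __name__ == "__main__":
    facets = [("terms", {"Corporate Sales", "Smartphones"}, "AND"),
              ("type", {"Glue Table"}, "OR")]
    print(search(ASSETS, facets, "AND"))  # ['corp_phone_sales']
    print(search(ASSETS, facets, "OR"))
    # ['campaign_clicks', 'corp_phone_dashboard', 'corp_phone_sales']
```

Switching the across-facet operator to OR pulls in `campaign_clicks`, which matches only the asset type, which is why the OR result is the more exhaustive list.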

Summary of search criteria

As we showed in the preceding screenshots, the results also display a summary of the filters you applied for the search. This enables you to review and better manage your search criteria.
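A summary like this is essentially the filter expression written back out in words. A small sketch of how such a summary string could be produced from the applied filters; the facet labels and output format are hypothetical, not the exact text Amazon DataZone displays:

```python
from typing import List, Set, Tuple


def summarize_filters(facets: List[Tuple[str, Set[str], str]], across: str = "AND") -> str:
    """Render applied filters as a one-line summary.

    Each facet is (label, selected values, operator within the facet);
    `across` is the operator that joins the facets.
    """
    clauses = []
    for label, values, within in facets:
        joined = f" {within} ".join(sorted(values))
        clauses.append(f"({label}: {joined})")
    return f" {across} ".join(clauses)


if __name__ == "__main__":
    print(summarize_filters([
        ("Glossary terms", {"Corporate Sales", "Smartphones"}, "AND"),
        ("Asset type", {"Glue Table"}, "OR"),
    ]))
    # (Glossary terms: Corporate Sales AND Smartphones) AND (Asset type: Glue Table)
```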

Conclusion

This post demonstrated new Amazon DataZone search enhancement features that streamline data discovery for a more intuitive user experience. These enhancements are designed to empower data consumers within organizations to make more informed decisions, faster. By streamlining the search process and making it more intuitive, Amazon DataZone continues to support the growing needs of data-driven businesses, helping you unlock the full potential of your data assets.

For more information about Amazon DataZone and to get started, refer to the Amazon DataZone User Guide.

About the authors

Chaitanya Vejendla is a Senior Solutions Architect specializing in data lakes and analytics, primarily working with the Healthcare and Life Sciences industry division at AWS. Chaitanya helps life sciences organizations and healthcare companies develop modern data strategies and deploy data governance and analytical applications, electronic medical records, devices, and AI/ML-based applications, while educating customers about how to build secure, scalable, and cost-effective AWS solutions. His expertise spans data analytics, data governance, AI, ML, big data, and healthcare-related technologies.

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon DataZone team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology.

Rishabh Asthana is a Front-end Engineer at AWS, working with the Amazon DataZone team based in New York City, USA.

Somdeb Bhattacharjee is an Enterprise Solutions Architect based in New York, USA, focused on helping customers on their cloud journey. He is interested in databases, big data, and analytics.

AWS Weekly Roundup: AI21 Labs’ Jamba-Instruct in Amazon Bedrock, Amazon WorkSpaces Pools, and more (July 1, 2024)

2024-07-01 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-ai21-labs-jamba-instruct-in-amazon-bedrock-amazon-workspaces-pools-and-more-july-1-2024/

AWS Summit New York is 10 days away, and I am very excited about the new announcements and more than 170 sessions. After the summit, A Night Out with AWS is an event for professionals from the media and entertainment, gaming, and sports industries who are existing Amazon Web Services (AWS) customers or have a keen interest in using AWS Cloud services for their business. You'll have the opportunity to relax, collaborate, and build new connections with AWS leaders and industry peers.

Let’s look at last week’s new announcements.

Last week’s launches
Here are the launches that got my attention.

AI21 Labs’ Jamba-Instruct now available in Amazon Bedrock – AI21 Labs’ Jamba-Instruct is an instruction-following large language model (LLM) for reliable commercial use, with the ability to understand context and subtext, complete tasks from natural language instructions, and ingest information from long documents or financial filings. With strong reasoning capabilities, Jamba-Instruct can break down complex problems, gather relevant information, and provide structured outputs to enable uses like Q&A on calls, summarizing documents, building chatbots, and more. For more information, visit AI21 Labs in Amazon Bedrock and the Amazon Bedrock User Guide.

Amazon WorkSpaces Pools, a new feature of Amazon WorkSpaces – You can now create a pool of non-persistent virtual desktops using Amazon WorkSpaces and save costs by sharing them across users who receive a fresh desktop each time they sign in. WorkSpaces Pools provides the flexibility to support shared environments like training labs and contact centers, and some user settings like bookmarks and files stored in a central storage repository such as Amazon Simple Storage Service (Amazon S3) or Amazon FSx can be saved for improved personalization. You can use AWS Auto Scaling to automatically scale the pool of virtual desktops based on usage metrics or schedules. For pricing information, refer to the Amazon WorkSpaces Pricing page.

API-driven, OpenLineage-compatible data lineage visualization in Amazon DataZone (preview) – Amazon DataZone introduces a new data lineage feature that allows you to visualize how data moves from source to consumption across organizations. The service captures lineage events from OpenLineage-enabled systems or through API to trace data transformations. Data consumers can gain confidence in an asset’s origin, and producers can assess the impact of changes by understanding its consumption through the comprehensive lineage view. Additionally, Amazon DataZone versions lineage with each event to enable visualizing lineage at any point in time or comparing transformations across an asset or job’s history. To learn more, visit Amazon DataZone, read my News Blog post, and get started with data lineage documentation.
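For readers who haven't seen one, an OpenLineage lineage event is a small JSON document describing a job run together with the datasets it read and wrote. A minimal sketch that builds such a RunEvent; the job, dataset, and producer names are hypothetical, and the schema URL version is an assumption to check against the spec version your producer targets:

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Assumed schema URL; verify the version against the OpenLineage spec you use.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def make_run_event(job_namespace: str, job_name: str,
                   inputs: List[Tuple[str, str]], outputs: List[Tuple[str, str]],
                   producer: str, event_type: str = "COMPLETE") -> Dict:
    """Build a minimal OpenLineage RunEvent: which run of which job read and wrote which datasets."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": producer,
        "schemaURL": SCHEMA_URL,
    }


if __name__ == "__main__":
    event = make_run_event(
        "example-etl", "daily_sales_rollup",  # hypothetical job
        inputs=[("s3://example-bucket", "raw/sales")],
        outputs=[("s3://example-bucket", "curated/daily_sales")],
        producer="https://example.com/hypothetical-etl",
    )
    payload = json.dumps(event).encode("utf-8")
    print(payload.decode("utf-8"))
```

The serialized bytes are the kind of payload you would send through the Amazon DataZone lineage API (for example, its PostLineageEvent operation); check the API reference for the exact request shape.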

Knowledge Bases for Amazon Bedrock now offers observability logs – You can now monitor knowledge ingestion logs through Amazon CloudWatch, S3 buckets, or Amazon Data Firehose streams. This provides enhanced visibility into whether documents were successfully processed or encountered failures during ingestion. With these insights available promptly, you can quickly determine when your documents are ready for use. For more details on these new capabilities, refer to the Knowledge Bases for Amazon Bedrock documentation.

Updates and expansion to the AWS Well-Architected Framework and Lens Catalog – We announced updates to the AWS Well-Architected Framework and Lens Catalog to provide expanded guidance and recommendations on architectural best practices for building secure and resilient cloud workloads. The updates reduce redundancies and enhance consistency in resources and framework structure. The Lens Catalog now includes the new Financial Services Industry Lens and updates to the Mergers and Acquisitions Lens. We also made important updates to the Change Enablement in the Cloud whitepaper. You can use the updated Well-Architected Framework and Lens Catalog to design cloud architectures optimized for your unique requirements by following current best practices.

Cross-account machine learning (ML) model sharing support in Amazon SageMaker Model Registry – Amazon SageMaker Model Registry now integrates with AWS Resource Access Manager (AWS RAM), allowing you to easily share ML models across AWS accounts. This helps data scientists, ML engineers, and governance officers access models in different accounts like development, staging, and production. You can share models in Amazon SageMaker Model Registry by specifying the model in the AWS RAM console and granting access to other accounts. This new feature is now available in all AWS Regions where SageMaker Model Registry is available except GovCloud Regions. To learn more, visit the Amazon SageMaker Developer Guide.

AWS CodeBuild supports Arm-based workloads using AWS Graviton3 – AWS CodeBuild now supports natively building and testing Arm workloads on AWS Graviton3 processors without additional configuration, providing up to 25% higher performance and 60% lower energy usage than previous Graviton processors. To learn more about CodeBuild’s support for Arm, visit our AWS CodeBuild User Guide.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

We launched existing services and instance types in additional Regions:

Amazon RDS now supports integration with AWS Secrets Manager in the AWS GovCloud (US) Regions. RDS integration with AWS Secrets Manager improves your database security by ensuring your RDS master user password is not visible in plaintext to administrators or engineers during your database creation workflow.
Amazon OpenSearch Serverless is now available in Canada (Central) Region. OpenSearch Serverless is a serverless deployment option for Amazon OpenSearch Service that makes it simple to run search and analytics workloads without the complexities of infrastructure management.
Amazon Redshift Concurrency Scaling is now available in the AWS Europe (Spain, Zurich) and Middle East (UAE) Regions. Amazon Redshift Concurrency Scaling elastically scales query processing power to provide consistently fast performance for hundreds of concurrent queries.
Amazon ElastiCache now supports Graviton3-based M7g and R7g node families in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon, N. California), Canada (Central), South America (São Paulo), Europe (Frankfurt, Ireland, London, Paris (M7g only), Spain, Stockholm), and Asia Pacific (Hyderabad, Mumbai, Seoul, Singapore, Sydney, Tokyo). These new nodes deliver up to 28% increased throughput, 21% improved P99 latency, and 25% higher networking bandwidth compared to Graviton2 nodes for improved price performance.
Amazon EC2 C6a instances are now available in Asia Pacific (Hong Kong) Region. C6a instances are powered by third-generation AMD EPYC processors with a maximum frequency of 3.6 GHz. C6a instances deliver up to 15% better price performance than comparable C5a instances.
Amazon Redshift Serverless with lower base capacity is now available in the Asia Pacific (Mumbai), Europe (Stockholm), and US West (N. California) Regions. Amazon Redshift Serverless now has a lower minimum base capacity of 8 RPUs, down from 32 RPUs, giving you more flexibility to size small to large workloads for your price-performance requirements, with capacity measured in RPUs and billed per second.
Amazon Athena Provisioned Capacity is now available in South America (São Paulo) and Europe (Spain) Regions. Provisioned Capacity is a feature of Athena that allows you to run SQL queries on fully managed, dedicated serverless resources for a fixed price and no long-term commitments.
Amazon CloudWatch Logs account-level subscription filter is now available in the AWS GovCloud (US-East, US-West) Regions, as well as Israel (Tel Aviv), and Canada West (Calgary) Regions. With this new capability, you can deliver real-time log events that are ingested into Amazon CloudWatch Logs to a Kinesis Data Streams data stream, an Amazon Data Firehose delivery stream, or an AWS Lambda function for custom processing, analysis, or delivery to other destinations using a single account-level subscription filter.
Amazon Route 53 Application Recovery Controller (Route 53 ARC) zonal autoshift is now generally available in the AWS GovCloud (US-East, US-West) Regions. With this feature, you can safely and automatically shift an application’s traffic away from an Availability Zone when AWS identifies a potential failure affecting that Availability Zone.
AWS Backup support for Amazon S3 is now available in AWS Canada West (Calgary) Region. AWS Backup is a policy-based, fully managed, and cost-effective solution that enables you to centralize and automate data protection of Amazon S3 along with other AWS services (spanning compute, storage, and databases) and third-party applications.
Amazon EC2 High Memory instances with 3 TiB of memory are now available in Asia Pacific (Hong Kong) Region. You can start using these new High Memory instances with On-Demand and Savings Plans purchase options.
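To make the Redshift Serverless change concrete: because compute is measured in RPUs and billed per second, a workload running at its base capacity costs in proportion to that base, so dropping the minimum from 32 to 8 RPUs cuts that floor by a factor of four. A simplified sketch, with the per-RPU-hour price left as a placeholder rather than an actual AWS rate, and ignoring any minimum-charge rules:

```python
def serverless_compute_charge(rpus: int, seconds: float, price_per_rpu_hour: float) -> float:
    """Charge for running at a fixed capacity, billed per second (simplified model)."""
    return rpus * price_per_rpu_hour * seconds / 3600.0


if __name__ == "__main__":
    PRICE = 1.0  # placeholder price per RPU-hour, not a real AWS rate
    old_floor = serverless_compute_charge(32, 600, PRICE)  # 10 minutes at the old minimum
    new_floor = serverless_compute_charge(8, 600, PRICE)   # 10 minutes at the new minimum
    print(old_floor / new_floor)
```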

Other AWS news
Here are some additional news items that you might find interesting:

Top reasons to build and scale generative AI applications on Amazon Bedrock – Check out Jeff Barr’s video, where he discusses why our customers are choosing Amazon Bedrock to build and scale generative artificial intelligence (generative AI) applications that deliver fast value and business growth. Amazon Bedrock is becoming a preferred platform for building and scaling generative AI due to its features, innovation, availability, and security. Leading organizations across diverse sectors use Amazon Bedrock to speed their generative AI work, like creating intelligent virtual assistants, creative design solutions, document processing systems, and a lot more.

Four ways AWS is engineering infrastructure to power generative AI – We continue to optimize our infrastructure to support generative AI at scale through innovations like delivering low-latency, large-scale networking to enable faster model training, continuously improving data center energy efficiency, prioritizing security throughout our infrastructure design, and developing custom AI chips like AWS Trainium to increase computing performance while lowering costs and energy usage. Read the new blog post about how AWS is engineering infrastructure for generative AI.

AWS re:Inforce 2024 re:Cap – It’s been 2 weeks since AWS re:Inforce 2024, our annual cloud-security learning event. Check out the summary of the event prepared by Wojtek.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. To learn more about future AWS Summit events, visit the AWS Summit page. Register in your nearest city: New York (July 10), Bogotá (July 18), and Taipei (July 23–24).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in Cameroon (July 13), Aotearoa (August 15), and Nigeria (August 24).

Browse all upcoming AWS-led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Esra

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Annie Eaton | The Extended Reality Blueprint | Talks at Google

2024-07-01 Talks at Google

Post Syndicated from Talks at Google original https://www.youtube.com/watch?v=O94AWbLkuSM

Comic for 2024.07.01 – Mix

2024-07-01 Explosm.net

Post Syndicated from Explosm.net original https://explosm.net/comics/mix

New Cyanide and Happiness Comic

[$] Arithmetic overflow mitigation in the kernel

2024-07-01 daroc

Post Syndicated from daroc original https://lwn.net/Articles/979747/

On May 7, Kees Cook sent a proposal to the linux-kernel mailing list, asking the kernel developers to start working on a way to mitigate unintentional arithmetic overflow, which has been a source of many bugs. This is not the first time Cook has made a request along these lines; he sent a related patch set in January 2024. Several core developers objected to the plan for different reasons. After receiving their feedback, Cook modified his approach to tackle the problem in a series of smaller steps.
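To illustrate the class of bug at issue: in C, unsigned arithmetic silently wraps, so a size calculation can come out far smaller than intended. A Python sketch that models 32-bit unsigned C arithmetic by masking (Python integers themselves never overflow). The checked helpers are loosely modeled on the kernel's existing check_add_overflow()/check_mul_overflow() helpers, which report wrap-around and store the wrapped result; this is not Cook's proposed mechanism.

```python
from typing import Tuple

U32_MAX = 0xFFFFFFFF  # largest 32-bit unsigned value


def wrapping_add_u32(a: int, b: int) -> int:
    """What unchecked C unsigned addition does: silently wrap modulo 2**32."""
    return (a + b) & U32_MAX


def check_add_overflow_u32(a: int, b: int) -> Tuple[bool, int]:
    """Return (wrapped?, wrapped result), in the style of check_add_overflow()."""
    total = a + b
    return total > U32_MAX, total & U32_MAX


def check_mul_overflow_u32(a: int, b: int) -> Tuple[bool, int]:
    """Same idea for multiplication, the classic allocation-size bug."""
    product = a * b
    return product > U32_MAX, product & U32_MAX


if __name__ == "__main__":
    # count * element_size for a hypothetical allocation: 0x40000001 * 4
    # wraps to 4, so a buffer of 4 bytes would be allocated for ~4 GiB of data.
    print(check_mul_overflow_u32(0x40000001, 4))  # (True, 4)
```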

Security updates for Monday

2024-07-01 corbet

Post Syndicated from corbet original https://lwn.net/Articles/980252/

Security updates have been issued by Debian (dcmtk, edk2, emacs, glibc, gunicorn, libmojolicious-perl, openssh, org-mode, pdns-recursor, tryton-client, and tryton-server), Fedora (freeipa, kitty, libreswan, mingw-gstreamer1, mingw-gstreamer1-plugins-bad-free, mingw-gstreamer1-plugins-base, mingw-gstreamer1-plugins-good, mingw-poppler, and mingw-python-urllib3), Gentoo (cpio, cryptography, GNU Emacs, Org Mode, GStreamer, GStreamer Plugins, Liferea, Pixman, SDL_ttf, SSSD, and Zsh), Oracle (pki-core), Red Hat (httpd:2.4, libreswan, and pki-core), SUSE (glib2 and kubevirt, virt-api-container, virt-controller-container, virt-exportproxy-container, virt-exportserver-container, virt-handler-container, virt-launcher-container, virt-libguestfs-t), and Ubuntu (espeak-ng, libcdio, and openssh).

Serious vulnerability fixed with OpenSSH 9.8

2024-07-01 corbet

Post Syndicated from corbet original https://lwn.net/Articles/980211/

OpenSSH 9.8 has been released, fixing an ugly vulnerability:

Successful exploitation has been demonstrated on 32-bit Linux/glibc systems with ASLR. Under lab conditions, the attack requires on average 6-8 hours of continuous connections up to the maximum the server will accept. Exploitation on 64-bit systems is believed to be possible but has not been demonstrated at this time. It’s likely that these attacks will be improved upon.

Exploitation on non-glibc systems is conceivable but has not been examined.

There is a configuration workaround for systems that cannot be updated, though it has its own problems. See this Qualys advisory for more details.
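The workaround in the Qualys advisory is setting LoginGraceTime to 0 in sshd_config, which removes the login timeout whose signal handler is involved; the cost is that unauthenticated connections can then hold connection slots indefinitely, which is the denial-of-service problem alluded to. A small, simplified sketch for auditing a config for that setting; it ignores Include directives and stops at the first Match block, which are assumptions a real audit would need to handle:

```python
import re

# sshd_config time format: bare number = seconds; s/m/h/d/w suffixes may combine, e.g. "1m30s".
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
DEFAULT_LOGIN_GRACE_TIME = 120  # sshd's documented default, in seconds


def parse_sshd_time(value: str) -> int:
    value = value.lower()
    parts = re.findall(r"(\d+)([smhdw]?)", value)
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unrecognized time value: {value!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def login_grace_time(config_text: str) -> int:
    """Effective LoginGraceTime in seconds (simplified: no Include or Match handling)."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = re.split(r"[\s=]+", line, maxsplit=1)
        keyword = fields[0].lower()  # sshd keywords are case-insensitive
        if keyword == "match":
            break  # global options end at the first Match block
        if keyword == "logingracetime" and len(fields) == 2:
            return parse_sshd_time(fields[1].strip())  # first value wins
    return DEFAULT_LOGIN_GRACE_TIME


if __name__ == "__main__":
    cfg = "Port 22\nLoginGraceTime 0\n"
    print("workaround applied:", login_grace_time(cfg) == 0)
```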

Hello World #24 out now: Impact of tech

2024-07-01 Meg Wang

Post Syndicated from Meg Wang original https://www.raspberrypi.org/blog/hello-world-24-out-now-impact-of-tech/

Do you remember a time before social media? Mobile phones? Email? We are surrounded by digital technology, and new applications impact our lives whether we engage with them or not. Issue 24 of Hello World, out today for free, gives you ideas for how to help your learners think openly and critically about technology.

Teaching about the impact of technology

For learners to become informed, empowered citizens, they need to understand the impact technology has on them as individuals, and on society as a whole. In our brand-new issue of Hello World, educators share insights from their work in and around classrooms that will help you engage your learners in learning about and discussing the impact of tech.

For example:

Jasmeen Kanwal and the team at Data Education in Schools share their resources for how young people can start to learn the skills they need to change the world with data
Julie York writes about how incorporating AI education into any classroom can help students prepare for future careers
Ben Hall discusses whether technology is divisive or inclusive, and how you can encourage students to think critically about it

This issue also includes stories on how educators use technology to create a positive impact for learners:

Yolanda Payne tells you how she’s using teaching experiences from the COVID-19 pandemic to bring better remote learning to communities in Georgia, USA, and in the US Virgin Islands
Mitchel Resnick and Natalie Rusk from the Lifelong Kindergarten group at the MIT Media Lab introduce their new free mobile app, OctoStudio, and how it helps learners and educators in under-resourced areas get creative with code

And there is lots more for you to discover in issue 24.

Download your free digital copy

The issue also covers how you can make time to teach about the impact of technology in an already packed curriculum. Sway Grantham, Senior Learning Manager at the Raspberry Pi Foundation, says in her article:

“As adults, it is easy for us to see the impact technology has had on society and on our lives. Yet when I tell pupils that, within my lifetime, it wasn’t always illegal to hold your mobile phone to your ear and have a call while driving, they are horrified. They are living in the now and don’t yet have the perspective to allow them to see the change that has happened. However, knowing the impact of technology allows us to learn from previous mistakes, to make decisions around ethical behaviour (such as using a phone while driving), and to critically engage in real-world issues.

As teachers, allocating some time to this topic throughout the year can seem challenging, but with a few small changes, the impact might be more than you can imagine.”

Share your thoughts & subscribe to Hello World

With so many aspects of life impacted by technology, computing educators play a crucial role in supporting young people to become informed, empowered citizens. We hope you enjoy this issue of Hello World and find it useful in your teaching.

Share your thoughts and ideas about the new issue and the topic on social media by tagging the Hello World Twitter/X or Facebook accounts
Find out how you can write for the magazine
Subscribe to Hello World for free to never miss an issue

The post Hello World #24 out now: Impact of tech appeared first on Raspberry Pi Foundation.

Best of Lost and Found Animals

2024-07-01 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=J1igOgxBCmY

Model Extraction from Neural Networks

2024-07-01 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/model-extraction-from-neural-networks.html

A new paper, “Polynomial Time Cryptanalytic Extraction of Neural Network Models,” by Adi Shamir and others, uses ideas from differential cryptanalysis to extract the weights inside a neural network using specific queries and their results. This is much more theoretical than practical, but it’s a really interesting result.

Abstract:

Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto’20 by Carlini, Jagielski, and Mironov. It resembles a differential chosen plaintext attack on a cryptosystem, which has a secret key embedded in its black-box implementation and requires a polynomial number of queries but an exponential amount of time (as a function of the number of neurons). In this paper, we improve this attack by developing several new techniques that enable us to extract with arbitrarily high precision all the real-valued parameters of a ReLU-based DNN using a polynomial number of queries and a polynomial amount of time. We demonstrate its practical efficiency by applying it to a full-sized neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8 hidden layers with 256 neurons each, and about 1.2 million neuronal parameters. An attack following the approach by Carlini et al. requires an exhaustive search over 2^256 possibilities. Our attack replaces this with our new techniques, which require only 30 minutes on a 256-core computer.
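The attacks the abstract describes exploit the fact that a ReLU network is piecewise linear. Below is a toy illustration of the "critical point" idea from the Carlini, Jagielski, and Mironov line of work, not Shamir et al.'s polynomial-time algorithm. It queries a hypothetical 2-input, 3-neuron network only as a black box, walks along a line to find where the output's slope jumps (a neuron switching on), and reads that neuron's weight direction off the jump in the gradient. The network, query line, and step sizes are all assumptions chosen for the example.

```python
import math
from typing import List, Optional

# Hypothetical "secret" model: 2 inputs -> 3 ReLU neurons -> 1 output.
SECRET_W = [[1.0, 2.0], [-1.5, 0.5], [0.3, -1.0]]
SECRET_B = [0.5, -0.2, 0.1]
SECRET_A = [1.0, -2.0, 0.7]


def black_box(x: List[float]) -> float:
    """The attacker may only call this, never read the weights."""
    return sum(a * max(0.0, w[0] * x[0] + w[1] * x[1] + b)
               for w, b, a in zip(SECRET_W, SECRET_B, SECRET_A))


def on_line(p, d, t):
    return [pi + t * di for pi, di in zip(p, d)]


def slope(f, p, d, t, eps=1e-7):
    """Forward-difference derivative of f along the line p + t*d."""
    return (f(on_line(p, d, t + eps)) - f(on_line(p, d, t))) / eps


def find_critical_point(f, p, d, t_start=-4.999, t_end=5.0,
                        step=0.01, tol=1e-4) -> Optional[float]:
    """Scan for the first slope change, then bisect to locate it precisely."""
    t, s_left = t_start, slope(f, p, d, t_start)
    while t + step <= t_end:
        if abs(slope(f, p, d, t + step) - s_left) > tol:
            lo, hi = t, t + step
            for _ in range(40):
                mid = (lo + hi) / 2
                if abs(slope(f, p, d, mid) - s_left) <= tol:
                    lo = mid
                else:
                    hi = mid
            return (lo + hi) / 2
        t += step
    return None


def gradient(f, x, h=1e-6):
    """Central-difference gradient; exact up to rounding where f is locally linear."""
    g = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g


def recover_neuron_direction(f, p, d, t_star, delta=1e-3):
    """The gradient jump across a critical point is +/- a_i * w_i for the neuron
    that switched, so normalizing it yields w_i up to sign and scale."""
    g_before = gradient(f, on_line(p, d, t_star - delta))
    g_after = gradient(f, on_line(p, d, t_star + delta))
    jump = [ga - gb for ga, gb in zip(g_after, g_before)]
    norm = math.sqrt(sum(v * v for v in jump))
    return [v / norm for v in jump]


if __name__ == "__main__":
    p, d = [0.0, 0.0], [1.0, 0.5]
    t_star = find_critical_point(black_box, p, d)  # near -0.25, where neuron 0 switches on
    print(t_star, recover_neuron_direction(black_box, p, d, t_star))
```

Recovering one weight row's direction is the easy part; the real attacks must also resolve signs and scales and peel back deeper layers, which is where the exponential cost in earlier work arose and what the paper's new techniques address.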