Sabrent Apex X16 Rocket 5 Destroyer 64TB PCIe Gen5 Card Shown

Post Syndicated from Cliff Robinson original https://www.servethehome.com/sabrent-apex-x16-rocket-5-destroyer-64tb-pcie-gen5-card-shown/

The Sabrent Apex X16 Rocket 5 Destroyer is a 64TB card that uses a Microchip Switchtec PCIe Gen5 switch to provide over 50GB/s of throughput

The post Sabrent Apex X16 Rocket 5 Destroyer 64TB PCIe Gen5 Card Shown appeared first on ServeTheHome.

Modernize your data observability with Amazon OpenSearch Service zero-ETL integration with Amazon S3

Post Syndicated from Joshua Bright original https://aws.amazon.com/blogs/big-data/modernize-your-data-observability-with-amazon-opensearch-service-zero-etl-integration-with-amazon-s3/

We are excited to announce the general availability of Amazon OpenSearch Service zero-ETL integration with Amazon Simple Storage Service (Amazon S3) for domains running 2.13 and above. The integration is new way for customers to query operational logs in Amazon S3 and Amazon S3-based data lakes without needing to switch between tools to analyze operational data. By querying across OpenSearch Service and S3 datasets, you can evaluate multiple data sources to perform forensic analysis of operational and security events. The new integration with OpenSearch Service supports AWS’s zero-ETL vision to reduce the operational complexity of duplicating data or managing multiple analytics tools by enabling you to directly query your operational data, reducing costs and time to action.

OpenSearch is an open source, distributed search and analytics suite derived from Elasticsearch 7.10. OpenSearch Service currently has tens of thousands of active customers with hundreds of thousands of clusters under management processing hundreds of trillions of requests per month.

Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. Organizations of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-centered applications, and mobile apps. With cost-effective storage classes and user-friendly management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements. Let’s dig into this exciting new feature for OpenSearch Service.

Benefits of using OpenSearch Service zero-ETL integration with Amazon S3

OpenSearch Service zero-ETL integration with Amazon S3 allows you to use the rich analytics capabilities of OpenSearch Service SQL and PPL directly on infrequently queried data stored outside of OpenSearch Service in Amazon S3. It also integrates with other OpenSearch integrations so you can install prepackaged queries and visualizations to analyze your data, making it straightforward to quickly get started.

The following diagram illustrates how OpenSearch Service unlocks value stored in infrequently queried logs from popular AWS log types.

You can use OpenSearch Service direct queries to query data in Amazon S3. OpenSearch Service provides a direct query integration with Amazon S3 as a way to analyze operational logs in Amazon S3 and data lakes based in Amazon S3 without having to switch between services. You can now analyze data in cloud object stores and simultaneously use the operational analytics and visualizations of OpenSearch Service.

Many customers currently use Amazon S3 to store event data for their solutions. For operational analytics, Amazon S3 is typically used as a destination for VPC Flow Logs, Amazon S3 Access Logs, AWS Load Balancer Logs, and other event sources from AWS services. Customers also store data directly from application events in Amazon S3 for compliance and auditing needs. The durability and scalability of Amazon S3 makes it an obvious data destination for many customers that want a longer-term storage or archival option at a cost-effective price point.

Bringing data from these sources into OpenSearch Service stored in hot and warm storage tiers may be prohibitive due to the size and volume of the events being generated. For some of these event sources that are stored into OpenSearch Service indexes, the volume of queries run against the data doesn’t justify the cost to continue to store them in their cluster. Previously, you would pick and choose which event sources you brought in for ingestion into OpenSearch Service based on the storage provisioned in your cluster. Access to other data meant using different tools such as Amazon Athena to view the data on Amazon S3.

For a real-world example, let’s see how using the new integration benefited Arcesium.

“Arcesium provides advanced cloud-native data, operations, and analytics capabilities for the financial services industry. Our software platform processes many millions of transactions a day, emitting large volumes of log and audit records along the way. The volume of log data we needed to process, store, and analyze was growing exponentially given our retention and compliance needs. Amazon OpenSearch Service’s new zero-ETL integration with Amazon S3 is helping our business scale by allowing us to analyze infrequently queried logs already stored in Amazon S3 instead of incurring the operational expense of maintaining large and costly online OpenSearch clusters or building ad hoc ingestion pipelines.”

– Kyle George, SVP & Global Head of Infrastructure at Arcesium.

With direct queries with Amazon S3, you no longer need to build complex extract, transform, and load (ETL) pipelines or incur the expense of duplicating data in both OpenSearch Service and Amazon S3 storage.

Fundamental concepts

After configuring a direct query connection, you’ll need to create tables in the AWS Glue Data Catalog using the OpenSearch Service Query Workbench. The direct query connection relies on the metadata in Glue Data Catalog tables to query data stored in Amazon S3. Note that tables created by AWS Glue crawlers or Athena are not currently supported.

By combining the structure of Data Catalog tables, SQL indexing techniques, and OpenSearch Service indexes, you can accelerate query performance, unlock advanced analytics capabilities, and contain querying costs. Below are a few examples of how you can accelerate your data:

  • Skipping indexes – You ingest and index only the metadata of the data stored in Amazon S3. When you query a table with a skipping index, the query planner references the index and rewrites the query to efficiently locate the data, instead of scanning all partitions and files. This allows the skipping index to quickly narrow down the specific location of the stored data that’s relevant to your analysis.
  • Materialized views – With materialized views, you can use complex queries, such as aggregations, to power dashboard visualizations. Materialized views ingest a small amount of your data into OpenSearch Service storage.
  • Covering indexes – With a covering index, you can ingest data from a specified column in a table. This is the most performant of the three indexing types. Because OpenSearch Service ingests all data from your desired column, you get better performance and can perform advanced analytics. OpenSearch Service creates a new index from the covering index data. You can use this new index for dashboard visualizations and other OpenSearch Service functionality, such as anomaly detection or geospatial capabilities.

As new data comes in to your S3 bucket, you can configure a refresh interval for your materialized views and covering indexes to provide local access to the most current data on Amazon S3.

Solution overview

Let’s take a test drive using VPC Flow Logs as your source! As mentioned before, many AWS services emit logs to Amazon S3. VPC Flow Logs is a feature of Amazon Virtual Private Cloud (Amazon VPC) that enables you to capture information about the IP traffic going to and from network interfaces in your VPC. For this walkthrough, you perform the following steps:

  1. Create an S3 bucket if you don’t already have one available.
  2. Enable VPC Flow Logs using an existing VPC that can generate traffic and store the logs as Parquet on Amazon S3.
  3. Verify the logs exist in your S3 bucket.
  4. Set up a direct query connection to the Data Catalog and the S3 bucket that has your data.
  5. Install the integration for VPC Flow Logs.

Create an S3 bucket

If you have an existing S3 bucket, you can reuse that bucket by creating a new folder inside of the bucket. If you need to create a bucket, navigate to the Amazon S3 console and create an Amazon S3 bucket with a name that is suitable for your organization.

Enable VPC Flow Logs

Complete the following steps to enable VPC Flow Logs:

  1. On the Amazon VPC console, choose a VPC that has application traffic that can generate logs.
  2. On the Flow Logs tab, choose Create flow log.
  3. For Filter, choose ALL.
  4. Set Maximum aggregation interval to 1 minute.
  5. For Destination, choose Send to an Amazon S3 bucket and provide the S3 bucket ARN from the bucket you created earlier.
  6. For Log record format, choose Custom format and select Standard attributes.

For this post, we don’t select any of the Amazon Elastic Container Service (Amazon ECS) attributes because they’re not implemented with OpenSearch integrations as of this writing.

  1. For Log file format, choose Parquet.
  2. For Hive-compatible S3 prefix, choose Enable.
  3. Set Partition logs by time to every 1 hour (60 minutes).

Validate you are receiving logs in your S3 bucket

Navigate to the S3 bucket you created earlier to see that data is streaming into your S3 bucket. If you drill down and navigate the directory structure, you find that the logs are delivered in an hourly folder and emitted every minute.

Now that you have VPC Flow Logs flowing into an S3 bucket, you need to set up a connection between your data on Amazon S3 and your OpenSearch Service domain.

Set up a direct query data source

In this step, you create a direct query data source which uses Glue Data Catalog tables and your Amazon S3 data. The action creates all the necessary infrastructure to give you access to the Hive metastore (databases and tables in Glue Data Catalog and the data housed in Amazon S3 for the bucket and folder combination you want the data source to have access to. It will also wire in all the appropriate permissions with the Security plugin’s fine-grained access control so you don’t have to worry about permissions to get started.

Complete the following steps to set up your direct query data source:

  1. On the OpenSearch Service domain, choose Domains in the navigation pane.
  2. Choose your domain.
  3. On the Connections tab, choose Create new connection.
  4. For Name, enter a name without dashes, such as zero_etl_walkthrough.
  5. For Description, enter a descriptive name.
  6. For Data source type, choose Amazon S3 with AWS Glue Data Catalog.
  7. For IAM role, if this is your first time, let the direct query setup take care of the permissions by choosing Create a new role. You can edit it later based on your organization’s compliance and security needs. For this post, we name the role zero_etl_walkthrough.
  8. For S3 buckets, use the one you created.
  9. Do not select the check box to grant access to all new and existing buckets.
  10. For Checkpoint S3 bucket, use the same bucket you created. The checkpoint folders get created for you automatically.
  11. For AWS Glue tables, because you don’t have anything that you have created in the Data Catalog, enable Grant access to all existing and new tables.

The VPC Flow Logs OpenSearch integration will create resources in the Data Catalog, and you will need access to pick those resources up.

  1. Choose Create.

Now that the initial setup is complete, you can install the OpenSearch integration for VPC Flow Logs.

Install the OpenSearch integration for VPC Flow Logs

The integrations plugin contains a wide variety of prebuilt dashboards, visualizations, mapping templates, and other resources that make visualizing and working with data generated by your sources simpler. The integration for Amazon VPC installs a variety of resources to view your VPC Flow Logs data as it sits in Amazon S3.

In this section, we show you how to make sure you have the most up-to-date integration packages for installation. We then show you how to install the OpenSearch integration. In most cases, you will have the latest integrations such as VPC Flow Logs, NGINX, HA Proxy, or Amazon S3 (access logs) at the time of the release of a minor or major version. However, OpenSearch is an open source community-led project, and you can expect that there will be version changes and new integrations not yet included with your current deployment.

Verify the latest version of the OpenSearch integration for Amazon VPC

You may have upgraded from earlier versions of OpenSearch Service to OpenSearch Service version 2.13. Let’s confirm that your deployment matches what is present in this post.

On OpenSearch Dashboards, navigate to the Integrations tab and choose Amazon VPC. You will see a release version for the integration.

Confirm that you have version 1.1.0 or higher. If your deployment doesn’t have it, you can install the latest version of the integration from the OpenSearch catalog. Complete the following steps:

  1. Navigate to the OpenSearch catalog.
  2. Choose Amazon VPC Flow Logs.
  3. Download the 1.1.0 Amazon VPC Integration file from the repository folder labeled amazon_vpc_flow_1.1.0.
  4. In the OpenSearch Dashboard’s Dashboard Management plugin, choose Saved objects.
  5. Choose Import and browse your local folders.
  6. Import the downloaded file.

The file contains all the necessary objects to create an integration. After it’s installed, you can proceed to the steps to set up the Amazon VPC OpenSearch integration.

Set up the OpenSearch integration for Amazon VPC

Let’s jump in and install the integration:

  1. In OpenSearch Dashboards, navigate to the Integrations tab.
  2. Choose the Amazon VPC integration.
  3. Confirm the version is 1.1.0 or higher and choose Set Up.
  4. For Display Name, keep the default.
  5. For Connection Type, choose S3 Connection.
  6. For Data Source, choose the direct query connection alias you created in prior steps. In this post, we use zero_etl_walkthrough.
  7. For Spark Table Name, keep the prepopulated value of amazon_vpc_flow.
  8. For S3 Data Location, enter the S3 URI of your log folder created by VPC Flow Logs set up in the prior steps. In this post, we use s3://zero-etl-walkthrough/AWSLogs/.

S3 bucket names are globally unique, and you may want to consider using bucket names that conform to your company’s compliance guidance. UUIDs plus a descriptive name are good options to guarantee uniqueness.

  1. For S3 Checkpoint Location, enter the S3 URI of your checkpoint folder which you define. Checkpoints store metadata for the direct query feature. Make sure you pick any empty or unused path in the bucket you choose. In this post, we use s3://zero-etl-walkthrough/CP/, which is in the same bucket we created earlier.
  2. Select Queries (recommended) and Dashboards and Visualizations for Flint Integrations using live queries.

You get a message that states “Setting Up the Integration – this can take several minutes.” This particular integration sets up skipping indexes and materialized views on top of your data in Amazon S3. The materialized view aggregates the data into a backing index that occupies a significantly smaller data footprint in your cluster compared to ingesting all the data and building visualizations on top of it.

When the Amazon VPC integration installation is complete, you have a broad variety of assets to play with. If you navigate to the installed integrations, you will find queries, visualizations, and other assets that can help you jumpstart your data exploration using data sitting on Amazon S3. Let’s look at the dashboard that gets installed for this integration.

I love it! How much does it cost?

With OpenSearch Service direct queries, you only pay for the resources consumed by your workload. OpenSearch Service charges for only the compute needed to query your external data as well as maintain optional indexes in OpenSearch Service. The compute capacity is measured in OpenSearch Compute Units (OCUs). If no queries or indexing activities are active, no OCUs are consumed. The following table contains sample compute prices based on searching HTTP logs in IAD.

Data scanned per query (GB) OCU price per query (USD)
1-10 $0.026
100 $0.24
1000 $1.35

Because the price is based on the OCUs used per query, this solution is tailored for infrequently queried data. If your users query data often, it makes more sense to fully ingest into OpenSearch Service and take advantage of storage optimization techniques such as using OR1 instances or UltraWarm.

OCUs consumed by zero-ETL integrations will be populated in AWS Cost Explorer. This will be at the account level. You can account for OCU usage at the account level and set thresholds and alerts when thresholds have been crossed. The format of the usage type to filter on in Cost Explorer is RegionCode-DirectQueryOCU (OCU-hours). You can create a budget using AWS Budgets and configure an alert to be notified when DirectQueryOCU (OCU-Hours) usage meets the threshold you set. You can also optionally use an Amazon Simple Notification Service (Amazon SNS) topic with an AWS Lambda function as a target to turn off a data source when a threshold criterion is met.

Summary

Now that you have a high-level understanding of the direct query connection feature, OpenSearch integrations, and how the OpenSearch Service zero-ETL integration with Amazon S3 works, you should consider using the feature as part of your organization’s toolset. With OpenSearch Service zero-ETL integration with Amazon S3, you now have a new tool for event analysis. You can bring hot data into OpenSearch Service for near real-time analysis and alerting. For the infrequently queried, larger data, mainly used for post-event analysis and correlation, you can query that data on Amazon S3 without moving the data. The data stays in Amazon S3 for cost-effective storage, and you access that data as needed without building additional infrastructure to move the data into OpenSearch Service for analysis.

For more information, refer to Working with Amazon OpenSearch Service direct queries with Amazon S3.


About the authors

Joshua Bright is a Senior Product Manager at Amazon Web Services. Joshua leads data lake integration initiatives within the OpenSearch Service team. Outside of work, Joshua enjoys listening to birds while walking in nature.

Kevin Fallis is an Principal Specialist Search Solutions Architect at Amazon Web Services. His passion is to help customers leverage the correct mix of AWS services to achieve success for their business goals. His after-work activities include family, DIY projects, carpentry, playing drums, and all things music.


Sam Selvan
is a Principal Specialist Solution Architect with Amazon OpenSearch Service.

[$] Measuring and improving buffered I/O

Post Syndicated from jake original https://lwn.net/Articles/976856/

There are two types of file I/O on Linux, buffered I/O, which goes through
the page cache, and direct I/O, which goes directly to the storage device.
The performance of buffered I/O was reported to be a lot worse than direct
I/O, especially for one specific test, in Luis Chamberlain’s
topic
proposal
for a session at the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit
.
The proposal resulted in a lengthy mailing-list discussion,
which also came up in Paul McKenney’s RCU session the next
day; Chamberlain led a
combined storage and filesystem session to discuss those results with an
eye toward improving buffered I/O performance.

Kali Linux 2024.2 released

Post Syndicated from jzb original https://lwn.net/Articles/977303/

Version 2024.2 of the Kali Linux penetration testing distribution
has been released. This
release includes an update to GNOME
46
, a high-resolution (HiDPI) mode for Xfce, as well as a number
of new packages such as the AutoRecon network
reconnaissance tool, pspy command-line utility for
snooping on Linux processes, and SploitScan tool for
fetching and displaying CVE information. Kali Linux is based on Debian
testing, and 2024.2 incorporates Debian’s work to transition to 64-bit
time_t
to avoid year 2038 problems. Users with existing Kali
systems should be sure to follow the documentation
when upgrading.

Our brand-new cohort of AWS Heroes has arrived – June 2024

Post Syndicated from Taylor Jacobsen original https://aws.amazon.com/blogs/aws/our-brand-new-cohort-of-aws-heroes-has-arrived-june-2024/

The vibrant AWS community is made up of millions of builders worldwide. Within this global audience, there are technical enthusiasts who are going above and beyond to solve problems and generously share their learnings and best practices to empower others—the AWS Heroes. These inspirational leaders make significant contributions, and the AWS Heroes program is our way of recognizing and highlighting their impactful efforts.

Please join us in celebrating our newest group of AWS Heroes!

Arshad Zackeriya – Wellington, New Zealand

Community Hero Arshad Zackeriya is a Senior Engineer at Xero, specializing in empowering organizations to deliver software at high velocity. He is well-known in the community as “Zack,” and his expertise primarily centers around Amazon EKS and developer tools. Zack is also a public speaker, and serves as one of the co-organizers and leaders for the Wellington Chapter in the AWS User Group Aotearoa New Zealand. Additionally, he was an AWS New Voices Coach and was an AWS Community Builder for five consecutive years, earning recognition as a nominee for the AWS Community Builder of the Year for 2022 and 2023 in the APJ region.

Julia Furst Morgado – New York, USA

Container Hero Julia Furst Morgado is a Global Technologist on the Product Strategy team in the Office of the CTO at Veeam Software. She is committed to diversity and inclusion, and her passion lies in making Cloud Native technologies and DevOps best practices easier to understand by sharing her knowledge. Julia excels in evangelizing and creating engaging content focused on Amazon EKS, and presenting at major events about Amazon EKS Blueprints and Amazon EKS security. Additionally, she co-organizes the AWS Community Day New York, Kubernetes Community Days, and the AWS User Group Lisbon – Women in Tech chapter, fostering vibrant collaboration and learning opportunities.

Paloma Lataliza – Belo Horizonte, Brazil

Community Hero Paloma Lataliza is a Cloud Engineer with over six years of experience. She has a bachelor’s degree in Computer Science, specialized in Cloud Computing, is an enthusiast of container technologies and passionate about technology and sharing knowledge. Paloma is a leader of the AWS User Group Minas Gerais, and she is dedicated to mentoring women by providing a supportive network and offering them free classes to make tech more accessible. This is further demonstrated as the organizer of the AWSome Women Community Summit Brazil, and founder of the Mulheres na Nuvem Minas Gerais (Women in the Cloud Minas Gerais) project. Previously, she was an AWS Community Builder, producing technical content, speaking at Cloud and DevOps events, and mentoring those eager to deepen their technical skills.

Shaoyi Li – Shenzhen, China

Community Hero Shaoyi Li is a Lead Cloud Engineer focusing on cybersecurity and generative AI, advocating for cloud generative AI security and governance solutions to help the community build secure, compliant, and responsible generative AI applications. He is a regular speaker at AWS events, such as AWS Summits, AWS Community Days, and AWS User Group Meetups. Shaoyi also shares his insights into AWS technologies through various channels, including AWS case studies, AWS blogs, AWS WeChat channels, community.aws, and on his social networks.

Vishal Alhat – Pune, India

Community Hero Vishal Alhat is a Senior Software Engineer at Forcepoint, a leading cybersecurity company, where he leverages his 9+ years of experience to play a key role in cloud-based deployments. He focuses on cloud security and DevOps using AWS, implementing DevOps tools, AWS services, and best practices to automate deployments and ensure consistency across Forcepoint’s cloud infrastructure. Vishal is passionate about sharing his knowledge and was selected as the AWS Community Builder of the Year for the APJ region, which is a testament to his dedication. Furthermore, he is the AWS User Group Pune leader, and regularly speaks at conferences, meetups, AWS Community Days, and AWS Summits worldwide.

Learn More

Please visit the AWS Heroes website if you’d like to learn more about the AWS Heroes program or to connect with a Hero near you.

Taylor

[$] Rethinking the PostgreSQL CommitFest model

Post Syndicated from jzb original https://lwn.net/Articles/976793/

Many years ago, the PostgreSQL project started holding regular CommitFests to
help tackle the work of reviewing and committing patches in a more
organized fashion. That has served the project well, but some in
the project are concerned that CommitFests are no longer meeting
the needs of PostgreSQL or its contributors. A lengthy discussion on the
pgsql-hackers mailing list turned up a number of complaints, a few
suggestions for improvement, but little consensus or momentum toward
a solution.

[$] Removing GFP_NOFS

Post Syndicated from jake original https://lwn.net/Articles/976355/

The GFP_NOFS flag is meant for kernel memory allocations that
should not cause a call into the filesystems to reclaim memory because there are
already locks held that can potentially cause a deadlock. The “scoped
allocation” API is a better choice for filesystems to indicate that they
are holding a lock, so GFP_NOFS has long been on the chopping block, though
progress has been slow. In a filesystem-track session at
the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit
, Matthew Wilcox wanted to
discuss how to move kernel filesystems away from the flag with the eventual
goal of removing it completely.

Genomics workflows, Part 7: analyze public RNA sequencing data using AWS HealthOmics

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/genomics-workflows-part-7-analyze-public-rna-sequencing-data-using-aws-healthomics/

Genomics workflows process petabyte-scale datasets on large pools of compute resources. In this blog post, we discuss how life science organizations can use Amazon Web Services (AWS) to run transcriptomic sequencing data analysis using public datasets. This allows users to quickly test research hypotheses against larger datasets in support of clinical diagnostics. We use AWS HealthOmics and AWS Step Functions to orchestrate the entire lifecycle of preparing and analyzing sequence data and remove the associated heavy lifting.

Use case

In genomics, transcription relates to the process of making a ribonucleic acid (RNA) copy from a gene’s deoxyribonucleic acid (DNA). Usually, RNA is single-stranded, although some RNA viruses are double-stranded. With RNA sequencing (RNA-Seq), scientists isolate the RNA, prepare an RNA library, and use next-generation sequencing technology to decode it. Organizations around the world use RNA-Seq to support clinical diagnostics.

In our use case, life science research teams use workflows written in Nextflow to process RNA-Seq datasets in FASTQ file format. Following their initial RNA-Seq studies on internal datasets, scientists can extend their insights by using public datasets. For example, the Gene Expression Omnibus (GEO) functional genomics data repository is hosted by the National Center for Biotechnology Information (NCBI) and offers multiple download options and formats. Scientists can download datasets in FASTQ format from GEO File Transfer Protocol (FTP) and compress them into the .gz format before further analysis.

Scaling and automating the data ingestion can be challenging. For example, scientists might need to do the following:

  • Manually download FASTQ files and invoke their analysis pipelines
  • Monitor the workflow runs, which can span hours, days, or weeks
  • Manage the infrastructure for performance and scale

This blog post presents a solution that removes this undifferentiated heavy lifting.

Prerequisites

To build this solution, you must be analyzing transcriptomic sequencing data with the Nextflow workflow system and make use of GEO FASTQ datasets. In addition, you must do the following:

  1. Create three Amazon Simple Storage Service (Amazon S3) buckets with the following purposes:
    • Uploaded GEO Accession IDs (GEO IDs)
    • Ingested FASTQ datasets
    • RNA-Seq output files
  2. Create one Amazon DynamoDB table to track the status of data ingestion. This helps with checkpointing and avoids repetitive ingestion jobs so that you can keep data ingestion cost to a minimum.

Solution overview

Using AWS, you can automate the entire RNA-Seq Nextflow pipeline. Users only need to provide the GEO IDs, then the pipeline ingests the corresponding FASTQ sample files and performs the subsequent data analysis.

Our solution, shown in Figure 1, uses a combination of AWS HealthOmics and AWS Step Functions. HealthOmics manages the compute, scalability, scheduling, and orchestration required for processing large RNA-Seq datasets. This helps scientists focus on writing their pipelines in Nextflow while AWS takes care of the underlying infrastructure. Step Functions adds reliability to the workflow from dataset ingestion to output archival. Automating the entire workflow also helps with tracing specific invocations and troubleshooting errors.

This figure visualizes the AWS services involved in each processing step, starting with users uploading CSV files with GEO metadata to Amazon S3, and concluding with AWS HealthOmics performing the RNA-Seq analysis and putting the output data on Amazon S3.

Figure 1. RNA sequencing using HealthOmics

Our solution includes the following:

  1. The scientist creates and uploads a CSV file to the GEO metadata S3 bucket. The CSV file includes a reference to the specific GEO ID that is ingested. An Amazon S3 Event Notification configured on s3:ObjectCreated events (in this case, the CSV file upload) invokes an AWS Lambda function.
  2. The Lambda function first extracts the corresponding Sequence Read Run (SRR) IDs of the GEO ID. Next, it starts a Step Functions state machine with the following input parameters: the SRR IDs, species of the samples, and GEO ID. The state machine uses an AWS Batch job queue for parallel ingestion.
  3. The Lambda function writes the following metadata to a DynamoDB table for future reference:
    • Ingested GEO ID and corresponding list of SRR IDs
    • Amazon S3 output paths to the ingested FASTQ files
    • Overall workflow status
    • Ingested species
  4. Upon ingestion completion, the state machine puts the RNA-Seq sample sheet into the FASTQ S3 bucket. This invokes a Lambda function, which launches the RNA-Seq analysis workflow with the following input parameters:
    • Sample sheet
    • GEO ID
    • Other relevant metadata
  5. Our RNA-Seq data analysis is run with HealthOmics and the associated sequence store. We use Step Functions to launch this workflow and ingest the relevant files to the sequence store.
  6. Upon workflow completion, HealthOmics writes the output data (BAM files) to the output S3 bucket.

Implementation considerations

Dataset preparation

The Step Functions state machine orchestrates the ingestion of FASTQ files through the following steps:

  1. The state machine invokes the Map state in Step Functions that uses dynamic parallelism for increased scale, with the SRR IDs array as input. You can now launch multiple AWS Batch jobs in parallel to ingest the FASTQ files that correspond to the SRR ID input.
  2. The state machine checks our ingestion DynamoDB table to see if the corresponding SRR ID has already been processed and has ingested the corresponding FASTQ files. If the SRR ID ingested the files, the state machine writes the sample sheet to the FASTQ S3 bucket and terminates successfully.
  3. The state machine uses the NCBI-provided sra-tools Docker container and fasterq-dump command to ingest the FASTQ files. The state machine generates the set of ingestion commands and starts the AWS Batch job. The ingestion commands are a set of shell commands that interact with NCBI for downloading FASTQ files. These commands compress the files with pigz, and then uploads them to an S3 bucket.
  4. The state machine updates the DynamoDB table with the ingestion status.
    1. If the ingestion is successful, then the state machine continues to step 5.
    2. If the ingestion isn’t successful, the state machine writes a message to Amazon Simple Notification Service (Amazon SNS) to notify scientists of the failure.
  5. A Lambda function generates the RNA-Seq sample sheet with the combined samples to analyze. This sample sheet is a CSV file containing:
    1. The paths to the ingested FASTQ files.
    2. The names of each corresponding SRR ID as input to the RNA-Seq workflow.
  6. The state machine notifies that the ingestion job is complete by publishing a message to an Amazon SNS topic before terminating itself.

Figure 2 provides a detailed overview of the state machine.

This Map state definition in AWS Step Functions visualizes the aforementioned steps for FASTQ file ingestion including orchestration of the associated AWS Batch job.

Figure 2. RNA sequencing data ingestion

Dataset analysis

A Lambda function divides the RNA-Seq sample sheet in compliance with the Step Functions service quota. This enables parallel processing using a Map state.

Our transcriptomic analysis workflow does the following:

  1. Checks if samples are single-end (one FASTQ file per sample) or paired-end (two sets of FASTQ files per sample).
  2. Ingests the appropriate set of FASTQ files into the HealthOmics sequence store.
  3. Monitors the status until all files are imported.

In parallel, a Lambda function initiates the HealthOmics RNA-Seq workflow.

Upon successful completion, HealthOmics stores the output data in Amazon S3. Finally, our state machine imports the output BAM files into the HealthOmics sequence store for future use.

Figure 3 provides a detailed overview of our state machine.

This AWS Step Functions workflow visualizes the aforementioned steps for data analysis including orchestration of the associated AWS HealthOmics workflow and FASTQ file ingestion into the HealthOmics Sequence Store.

Figure 3. RNA sequencing workflow

Cleanup (optional)

Delete all AWS resources that you no longer want to maintain.

Conclusion

HealthOmics removes the heavy lifting associated with gaining insights from genomics, transcriptomics, and other omics data. We used RNA-Seq analysis to showcase an example scientific workflow that can benefit from HealthOmics. When using HealthOmics in combination with Step Functions, scientists can automate the entire workflow from initial dataset preparation to archival. To learn more, we encourage you to explore our HealthOmics tutorials on GitHub.

Related information

По буквите: Остър, Грийн, Бойков

Post Syndicated from Зорница Христова original https://www.toest.bg/po-bukvite-auster-greene-boykov/

„Сънсет парк“ от Пол Остър

По буквите: Остър, Грийн, Бойков

превод от английски Иглика Василева, София: изд. „Колибри“, 2024

Остъровият „Сънсет парк“ започва с описание на къщи, изоставени от собствениците си и предназначени за събаряне. Началото хваща веднага – и как не, като темата е очевидно важна и централна за Остър, автобиографична, както става ясно от „Изобретяване на самотата“, в която писателят вижда точно такъв вход към живота на баща си през внезапно опустялата му, нуждаеща се от опразване къща. Опиши ми какво притежаваш, и ще ти кажа кой си.

По буквите: Остър, Грийн, Бойков

Изоставената къща е и образ, в който свободно се нанасят значения. Първият кандидат е самият живот – всичко, дето се трупа и събира, докато живеем, биографични факти и социални условности. Точно това, което героите на Остър обожават да изоставят, за да преживеят нашата фантазия какво би се случило, ако можехме да се изхлузим от собствената си роля и да живеем някак иначе. Всъщност

типичният Остъров герой е именно такъв – образован и многообещаващ, но избрал да скита извън собствения си път.

Избрал или принуден – в зависимост от книгата, но почти задължително живеещ поне два живота.

Майлс от „Сънсет парк“ не прави изключение. Той напуска внезапно живота си, средата си, семейството си, изчезва… и се преражда след низ от низови работни места, като част от екип за почистване на изоставени къщи. Докато обстоятелствата не му налагат сам да се нанесе в такава.

Но защо? Според Остър от „Сънсет парк“

има събития, които те правят непригоден да бъдеш себе си.

За Майлс това е травмата от смърт, за която той не е сигурен дали е виновен; за героите от „Най-хубавите години от нашия живот“ (филм, към който има повече от обстойна алюзия) това е войната. Една от героините на книгата – Алис, пише дисертация по темата – и обстойно подчертава как тази непригодност е потвърдена и от реалния опит на нейното и други семейства. Фактът на войната – опитът на военното време – прави тези мъже неприспособими към реалния живот. Нещо в тяхната постройка е рухнало.

„21 разказа“ от Греъм Грийн

превод от английски Иглика Василева, София: изд. „Кръг“, 2024

Неочакван паралел: първият от „21 разказа“ на Греъм Грийн е именно за това как непосредствено след войната кварталните вагабонти разрушават една пощадена по чудо къща, строена от Кристофър Рен (създателя на катедралата „Сейнт Пол“) – унищожават я буквално докато стопанинът ѝ стои заключен в градинската тоалетна, унищожават я, защото е красива. И това по някакъв начин е непоносимо.

Потресаващ, чудовищен разказ, който нищо не обяснява, никъде не спекулира, не казва дали гамените са заразени с някаква военновременна екзалтация на разрушението, или са травмирани, или пък обратно, именно войната е логичен завършек на вродения човешки инстинкт да унищожава. Греъм Грийн просто разказва какво правят, и ти се присвиваш, защото знаеш, че е точно така.

По буквите: Остър, Грийн, Бойков

Вторият вариант е подвид на първия. Къщата е тялото, а напусналият е човек – душата; в този смисъл споменатото по-горе прераждане следва да се разбира почти буквално. Опразненото от живота тяло се появява съвсем ясно в „Сънсет парк“ – най-вече в историята за женкаря, тръгнал с яхта и три млади „асистентки“, но предал неочаквано Богу дух, а те се носили из морето дни наред, тъй като не можели да управляват яхтата. Заедно с тях се носело и неговото тяло – ужасна история, знам, но повод за Остър да преподчертае какво има предвид, когато говори за телесност. Заедно с изброяването на мъртъвците, срещнати по пътя на един от героите.

Ако приемем тази линия за възможна, то

скиталчествата на Остъровите персонажи могат да бъдат описани като скиталчества на душата.

Впрочем пак неочакван паралел с Греъм Грийн – в един от разказите жертва на убийство влиза в треторазредно кино и коментира пред раздразнения си съсед по стол, че убийството на екрана е съвсем неубедително. Или онзи, в който един странно пресипнал лектор се опитва да говори пред одосадената публика за… относителната стойност на материята и духа.

„Верно на оригиналот“ от Николай Бойков

Пловдив: изд. „Жанет 45“, 2024

И Остър, и Грийн очевидно се вълнуват сериозно от представата за втори живот. А пък в книгата на Николай Бойков два живота живее езикът. Тя е

написана едновременно на български и македонски

и в нея разказвачът, двойник или несъщ близнак на своя автор, пребивава в съседната ни страна, за да превежда от македонски, двойник или несъщ близнак на родния му език. Първото нещо, което вижда читателят, е една комедия от грешки, защото каква е работата на близнаците, ако питате Шекспир, а и на двойниците, ако питаме Борхес, освен да не си съвпадат, да са различни в ключови („Дванайсета нощ“) или не толкова ключови („Двамата веронци“) пунктове? За смях на публиката и тотално объркване на всички, които настояват, че двамата са един и същ човек.

Ето, вижте какво може да настане, ако някой обърка двамата Дромио, тоест българския и македонския: да сложи лук вместо чесън, защото на македонски думата е такава, а нашият „лук“ е „кромид“; да сложи целина („зелер“) вместо зеле, да опипа някого, вместо просто да го потърси („бара“)…

Примерите ви се струват твърде ежедневни? Такива са. Героят разказвач на Николай Бойков не се интересува от романтичните сюблимности, неговият избор е обикновеното, простодушно битие на езика, уморил се да бъде идеологическо оръжие,

езикът, който се изхлузва от своето призвание да бъде „език свещен на моите деди“ и заживява по пазарите, при ваксаджиите, в задните дворове,

езикът като Остъров герой, който понякога издава своето амбициозно минало (омайното, сладко… пеене на мюезина), езикът, който просто обръща гръб на жадните за дуели реплики („Вие, българите, сте македонци, но не искате да си го признаете“) и се заема със своите дневни задачи. Да свари супа. Да преведе стихотворение. Да разбере кой е. Докато реже думите като лук.

По буквите: Остър, Грийн, Бойков

Николай Бойков е чудесен преводач от унгарски; може би най-отдалеченият от българския език в Европа. Захващането с македонския би било изненадващо, ако Бойков освен това не беше и поет, усетил именно

поетичния потенциал в разместените значения между двата езика.

Застанали до българските, македонските думи удрят едновременно два тона – на речниковото си значение, пояснено на момента, и на своя отглас в българския, днешен и някогашен. И читателят хем чува едно смислово многогласие, хем се вслушва в своя някогашен език. Този, разбира се, отпреди травмата, отпреди войните, когато близостта е била по-възможна.

В книгата на Николай Бойков това предвоенно езиково богатство се връща,

оглежда се объркано като непригоден за мирен живот ветеран и се чуди има ли място за него. Струва ми се знаково, че на перона го посреща не друг, а Иглика Василева, редактор на тази и преводач на другите две книги в днешния обзор, но и майстор на другите наши възможни езици. И това е по-убедителен хепиенд и от насвятканите решения на казусите на женските персонажи в „Сънсет парк“, и от завръщането на блудната дъщеря в „Разходка сред природата“. Българският език, казва ни този завършек, може да опознае себе си в цялата си сложност и несигурност, в пропускливите си и болезнени граници, за да стане все по-жизнен и силен.

„Сънсет парк“ не е от каноничните романи на Остър и е видно защо – второстепенните му персонажи не са особено убедителни, хеле женските – разказвачът уж се опитва да погледне през техните очи, но вижда само собствения си мъжки поглед и предполагаемия им стремеж към него. Развръзката пък е хем логична, хем озадачаваща – логична в първия, буквален план на самонастаняването в изоставена къща, озадачаваща спрямо очакванията за втори, символен план.

Това обаче не пречи на читателя да се самонастани в някои от редовете на тази книга, които предлагат дом и подслон за много конкретни безпризорни емоции. Разказите на Греъм Грийн са добре окръглени, завършени в своята фабулна цялост – някои така се търкулват, други обаче остават – в тях има нещо потрошено, болка, която стърчи над симетрията.

С какво затваря своята дневникова одисея Николай Бойков? С връщане в София, разбира се, с огледалната необходимост от премълчаване, когато някой ти каже „А, тя, Македония, е наша“. И с финален пасаж, който плува под повърхността на Дунава, издиша под водата, понякога остава със затворени очи и над повърхността, защото символните значения са си символни значения, а което има да тече, си го прави съвсем буквално. И все пак книгата те оставя и под, и над повърхността на езика:

ще ми се иска да кажа лесна работа, ще кажа лека работа.

А на мен ще ми се иска да не изоставяме още къщата на езика, макар да я обитаваме да не е нито лесно, нито леко.


Активните дарители на „Тоест“ получават постоянна отстъпка в размер на 20% от коричната цена на всички заглавия от каталозите на „Колибри“, „Кръг“, „Жанет 45“ и няколко други български издателства в рамките на партньорската програма Читателски клуб „Тоест“. За повече информация прочетете на toest.bg/club.

В емблематичната си колонка, започната още през 2008 г. във в-к „Култура“, Марин Бодаков ни представяше нови литературни заглавия и питаше с какво точно тези книги ни променят. Вярваме, че е важно тази рубрика да продължи. От човек до човек, с нова книга в ръка.

За практиките на отказа и изкуството на живота. Разговор с Крис Клийв

Post Syndicated from original https://www.toest.bg/za-praktikite-na-otkaza-i-izkustvoto-na-zhivota/

За практиките на отказа и изкуството на живота. Разговор с Крис Клийв

Крис Клийв (р. 1973) е английски писател, автор на разкази и на четири романа, всички от които са издадени у нас от издателство ICU – „Възпламеняване“, „Другата ръка“, „На смелите се прощава“, а сега и „Злато“. Възпитаник е на престижния Бейлиол Колидж в Оксфорд, където завършва психология, която също практикува. От 2007 до 2010 г. списва рубрика в „Гардиън“, посветена на децата и родителството. Живее в Лондон със съпругата си и трите им деца.

В първото ни интервю преди няколко години говорихме за предстоящия тогава превод на „Злато“ (последния Ви роман, преведен от Невена Дишлиева и Велин Кръстев). Тогава казахте, че това е Вашата най-позитивна, най-малко изпълнена с трупове книга. Сега в увода към нея добавяте и че е единствената, в която някой наистина печели в живота. Това означава ли, че не сте привърженик на хепиенда? Как гледате на финалите – различават ли се в живота и в литературата?

Страхотен въпрос! Мисля, че литературата започва точно както историите започват в реалния живот: двама души се срещат във влака или някой бяга от затвора – това са естествени и нормални начала. Финалите обаче са толкова различни! Романите трябва все някога да свършат, докато историите в истинския живот никога не приключват. Те се раздвояват ли, раздвояват, подобно на клони на дърво. Това е всъщност повече от липсата на свършек. Историите също така променят и значението си, както когато самите вие станете нечии предци и житейската ви история се превърне в историята на произхода на тези след вас. Така че, действително, да започнеш един роман е много лесно и естествено, но да го завършиш си е истинско изкуство. Много ми е трудно да пиша финали. Не харесвам книги, в които всичко е завършено, приключено, обяснено. Обичам книги, които ме оставят с неясно усещане – нещо като камъче в обувката, нещо, за което да мисля през следващите няколко дни. Така че се опитвам и аз да пиша по този начин – не твърде весело и не прекалено тъжно. Просто да оставя нещо интересно, с което съзнанието на читателя да е ангажирано за известно време.

Писал сте „Злато“ през 2010 г., две години преди Летните олимпийски игри в Лондон. „Възпламеняване“ е написана през 2005 г., когато тероризмът все още беше водеща новина (спомняме си, че публикуването на романа съвпадна с атентатите в Лондон). „На смелите се прощава“ пък е за Втората световна война. Съзнателно ли избирате конкретни, но мащабни събития за сюжетите си и по-лесно ли е да разгръщате обикновените човешки драми на техен фон?

Да, обичам да показвам обикновените животи в рамките на мащабни събития. Намирам художествените романи за интересни, защото ни позволяват да видим несъвършенствата и изключенията в чистите наративи на голямата история. Хората са толкова интересни същества – те правят наистина странни неща, решават дилемата и драмата на съществуването по напълно неочаквани начини. Вярвам, че животът на обикновения човек е значим и очарователен. Като писател използвам големите и силни наративи, които всички познаваме (тероризъм, принудителна миграция, спорт, война), под формата на рамка, съхраняваща по-малките и деликатни човешки истории. По същия начин може да използваме колчета, за да отгледаме грах в градината.

Говорейки за Лондон във въведението към „Злато“, споменавате, че всичките Ви книги са по един или друг начин свързани с града, който противопоставяте на други, доста по user-friendly места. И все пак бихте ли се съгласил, че едно такова огромно, космополитно, изобилно откъм разнообразие, култури и съдби пространство Ви дава по-голямо предимство в прозата, отколкото, да речем, ако като автор работите с едно далеч по-скромно, регионално, непознато място?

Трябва да бъда честен и да кажа, че не знам. Не знам какво би било да напиша книга, чието действие се развива на място, на което не живея. Странно е, нали? Мога да си представя да пиша от гледната точка на най-различни хора, но не мога да си представя история, която не е свързана с родния ми град. Не знам какво говори това за мен като писател. Може би чувствам, че всички истории са наистина за места и времена, и за начина, по който човешкото сърце се проявява в тях. Като писател трябва да усетя мястото буквално в костите си, за да се почувствам уверен, че съм способен да зная как би се чувствало едно или друго човешко сърце на него.

Треньорът Том, един от героите ви в „Злато“, казва: „На моята възраст забележителното събитие не е онова, което те плаши.“ Какво Ви плаши лично Вас днес, на сегашната Ви възраст (може би в сравнение с – да речем – времето, когато написахте книгата)?

Отново страхотен въпрос! Когато написах книгата, не се страхувах много. Бях на около четирийсет, възраст, на която всичко ми изглеждаше възможно. Сега съм на петдесет и много добре осъзнавам, че вече не притежавам това безстрашие. Това, което ме плаши сега, е, че ще се откажем един от друг, че ще разлюбим собствената си човечност. Живеем в епоха на силно емоционално насилие и разединение. Мисля, че сме изтощени един от друг. Мисля, че на хората им е трудно да оценят и да се зарадват на големите, оригиналните, изобретателните сърца, които са ни дадени, защото толкова често тези сърца са ни разочаровали. Всъщност това е причината да уча за психотерапевт и сега да практикувам тази професия, докато същевременно пиша. Интересувам се от проекти, които ни помагат да обикнем отново човека и човешкото – и у себе си, и у другите.

В книгата става дума за два вида успех, пресъздадени през образите и съдбите на двете Ви героини. Ако всичко в този живот минава за някаква форма на постижение или триумф (като това да се излекуваш например), то какво тогава отличава спорта от другите ни житейски победи? Твърде буквален ли е успехът в него?

Спортът е интересен, защото именно абсолютното изискване за абсолютен успех е онова, което прави другите успехи възможни. Добър пример е миналогодишния „Тур дьо Франс“ – имам предвид съперничеството между Йонас Вингегор и Тадей Погакар. Те се отнасяха един към друг безкрайно учтиво и с уважение. Това беше успех на човещината, на който наистина се радвах повече, отколкото на успеха в реалното състезание. Имаше нужда обаче от изискването за спортна победа, защото именно то направи забележителен начина, по който двамата се държаха един с друг. Лесно е да се отнасяш към някого на опашката пред автобуса с любов и уважение – много по-трудно е да направиш същото за най-заклетия си съперник. Ето как спортът ни дава изключителни уроци какви можем да бъдем като човешки същества. Друг чудесен пример са маратоните „Баркли“ – състезание, което е почти невъзможно да бъде завършено. Тази година невероятната Джасмин Парис стана първата жена в историята, постигнала това. Отново подчертавам, че не се интересувам толкова от маратоните сами по себе си, но спортният успех трябва да бъде абсолютно безспорен именно за да направи човешкия успех толкова значим.

Смятате ли, че има момент, в който човек трябва да отстъпи и може би дори да се откаже, имайки предвид, че една от днешните мантри е Never Give Up, Never Give In – никога, нито за миг да не се предаваме?

Абсолютно! Отказването е това, което трябва да практикуваме най-вече. Повечето ни идеи, планове и проекти, оказва се, всъщност не струват и няма смисъл да жертваме живота си на олтара на ината. Изкуството на живота, на това да бъдем хора, е в това да открием малкото неща, от които никога няма да се откажем.

А как може успехът да ни тегли надолу?

Наистина е важно да извоюваме победи и да бележим успехи от време на време. Имаме нужда от доказателства за собствената ни компетентност и умения. В противен случай дори най-смелите сред нас губят кураж и храброст. Но при успеха има два проблема: единият е, че не ни учи на кой знае какво, а другият – че започваме да се страхуваме да не го загубим. Твърде многото успехи ни карат да се боим да рискуваме онова, което вече имаме; твърде многото провали ни карат да губим смелост. Между тези два вида страх ние трябва да преценяваме намеренията и проектите си. Добре е да успяваме и да печелим в около половината от времето, поне така мисля аз.

Една от героините Ви често е на второ място заради своята емпатия и доброта. Възможно ли е любовта да ни прави неспособни да се състезаваме, да ни прави „губещи“?

Не мисля така. Любовта може да ни направи и свирепи, и ожесточени. Вярно е, че спортистите трябва да отнемат победата от съперниците си. Но те също така дават нещо. На спорта, на зрителите, на своите колеги състезатели. Да се ​​състезаваш с цялото си сърце е дар, а този дар може да бъде даден с любов. Вярно е обаче, че има много форми на любов. Мисля, че най-деликатните моменти в спорта са тези, в които любовта към състезанието се балансира с други видове любов.

В предговора разказвате донякъде забавната история за една двойка, която си разпъва сгъваеми столчета на улицата и седи на тях, докато чака да се приближат размирни и опасни улични протестиращи. Какъв е препоръчителният подход в очакване на насилието, на неконтролируемото, на неизбежното?

Любов един към друг – във всяка минута, която ни остава.

Спомняте си лозунга от Втората световна война Keep Calm and Carry On, който споменавате и в „На смелите се прощава“, нали? Възможно ли е забравянето да е част от инстинкта ни да продължим? Или паметта е обратното – необходим пътен знак в процеса на продължаването?

Мисля, че паметта е дълбока форма на любов. Ето защо тоталитарните режими и упражняващите газлайтинг винаги се опитват да я изтрият или контролират. Паметта не е запис на някакви неопровержими факти, подобно на видеозаписа. Тя е непрекъснато променяща се история, която си разказваме, за нещата, имали значение за нас някога. За да продължим, трябва да сме в любяща връзка точно с тези неща – които имат значение за нас. Мисля, че нашата памет и нашата духовност са изключително тясно свързани.

Ще се срещате с читатели в няколко български града. Смятате ли, че публиките ви по света се различават значително една от друга и Ви четат по различен начин, или… романите ви предлагат нещо като есперанто, универсален език?

Мисля, че историите ми са доста универсални и общочовешки, но също така могат и да разделят. Открих, че хората или ги обичат, или ги мразят, така че дори двама души в един и същи град могат да ги прочетат по много различен начин! Изключително съм благодарен за предстоящото ми посещение в България и наистина очаквам с нетърпение да се срещна с читателите. Благодаря ви още веднъж за това, че прочетохте моите книги, и за невероятните въпроси! Това значи страшно много за мен.


Може да видите Крис Клийв на живо в България и да си вземете автограф на следните дати и места:

11 юни, 18.30 ч.
По покана на „Книжарница в куфар“
Стара Загора, РБ „Захарий Княжески“
Крис Клийв заедно с Иво Иванов и Камен Алипиев – Кедъра

12 юни, 19.30 ч.
По покана на „Пловдив чете“
Пловдив, Stage Park, Младежки хълм
Крис Клийв заедно с Иво Иванов и Камен Алипиев – Кедъра

14 юни, 20.00 ч.
По покана на „Литературни срещи“ 
София, Борисова градина, One More Park Bar
Крис Клийв в разговор с Лора Ненковска

The state of SourceHut

Post Syndicated from jzb original https://lwn.net/Articles/977174/

Drew DeVault has published
an update about the state of the SourceHut software development
platform and its plans for the coming months. This is the first update
since the January post-mortem
following a distributed denial-of-service (DDoS) attack that resulted
in a prolonged
outage
:

As you can imagine, it has been a stressful time for us. However, I
wish to stress that everything we’ve been dealing with is planned for
in our models, both technical and financial. There is no existential
threat to SourceHut. Nevertheless, we are grateful for your patience
and support.

[…] We have been focusing on two things this year: provisioning
and managing our infrastructure and getting as much rest as
possible. Our situation has calmed down, and while we still have a lot
of loose ends to attend to I’m happy to say that we’re resuming a
sense of normalcy here and preparing to resume our work on the
features you need.

[$] Comparing BPF performance between implementations

Post Syndicated from daroc original https://lwn.net/Articles/976317/

Alan Jowett returned for a second remote presentation at the 2024
Linux Storage,
Filesystem, Memory Management, and BPF Summit
to compare the performance of
different BPF runtimes. He showed the results of the MIT-licensed BPF

microbenchmark suite
he has been working on.
The benchmark suite does not yet provide a good direct comparison between all
platforms, so the results should be
taken with a grain of salt. They do
seem to indicate that there is some significant variation between
implementations, especially for different types of BPF maps.

Security updates for Wednesday

Post Syndicated from jzb original https://lwn.net/Articles/977233/

Security updates have been issued by Fedora (deepin-qt5integration, deepin-qt5platform-plugins, dotnet8.0, dwayland, fcitx-qt5, fcitx5-qt, gammaray, kddockwidgets, keepassxc, kf5-akonadi-server, kf5-frameworkintegration, kf5-kwayland, plasma-integration, python-qt5, qadwaitadecorations, qgnomeplatform, qt5, qt5-qt3d, qt5-qtbase, qt5-qtcharts, qt5-qtconnectivity, qt5-qtdatavis3d, qt5-qtdeclarative, qt5-qtdoc, qt5-qtgamepad, qt5-qtgraphicaleffects, qt5-qtimageformats, qt5-qtlocation, qt5-qtmultimedia, qt5-qtnetworkauth, qt5-qtquickcontrols, qt5-qtquickcontrols2, qt5-qtremoteobjects, qt5-qtscript, qt5-qtscxml, qt5-qtsensors, qt5-qtserialbus, qt5-qtserialport, and qt5-qtspeech), Oracle (389-ds-base and ruby:3.1), Red Hat (389-ds-base, glibc, and kernel), SUSE (python-PyMySQL), and Ubuntu (libarchive).

European Union elections 2024: securing democratic processes in light of new threats

Post Syndicated from Petra Arts original https://blog.cloudflare.com/eu-elections-2024


Between June 6-9 2024, hundreds of millions of European Union (EU) citizens will be voting to elect their members of the European Parliament (MEPs). The European elections, held every five years, are one of the biggest democratic exercises in the world. Voters in each of the 27 EU countries will elect a different number of MEPs according to population size and based on a proportional system, and the 720 newly elected MEPs will take their seats in July. All EU member states have different election processes, institutions, and methods, and the security risks are significant, both in terms of cyber attacks but also with regard to influencing voters through disinformation. This makes the task of securing the European elections a particularly complex one, which requires collaboration between many different institutions and stakeholders, including the private sector. Cloudflare is well positioned to support governments and political campaigns in managing large-scale cyber attacks. We have also helped election entities around the world by providing tools and expertise to protect them from attack. Moreover, through the Athenian Project, Cloudflare works with state and local governments in the United States, as well as governments around the world through international nonprofit partners, to provide Cloudflare’s highest level of protection for free to ensure that constituents have access to reliable election information.

Election security in 2024: dealing with new and upcoming threats

Ensuring a free, fair, and open electoral process and securing candidate campaigns is understandably a top priority for the EU institutions, as well as for national governments and cybersecurity agencies across the EU. European authorities have already taken a number of measures to ensure the elections are well-protected. Efforts to coordinate election security measures amongst the EU countries are led by the NIS Cooperation Group, with the support of the EU Agency for Cybersecurity (ENISA), the European Commission, and the European External Action Service (the EU’s foreign service).

The NIS Cooperation Group recently issued an updated Compendium on safeguarding the elections amidst cybersecurity challenges, noting that “since the last EU elections in 2019, the elections threat landscape has evolved significantly”. Governments note in particular the impact of Artificial Intelligence (AI), including deep fakes, but also the increased sophistication of threat actors and the trend of “hacktivists-for-hire” as new risks that need to be taken into account. European institutions also highlight today’s geopolitical context, with conflicts in Ukraine and the Middle East impacting cyber threats and foreign influence campaigns in Europe. The European External Action Service analyzed cases of FIMI (Foreign Information Manipulation and Interference) during recent national elections in Spain and Poland, and put together suggested plans for governments on how to respond to the various stages of those FIMI campaigns originating from foreign (e.g. non-EU) actors. EU High Representative for Foreign Affairs Josep Borrell said in a recent blog post that protecting the election process and more broadly European public debate from malign foreign actors “is a security challenge, which we need to tackle seriously”.

Some national governments have also warned against the risks of so-called hybrid threats, whereby foreign governments deploy various methods to exert influence on other states, including disinformation campaigns, cyberattacks and espionage. Germany’s Federal Ministry of the Interior notes that “elections are often a catalyst for increased levels of illegitimate activity by foreign governments, because stoking fear and spreading hate can contribute to the polarization of society, influencing voting habits. (…) We must make a determined effort to counter these threats.”

EU readiness for election season

As part of national and EU-level coordination amongst governments and agencies to prepare to mitigate threats and risks to the European elections, ENISA supports national governments’ measures to ensure the elections will be secure, including by organizing a cybersecurity exercise to test the various crisis plans and responses to potential attacks by national and EU level agencies and governments. ENISA has also put together a checklist for authorities in order to raise awareness on specific risks and threats to the election process.

The European Union has also prepared for other phenomena endangering the security and integrity of the election process, including the spread of disinformation via online platforms. For example, the European Commission recently issued strict guidelines for “Very Large Online Platforms” (VLOPs) and “Very Large Search Engines” (VLOSEs) under the EU Digital Services Act on measures to mitigate systemic risks online that may impact the integrity of elections. These large companies will be required to have dedicated staff to monitor for disinformation threats in the 23 official EU languages across the 27 member states, collaborating closely with European cybersecurity authorities. In addition, in line with upcoming EU legislation on transparency of political advertising, political ads on large social media platforms should be clearly labeled as such.

In its 11th EU Threat Landscape report, published in 2023, ENISA also warned about the risks associated with the rise of AI-enabled information manipulation, including the disruptive impacts of AI chatbots. The European Commission, in its efforts to fight the proliferation of deep fakes and sophisticated voter manipulation tactics through advanced generative AI systems, recently launched inquiries into major AI developers and promoted industry pledges in the context of the EU AI Pact.

The view from Cloudflare: increases in cyber attacks around elections

It is likely that the EU is going to see a trend similar to many other jurisdictions where there have been increases in cyber threats targeting election entities. In the period between November 2022 and August 2023, Cloudflare mitigated 213.78 million threats to government election websites in the United States. That amounts to 703,223 threats mitigated per day on average. There is indeed already evidence that European institutions are subject to increasing attacks.

In November 2023, the European Parliament website was subject to a large cyber attack. And in March 2024, French government websites faced attacks of “unprecedented intensity,” according to a spokesperson. A few days before the attacks, on February 25, 2024, Cloudflare blocked a significant DDoS attack on a French government website. It reached as much as 420 million requests per hour and lasted for over three hours.

The UK government warned last year that there were “sustained” cyberattacks against civil society organizations, journalists and public sector groups, as well as phishing attempts directed at British politicians. Most recently, the IT infrastructure of German political party CDU was hit by a “serious cyberattack” according to the German Interior Ministry.

We have also seen that the magnitude of cyber attacks overall is growing every year. As outlined in Cloudflare’s latest DDoS threat report, published in Q1 2024, Cloudflare’s defense systems automatically mitigated 4.5 million DDoS attacks during that first quarter, representing a 50% year-over-year (YoY) increase. EU governments noted in their 2024 Compendium on safeguarding the elections that DDoS attacks “can still be very effective in undermining the public’s trust in the electoral process, especially if affecting its most critical and visible phases – that is the transmission, aggregation and display of voting results”.

However, it is not only an increase in the size of attacks on websites that is keeping election officials up at night. There are often multiple attack vectors that need to be taken into account, and ensuring election processes and public institutions remain secure is a very complicated task. For example, in the three months leading up to the 2022 U.S. midterm elections, Cloudflare prevented around 150,000 phishing emails targeting campaign officials. ENISA’s latest EU Threat Landscape report, when discussing phishing campaigns, pointed to the risks of AI applied to social engineering (e.g. used for crafting more convincing phishing messages), which can make phishing less costly, easier to scale-up, and more effective. These developments all show how securing voter registration systems, ensuring the integrity of election-related information, and planning effective incident response are necessary as online threats grow more and more sophisticated.

Securing the democratic process in the digital age requires partnerships between governments, civil society, and the private sector. Cloudflare has helped election entities around the world by providing tools and expertise to protect themselves from cyberattack. For example, in 2020, we partnered with the International Foundation for Electoral Systems to provide Enterprise-level services to six election management bodies, including the Central Election Commission of Kosovo, State Election Commission of North Macedonia, and many local election bodies in Canada.

Impact on Internet traffic

Cloudflare’s global network, which spans more than 120 countries and protects around 20% of all websites, allows us a unique view of the trends and patterns seen in Internet traffic. Some of those trends, including traffic, connection quality, and Internet outages, can be seen in our Internet insights platform, Cloudflare Radar.

Several of these trends are especially important to watch during election season. Upon deeper analysis, we observed spikes in traffic to websites related to elections, and to news websites, during this time. From data obtained in 2023 through an analysis of US state and local government websites protected under the Athenian Project, as well as US nonprofit organizations that work in voting rights and promoting democracy under Project Galileo, and political campaigns and parties under Cloudflare for Campaigns, Cloudflare observed an increase in traffic to US election and non-profit websites during the run-up to elections, and then a significant spike on election day as seen in the graphs below.

Cloudflare observed similar patterns for election information websites and news media during the first day of the 2022 French Presidential elections and during the Presidential elections in Brazil that same year.

DNS traffic to election domains observed through Cloudflare’s 1.1.1.1 resolver in April 2022, during the first round of the French Presidential elections

Coordinated efforts are key

The protection of election entities and related organizations and institutions is a huge and complex task. As noted, this requires partnerships and collaboration between different actors, both public and private, with specific expertise. The work done by EU governments and agencies to prepare, be ready and collaborate on election security precautions as outlined above is both welcome and necessary in order to ensure free, fair and above all secure elections. This can only ever be a coordinated effort, with both governments and industry working together to ensure a robust response to any threats to the democratic process. For its part, Cloudflare is protecting a number of governmental and political campaign websites across the EU.

We want to ensure that all groups working to promote democracy around the world have the tools they need to stay secure online. If you work in the election space and need our help, please get in touch. If you are an organization looking for protection under Project Galileo, please visit our website at cloudflare.com/galileo.

More information about the European Union elections can be found here. And if you are based in the EU, do not forget to vote!

Securing AI Development in the Cloud: Navigating the Risks and Opportunities

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/06/05/securing-ai-development-in-the-cloud-navigating-the-risks-and-opportunities/

AI-TRiSM – Trust, Risk and Security Management in the Age of AI

Securing AI Development in the Cloud: Navigating the Risks and Opportunities

Co-authored by Lara Sunday and Pojan Shahrivar

As artificial intelligence (AI) and machine learning (ML) technologies continue to advance and proliferate, organizations across industries are investing heavily in these transformative capabilities. According to Gartner, by 2027, spending on AI software will grow to $297.9 billion at a compound annual growth rate of 19.1%. Generative AI (GenAI) software spend will rise from 8% of AI software in 2023 to 35% by 2027.

With the promise of enhanced efficiency, personalization, and innovation, organizations are increasingly turning to cloud environments to develop and deploy these powerful AI and ML technologies. However, this rapid innovation also introduces new security risks and challenges that must be addressed proactively to protect valuable data, intellectual property, and maintain the trust of customers and stakeholders.

Benefits of Cloud Environments for AI Development

Cloud platforms offer unparalleled scalability, allowing organizations to easily scale their computing resources up or down to meet the demanding requirements of training and deploying complex AI models.

“The ability to spin up and down resources on-demand has been a game-changer for our AI development efforts,” says Stuart Millar, Principal AI Engineer at Rapid7. “We can quickly provision the necessary compute power during peak training periods, then scale back down to optimize costs when those resources are no longer needed.”

Cloud environments also provide a cost-effective way to develop AI models, with usage-based pricing models that avoid large upfront investments in hardware and infrastructure. Additionally, major cloud providers offer access to cutting-edge AI hardware and pre-built tools and services, such as Amazon SageMaker, Azure Machine Learning, and Google Cloud AI Platform, which can accelerate development and deployment cycles.

Challenges and Risks of Cloud-Based AI Development

While the cloud offers numerous advantages for AI development, it also introduces unique challenges that organizations must navigate. Limited visibility into complex data flows and model updates can create blind spots for security teams, leaving them unable to effectively monitor for potential threats or anomalies.

In their  AI Threat Landscape Report, HiddenLayer highlighted that 98% of all the companies surveyed identified that elements of their AI models were crucial to their business success, and 77% identified breaches to their AI in the past year. Additionally, multi-cloud and hybrid deployments bring monitoring, governance, and reporting challenges, making it difficult to assess AI/ML risk in context across different cloud environments.

New Attack Vectors and Risk Types

Developing AI in the cloud also exposes organizations to new attack vectors and risk types that traditional security tools may not be equipped to detect or mitigate. Some examples include:

Prompt Injection (LLM01): Imagine a large language model used for generating marketing copy. An attacker could craft a special prompt that tricks the model into generating harmful or offensive content, damaging the company’s brand and reputation.

Training Data Poisoning (LLM03, ML02): Adversaries can tamper with training data to compromise the integrity and reliability of cloud-based AI models. In the case of an AI model used for image recognition in a security surveillance system, poisoned training data containing mislabeled images could cause the model to generate incorrect classifications, potentially missing critical threats.

Model Theft (LLM10, ML05): Unauthorized access to proprietary AI models deployed in the cloud poses risks to intellectual property and competitive advantage. If a competitor were to steal a model trained on a company’s sensitive data, they could potentially replicate its functionality and gain valuable insights.

Supply Chain Vulnerabilities (LLM05, ML06): Compromised libraries, datasets, or services used in cloud AI development pipelines can lead to widespread security breaches. A malicious actor might introduce a vulnerability into a widely used open-source library for AI, which could then be exploited to gain access to AI models deployed by multiple organizations.

Developing Best Practices for Securing AI Development

To address these challenges and risks, organizations need to develop and implement best practices and standards tailored to their specific business needs, striking the right balance between enabling innovation and introducing risk.

While guidelines like NCSC Secure AI System Development and The Open Standard for Responsible AI provide a valuable starting point, organizations must also develop their own customized best practices that align with their unique business requirements, risk appetite, and AI/ML use cases. For instance, a financial institution developing AI models for fraud detection might prioritize best practices around data governance and model explainability to ensure compliance with regulations and maintain transparency in decision-making processes.

Key considerations when developing these best practices include:

Ensuring secure data handling and governance throughout the AI lifecycle

  • Implementing robust access controls and identity management for AI/ML resources
  • Validating and monitoring AI models for potential biases, vulnerabilities, or anomalies
  • Establishing incident response and remediation processes for AI-specific threats
  • Maintaining transparency and explainability to understand and audit AI model behavior

Rapid7’s Approach to Securing AI Development

“At Rapid7, our InsightCloudSec solution offers real-time visibility into AI/ML resources running across major cloud providers, allowing security teams to continuously monitor for potential risks or misconfigurations,” says Aniket Menon, VP, Product Management. “Visibility is the foundation for effective security in any environment, and that’s especially true in the complex world of AI development. Without a clear view into your AI/ML assets and activities, you’re essentially operating blind, leaving your organization vulnerable to a range of threats.”

Here at Rapid7 our AI TRiSM (Trust, Risk, and Security Management) framework empowers our teams. The framework provides us with confidence not only in our operations but also in driving innovation. In their recent blog outlining the company’s AI principles, Laura Ellis and Sabeen Malik shared how Rapid7 tackles and addresses AI challenges. Centering on transparency, fairness, safety, security, privacy, and accountability, these principles are not just guidelines; they are integral to how Rapid7 builds, deploys, and manages AI systems.

Security and compliance are two key InsightCloudSec capabilities. Compliance Packs are out-of-the-box collections of related Insights focused on industry requirements and standards for all of your resources. Compliance packs may focus on security, costs, governance, or combinations of these across a variety of frameworks, e.g., HIPAA, PCI DSS, GDPR, etc.

Last year Rapid7 launched the Rapid7 AI/ML Security Best Practices compliance pack, the pack allows for real-time and continuous visibility into AI/ML resources running across your clouds with support for GenAI services across AWS, Azure and GCP. To empower you to assess this data in the context of your organizational requirements and priorities, you can then automatically prioritize AI/ML-related risk with Layered Context based on exploitability and potential business impact.

You can also leverage Identity Analysis in InsightCloudSec to collect and present the actions executed by a given user or role within a certain time period. These logged actions are collected and analyzed, providing you with a view across your organization of who can access AI/ML resources and automatically rightsize in accordance with the least privilege access (LPA) concept. This enables you to strategically inform your policies moving forward. Native automation allows you to then act on your assessments to alert on compliance drift, remediate AI/ML risk, and enact prevention mechanisms.

Rapid7’s Continued Dedication to AI Innovation

As an inaugural signer of the CISA Secure by Design Pledge, and through our partnership with Queen’s University Belfast Centre for Secure Information Technologies (CSIT), Rapid7 remains dedicated to collaborating with industry leaders and academic institutions to stay ahead of emerging threats and develop cutting-edge solutions for securing AI development.

As the adoption of AI and ML capabilities continues to accelerate, it’s imperative that organizations have the knowledge and tools to make informed decisions and build with confidence. By implementing robust best practices and leveraging advanced security tools like InsightCloudSec, organizations can harness the power of AI while mitigating the associated risks and ensuring their valuable data and intellectual property remain protected.

To learn more about how Rapid7 can help your organization develop and implement best practices for securing AI development, visit our website to request a demo.


Gartner, Forecast Analysis: Artificial Intelligence Software, 2023-2027, Worldwide, Alys Woodward, et al, 07 November 2023

The collective thoughts of the interwebz