Manage your Digital Microscopy Data using OMERO on AWS

Post Syndicated from Travis Berkley original https://aws.amazon.com/blogs/architecture/manage-your-digital-microscopy-data-using-omero-on-aws/

The Open Microscopy Environment (OME) consortium develops open-source software and format standards for microscopy data. OME Remote Objects (OMERO) is an open source, image data management platform designed to support digital pathology and cellular biology studies. You can access, share, and work with various biological data. This can include histopathology, high content screening, electron microscopy, and even non-image genotype data. Deploying this open source tool on Amazon Web Services (AWS) allows you to access your image data in a secure central repository. You can take advantage of elastic storage by growing the archive as needed without provisioning excess storage beforehand. OMERO has a web interface, which facilitates data access and visualization. It also supports connection through the OMERO client or other third-party image analysis tools, like CellProfilerTM, QuPath, Fiji, ImageJ, and others.

The challenge of microscopy data

Saint Louis University (SLU) School of Medicine Research Microscopy and Histology Core required a centralized system for both distribution and hosting. The solution must provide research imaging distribution to both internal and external clients. It also needed the capability of hosting an educational platform for microscope images. SLU decided that the open source software OMERO was an ideal fit for them.

In order to provide speed, ease of access, and security for the University’s computer networks, SLU decided the solution must be hosted in the cloud. By partnering with AWS, SLU established a robust system for their clients. The privately hosted images on OMERO represent research material databases used by University researchers. OMERO also hosts teaching datasets for resident and fellow education. Other publicly hosted repositories provide access to source images for future publishing standards and regulations. SLU reported that the implementation was extraordinarily smooth for a non-programmer. In addition, the system design allowed for advanced data management to control costs and security.

Reviewing the OMERO architecture

OMERO is a typical three-tier web application, consisting of the following components:

  • OMERO.web provides access to OMERO’s data hierarchies and also enables annotation, organization, and visualization of data. This web browser-based client of OMERO.server exposes the annotation-based data-sharing mechanism.
  • OMERO.server is a middleware server application that provides access to image data and metadata stored in a series of databases. It contains a multi-threaded, image-rendering engine and supports a wide range (>140) of image pyramid formats through the Bio-Formats Java library. This Java application facilitates remote access and interoperability for modern scientific studies. It also exposes an API to allow any OMERO client to access the original data and any derived measurements.
  • OMERO relational database (PostgreSQL) provides the underlying storage facilities. This storage backend contains the processed metadata associated with the binary images, measurement specification, user information, structured annotations, and more.
Figure 1. Architectural diagram for a highly available (HA) deployment of OMERO on AWS including data ingestion options

Figure 1. Architectural diagram for a highly available (HA) deployment of OMERO on AWS including data ingestion options

To achieve the highly available (HA) deployment in the diagram, follow the guidance from this GitHub repository. Since OMERO only supports one writer per mounted network file share, there is one OMERO read+write server and one read-only server in the HA deployment. Otherwise, multiple instances will compete to get first access to Amazon Elastic File System (EFS). If HA is not a requirement, you can lower costs by deploying only the read+write OMERO.server.

OMERO is deployed on AWS using AWS CloudFormation (CFN) templates, which will deploy two nested CFN stacks, one for storage, and one for compute. The storage template creates an EFS volume and an Amazon Relational Database Service (RDS) instance of PostgreSQL. EFS provides the option to move files to an infrequent accessed storage class after a certain number of days to save storage cost. RDS has Multi-AZ option to improve business continuity. The compute template creates Amazon Elastic Container Service (Amazon ECS) containers for the OMERO web and server functions. You have the option to deploy the OMERO containers on AWS Fargate or Amazon EC2 launch type. It also creates an Amazon Application Load Balancer (ALB) with duration-based stickiness enabled and an AWS Certificate Manager (ACM) certificate for Transport Layer Security (TLS) termination at ALB. Only the ALB is publicly accessible, as the web portal is protected behind it in private subnets. VPC and subnets are required, which can be obtained via this CFN template. It also requires the hosted zone ID and fully qualified domain name in Amazon Route 53, which will be used to validate the TLS certificate. If higher security is not a requirement, there is an option to deploy without the registered domain and the hosted zone in Route 53. You will then be able to access the OMERO web through Application Load Balancer DNS name without TLS encryption.

Additionally, the containers of OMERO.web and OMERO.server can be extended with plugins. The landing page for login can be customized with logos, brands, or disclaimers. Build a new Docker container image with specific configuration changes to enrich the functionality of this open source platform.

You can use Amazon ECS Exec to access the OMERO command line interface (CLI) to import images within the OMERO.server container, running on either AWS Fargate or EC2 launch type. You can also run Amazon ECS Exec via AWS CloudShell. The OMERO CFN templates enable Amazon ECS Exec commands by default. You will only need to install AWS CLI and SSM plugin on your clients or AWS CloudShell to initiate the commands. When you import images within the OMERO.server container instances, you can use the OMERO in-place import to avoid redundant copies of the image files on Amazon EFS. Alternatively, you can access the Windows desktop OMERO client OMERO.insight, via the application virtualization service Amazon AppStream 2.0. This connects to the OMERO.server in the same VPC. Amazon AppStream 2.0 allows Amazon S3 being used as home folder storage, so you can import images directly from Amazon S3 to OMERO.server.

AWS offers multiple options to move your microscopic image data from on premises facilities to the cloud storage, as illustrated in Figure 1:

  1. Use AWS Transfer Family to copy data directly from on premises devices to EFS
  2. Alternatively, transfer data directly from your on-premises Network File System (NFS) to EFS using AWS DataSync. AWS DataSync can also be used to transfer files from S3 to EFS.
  3. Set up AWS Storage Gateway, in particular File Gateway, to move your image files from on premises to Amazon Simple Storage Service (S3) first. A storage lifecycle policy can archive images. You can track the storage activity metrics using Amazon S3 Storage Lens and gain insights on storage cost using cost allocation tags. Once the files are in Amazon S3, you can either set up AWS DataSync to transfer files from S3 to EFS, or directly import files into OMERO.server.

To find the latest development to this solution, check out digital pathology on AWS repository on GitHub.

Conclusion

Researchers and scientists at St. Louis University were able to grow their image repository on AWS without the concern of fixed storage limits. They can scale their compute environment up or down as their research requirements dictate. The managed services, like Amazon ECS and RDS, are able to significantly reduce the operational workloads from researchers. SLU reports that this platform is of great use to their researchers. Other universities, academic medical centers, and pharmaceutical and biotechnology companies can also use this cloud-based image data management platform to collect, visualize, and share access to their image data assets.