This post was co-written with Oscar Gali, Head of Technology and Architecture for GI at Zurich Spain.
About Zurich Spain
Zurich Spain is part of Zurich Insurance Group (Zurich), known for its financial soundness and solvency. With more than 135 years of history and over 2,000 employees, it is a leading company in the Spanish insurance market.
Introduction
Enterprise Content Management (ECM) is a key capability for business operations in Insurance, due to the number of documents that must be managed every day. In our digital world, managing and storing business documents and images (such as policies or claims) in a secure, available, scalable, and performant platform is critical.
Zurich Spain decided to use AWS to streamline management of their underlying infrastructure and to benefit from the pay-as-you-go pricing model and advanced analytics services. Together, these capabilities create a significant advantage for the company.
The challenge
Zurich Spain was managing all documents for non-life insurance on an on-premises proprietary solution, based on a market-standard ECM product and dedicated storage infrastructure. Over time, that solution accumulated several pain points: cost, scalability, and flexibility. The platform had become obsolete and was an obstacle to covering future analytical needs.
After considering different alternatives, Zurich Spain decided to base their new ECM platform on AWS, leveraging many of the managed services. AWS Managed Services helps to reduce your operational overhead and risk. AWS Managed Services automates common activities, such as change requests, monitoring, patch management, security, and backup services. It provides full lifecycle services to provision, run, and support your infrastructure.
Although the architecture design was clear, the challenge was huge. Zurich Spain had to integrate all the existing business applications with the new ECM platform. Concurrently, the company needed to migrate up to 150 million documents including metadata, in less than 6 months.
The Platform
Functionally, the ECM platform provides the following features:
Figure 1 – ECM Features
Authentication: every request must come from an authenticated user (OpenID Connect JWT).
Authorization: on every request, appropriate user permissions are validated.
Documentation Services: an exposed API that allows interaction with documents (CRUD). For example:
The ability to ingest a document either synchronously (attaching the document to the request) or asynchronously (providing the requester with a link that can be used to attach the document when required).
Document retrieval, like upload, can be performed either synchronously or asynchronously. The asynchronous option provides a link that can be used to download the document within a time range (see the sketch after this list).
Search: the ability to search across all the documents uploaded into the platform.
Metadata: every document has technical and business metadata. This gives Zurich Spain the ability to enrich every single document with all the information that is relevant for their business, for example: customer, author, and date of creation.
Record Management: policies to manage document lifecycles.
Audit: every transaction is logged into the system.
Observability: capabilities to monitor and operate all services involved: logging, performance metrics, and transaction traceability.
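To make the asynchronous flow above concrete: because documents are stored in Amazon S3 (described in the next section), the time-limited upload and download links can be implemented as S3 presigned URLs. The following is a minimal sketch with boto3, using hypothetical bucket and key names; it is not necessarily how Zurich Spain implemented the service.
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and document key
bucket = "ecm-documents"
document_key = "policies/2021/policy-12345.pdf"

# Asynchronous ingest: give the requester a link, valid for 15 minutes, to upload the document
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": bucket, "Key": document_key},
    ExpiresIn=900,
)

# Asynchronous retrieval: give the requester a link, valid for 15 minutes, to download the document
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": document_key},
    ExpiresIn=900,
)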
The Architecture
The ECM platform uses AWS services such as Amazon S3 to store documents. In addition, it uses Amazon DocumentDB to store document metadata and audit trail.
The rationale for choosing these services was:
Amazon S3 delivers strong read-after-write consistency automatically for all applications, without changes to performance or availability. With strong consistency, Amazon S3 simplifies the migration of on-premises analytics workloads by removing the need to update applications. This reduces costs by removing the need for extra infrastructure to provide strong consistency.
Amazon DocumentDB is a NoSQL document-oriented database whose schema flexibility accommodates the different metadata needs. Given the volume of data, it was key to design the index strategy in advance to ensure the right query performance.
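For example, if documents are frequently looked up by customer and creation date, a compound index on those metadata fields keeps queries efficient as the collection grows. A minimal sketch with pymongo follows, using hypothetical field, database, and collection names (and omitting the TLS options a real Amazon DocumentDB connection would use):
from pymongo import ASCENDING, DESCENDING, MongoClient

# Hypothetical endpoint and names; TLS options omitted for brevity
client = MongoClient("mongodb://<cluster-endpoint>:27017")
metadata = client["ecm"]["document_metadata"]

# Compound index to support "documents for a customer, newest first" queries
metadata.create_index([("customerId", ASCENDING), ("creationDate", DESCENDING)])

# A query served by the index above
recent = metadata.find({"customerId": "C-1001"}).sort("creationDate", DESCENDING).limit(20)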
A microservices layer has been built on top to provide the right services for the business applications. These include access control, storing or retrieving documents, metadata, and more.
These microservices are built using Thunder, the internal framework and technology stack for digital applications of Zurich Spain. Thunder leverages AWS and provides a K8s environment based on Amazon Elastic Kubernetes Service (Amazon EKS) for microservice deployment.
Figure 2 – Zurich Spain Architecture
Zurich Spain uses AWS Direct Connect to connect from their data center to AWS. With AWS Direct Connect, Zurich Spain can connect to all their AWS resources in an AWS Region and transfer business-critical data directly between their data center and AWS, bypassing their internet service provider and avoiding network congestion.
Amazon EKS gives Zurich Spain the flexibility to start, run, and scale Kubernetes applications in the AWS Cloud or on-premises. Amazon EKS is helping Zurich Spain to provide highly available and secure clusters while automating key tasks such as patching, node provisioning, and updates. Zurich Spain is also using Amazon Elastic Container Registry (Amazon ECR) to store, manage, share, and deploy container images and artifacts across their environment.
Some interesting metrics of the migration and platform:
Volume: 150+ million documents (25 TB) migrated
Duration: migration took 4 months due to the limited extraction throughput of the old platform
Activity: 50,000+ documents are ingested and 25,000+ retrieved daily
Average response time:
550 ms to upload a document
300 ms for retrieving a document hosted in the platform
Conclusion
Zurich Spain successfully replaced a market standard ECM product with a new flexible, highly available, and scalable ECM. This resulted in a 65% run cost reduction, improved performance, and enablement of AWS analytical services.
In addition, Zurich Spain has taken advantage of many benefits that AWS brings to their customers. They’ve demonstrated that Thunder, the new internal framework developed using AWS technology, provides fast application development with secure and frequent deployments.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. AWS Glue has native connectors to connect to supported data sources on AWS or elsewhere using JDBC drivers. Additionally, AWS Glue now supports reading and writing to Amazon DocumentDB (with MongoDB compatibility) and MongoDB collections using AWS Glue Spark ETL jobs. This feature enables you to connect and read, transform, and load (write) data from and to Amazon DocumentDB and MongoDB collections into services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift for downstream analytics. For more information, see Connection Types and Options for ETL in AWS Glue.
This post shows how to build AWS Glue ETL Spark jobs and set up connections with Amazon DocumentDB or MongoDB to read and load data using ConnectionType. The following diagram illustrates the three components of the solution architecture:
Save the following code as MongoDB-Glue-ETL.py in your S3 bucket.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import time
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
output_path = "s3://<bucket>/<folder>/" + str(time.time()) + "/"
mongo_uri = "mongodb://<host name or IP>:27017"
write_uri = "mongodb://<host name or IP>:27017"
read_mongo_options = {
"uri": mongo_uri,
"database": "test",
"collection": "profiles",
"username": "<username>",
"password": "<password>",
"partitioner": "MongoSamplePartitioner",
"partitionerOptions.partitionSizeMB": "10",
"partitionerOptions.partitionKey": "_id"}
write_mongo_options = {
"uri": write_uri,
"database": "test",
"collection": "collection1",
"username": "<username>",
"password": "<password>"
}
# Get DynamicFrame from MongoDB
dynamic_frame = glueContext.create_dynamic_frame.from_options(connection_type="mongodb",
connection_options=read_mongo_options)
# Write DynamicFrame to MongoDB
glueContext.write_dynamic_frame.from_options(dynamic_frame, connection_type="mongodb", connection_options=write_mongo_options)
job.commit()
Provisioning resources with AWS CloudFormation
For this post, we provide CloudFormation templates for you to review and customize to your needs. Some of the resources deployed by this stack incur costs as long as they remain in use, such as Amazon DocumentDB and Amazon EC2.
The Amazon DocumentDB stack creation can take up to 15 minutes, and MongoDB stack creation can take up to 60 minutes.
When stack creation is complete, go to the Outputs tab for the stack on the AWS CloudFormation console and note down the following values (you use these in later steps):
DocumentDB CloudFormation – ClusterEndpoint and ClusterPort
You’re now ready to configure AWS Glue ETL jobs using Amazon DocumentDB and MongoDB ConnectionType.
Setting up AWS Glue connections
You set up two separate connections for Amazon DocumentDB and MongoDB when the databases are in two different VPCs (or if you deployed the databases using the provided CloudFormation template). Complete the following steps for both connections. We first walk you through the Amazon DocumentDB connection.
On the AWS Glue console, under Databases, choose Connections.
Choose Add connection.
For Connection name, enter a name for your connection.
If you have SSL enabled on your Amazon DocumentDB cluster (which is what the CloudFormation template in this post used), select Require SSL connection.
For Connection Type, choose Amazon DocumentDB or MongoDB.
Choose Next.
For Amazon DocumentDB URL, enter a URL using the output from the CloudFormation stack, such as mongodb://host:port/databasename (use the default port, 27017).
For Username and Password, enter the credentials you entered as parameters when creating the CloudFormation stack.
For VPC, choose the VPC in which you created databases (Amazon DocumentDB and MongoDB).
For Subnet, choose the subnet within your VPC.
For Security groups, select your security group.
Choose Next.
Review the connection details and choose Finish.
Similarly, add the connection for MongoDB with the following changes to the steps:
If you used the CloudFormation template in this post, don’t select Require SSL connection for MongoDB
For Connection Type, choose MongoDB
For MongoDB URL, enter a URL using the output from the CloudFormation stack, such as mongodb://host:port/databasename (use the default port, 27017)
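If you prefer to script this setup, the same connections can be created with the AWS SDK. The following boto3 sketch creates the MongoDB-side connection; URL, credentials, subnet, security group, and Availability Zone values are placeholders, and in the Glue API both Amazon DocumentDB and MongoDB connections are created with the MONGODB connection type.
import boto3

glue = boto3.client("glue")

glue.create_connection(
    ConnectionInput={
        "Name": "mongodb-connection",
        "ConnectionType": "MONGODB",
        "ConnectionProperties": {
            "CONNECTION_URL": "mongodb://<host>:27017/<databasename>",
            "USERNAME": "<username>",
            "PASSWORD": "<password>",
        },
        "PhysicalConnectionRequirements": {
            "SubnetId": "<subnet-id>",
            "SecurityGroupIdList": ["<security-group-id>"],
            "AvailabilityZone": "<availability-zone>",
        },
    }
)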
Creating an AWS Glue endpoint, S3 endpoint, and security group
Before testing the connections, make sure you create an AWS Glue endpoint and S3 endpoint in the VPC in which the databases are created. Complete the following steps for both Amazon DocumentDB and MongoDB instances separately:
To create your AWS Glue endpoint, on the Amazon VPC console, choose Endpoints.
Choose Create endpoint.
For Service Name, choose AWS Glue.
Search for and select com.amazonaws.<region>.glue (for example, com.amazonaws.us-west-2.glue), using the Region where the database instance was created.
For VPC, choose the VPC of the Amazon DocumentDB cluster.
For Security group, select the security groups of the Amazon DocumentDB cluster.
Choose Create endpoint.
To create your S3 endpoint, on the Amazon VPC console, choose Endpoints.
Choose Create endpoint.
For Service Name, choose Amazon S3.
Search for and select com.amazonaws.<region>.s3 (for example, com.amazonaws.us-west-2.s3), using the same Region.
For VPC, choose the VPC of the Amazon DocumentDB cluster.
For Configure route tables, select the route table ID of the associated subnet of the database.
Choose Create endpoint.
Similarly, add an AWS Glue endpoint and S3 endpoint for MongoDB with the following changes:
Choose the VPC of the MongoDB instance
The security group must include itself as a source in its inbound rules. Complete the following steps for both Amazon DocumentDB and MongoDB instances separately:
On the Security Groups page, choose Edit Inbound Rules.
Choose Add rule.
For Type, choose All traffic.
For Source, choose the same security group.
Choose Save rules.
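The endpoint and security group steps above can also be scripted. The following boto3 sketch assumes the us-west-2 Region from the examples and placeholder VPC, subnet, route table, and security group IDs:
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

vpc_id = "<vpc-id>"
subnet_id = "<subnet-id>"
security_group_id = "<security-group-id>"
route_table_id = "<route-table-id>"

# Interface endpoint for AWS Glue in the database VPC
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId=vpc_id,
    ServiceName="com.amazonaws.us-west-2.glue",
    SubnetIds=[subnet_id],
    SecurityGroupIds=[security_group_id],
)

# Gateway endpoint for Amazon S3, attached to the subnet's route table
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId=vpc_id,
    ServiceName="com.amazonaws.us-west-2.s3",
    RouteTableIds=[route_table_id],
)

# Self-referencing inbound rule: the security group allows all traffic from itself
ec2.authorize_security_group_ingress(
    GroupId=security_group_id,
    IpPermissions=[
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": security_group_id}]}
    ],
)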
The objective of setting up a connection is to establish private connections between the Amazon DocumentDB and MongoDB instances in the VPC and AWS Glue via the S3 endpoint, AWS Glue endpoint, and security group. It’s not required to test the connection because that connection is established by the AWS Glue job when you run it. At the time of writing, testing an AWS Glue connection is not supported for Amazon DocumentDB connections.
Code for building the AWS Glue ETL job
The following sample code sets up a read connection with Amazon DocumentDB for your AWS Glue ETL job (PySpark):
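The connection-options snippet itself isn't reproduced above, so the following is a sketch modeled on the MongoDB options shown earlier, adding the TLS settings typically used with Amazon DocumentDB; database, collection, and partitioner values are illustrative, and this code belongs in DocumentDB-Glue-ETL.py alongside the same job boilerplate as the MongoDB script.
documentdb_uri = "mongodb://<host name or IP>:27017"
documentdb_write_uri = "mongodb://<host name or IP>:27017"

read_docdb_options = {
    "uri": documentdb_uri,
    "database": "test",
    "collection": "profiles",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false",
    "partitioner": "MongoSamplePartitioner",
    "partitionerOptions.partitionSizeMB": "10",
    "partitionerOptions.partitionKey": "_id"
}

write_documentdb_options = {
    "uri": documentdb_write_uri,
    "database": "test",
    "collection": "collection1",
    "username": "<username>",
    "password": "<password>",
    "ssl": "true",
    "ssl.domain_match": "false"
}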
The following sample code creates an AWS Glue DynamicFrame by using the read and write connections for your AWS Glue ETL job (PySpark):
# Get DynamicFrame from DocumentDB
dynamic_frame2 = glueContext.create_dynamic_frame.from_options(connection_type="documentdb",
connection_options=read_docdb_options)
# Write DynamicFrame to DocumentDB
glueContext.write_dynamic_frame.from_options(dynamic_frame2, connection_type="documentdb",
connection_options=write_documentdb_options)
Setting up AWS Glue ETL jobs
You’re now ready to set up your ETL job in AWS Glue. Complete the following steps for both Amazon DocumentDB and MongoDB instances separately:
On the AWS Glue console, under ETL, choose Jobs.
Choose Add job.
For Job Name, enter a name.
For IAM role, choose the IAM role you created as a prerequisite.
For Type, choose Spark.
For Glue Version, choose Python (latest version).
For This job runs, choose An existing script that you provide.
Choose the Amazon S3 path where the script (DocumentDB-Glue-ETL.py) is stored.
Under Advanced properties, enable Job bookmark.
Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data.
Keep the remaining settings at their defaults and choose Next.
For Connections, choose the Amazon DocumentDB connection you created.
Choose Save job and edit scripts.
Edit the following parameters:
documentdb_uri or mongo_uri
documentdb_write_uri or write_uri
user
password
output_path
Choose Run job.
When the job is finished, validate the data loaded in the collection.
Similarly, add the job for MongoDB with the following changes:
Choose the Amazon S3 path where the script (MongoDB-Glue-ETL.py) is stored
For Connections, choose the MongoDB connection you created
Change the parameters applicable to MongoDB (mongo_uri and write_uri)
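The same jobs can also be defined programmatically. The following boto3 sketch uses placeholder names, the IAM role from the prerequisites, and the connection created earlier; the Glue version and script location are assumptions:
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="docdb-etl-job",
    Role="<glue-job-iam-role>",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://<bucket>/<folder>/DocumentDB-Glue-ETL.py",
        "PythonVersion": "3",
    },
    Connections={"Connections": ["<documentdb-connection-name>"]},
    DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
    GlueVersion="2.0",
)

# Start a run; repeat with the MongoDB script and connection for the second job
glue.start_job_run(JobName="docdb-etl-job")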
Cleaning up
After you finish, delete the CloudFormation stack, because some of the AWS resources deployed by the stack in this post incur a cost as long as you continue to use them. Deleting the stack removes all AWS resources that it created.
On the AWS CloudFormation console, on the Stacks page, select the stack to delete. The stack must be currently running.
On the stack details page, choose Delete.
Choose Delete stack when prompted.
Additionally, delete the AWS Glue endpoint, S3 endpoint, AWS Glue connections, and AWS Glue ETL jobs.
Summary
In this post, we showed you how to build AWS Glue ETL Spark jobs and set up connections using ConnectionType for Amazon DocumentDB and MongoDB databases using AWS CloudFormation. You can use this solution to read data from Amazon DocumentDB or MongoDB, and transform it and write to Amazon DocumentDB or MongoDB or other targets like Amazon S3 (using Amazon Athena to query), Amazon Redshift, Amazon DynamoDB, Amazon Elasticsearch Service (Amazon ES), and more.
If you have any questions or suggestions, please leave a comment.
About the Authors
Naresh Gautam is a Sr. Analytics Specialist Solutions Architect at AWS. His role is helping customers architect highly available, high-performance, and cost-effective data analytics solutions to empower customers with data-driven decision-making. In his free time, he enjoys meditation and cooking.
Srikanth Sopirala is a Sr. Analytics Specialist Solutions Architect at AWS. He is a seasoned leader with over 20 years of experience, who is passionate about helping customers build scalable data and analytics solutions to gain timely insights and make critical business decisions. In his spare time, he enjoys reading, spending time with his family and road biking.
This post was co-written by Michael Wirig, Software Engineering Manager at Grōv Technologies.
A substantial percentage of the world’s habitable land is used for livestock farming for dairy and meat production. The dairy industry has leveraged technology to gain insights that have led to drastic improvements, and those improvements continue to accelerate. A gallon of milk in 2017 involved 30% less water, 21% less land, a 19% smaller carbon footprint, and 20% less manure than it did in 2007 (US Dairy, 2019). By focusing on smarter water usage and sustainable land usage, livestock farming can grow to provide sustainable and nutrient-dense food for consumers and livestock alike.
Grōv Technologies (Grōv) has pioneered the Olympus Tower Farm, a fully automated Controlled Environment Agriculture (CEA) system. Unique amongst vertical farming startups, Grōv is growing cattle feed to improve the sustainable use of land for livestock farming while increasing the economic margins for dairy and beef producers.
The challenges of CEA
The set of growing conditions for a CEA is called a “recipe,” which is a combination of ingredients such as temperature, humidity, light, carbon dioxide levels, and water. The optimal recipe is dynamic and is sensitive to its ingredients. Crops must be monitored in near-real time, and CEAs should be able to self-correct in order to maintain the recipe. Building a system with these capabilities requires answering the following questions:
What parameters need to be measured for indoor cattle feed production?
Which sensors offer the right accuracy and price trade-offs at scale?
Where do you place the sensors to ensure a consistent crop?
How do you correlate the data from sensors to the nutrient value?
To progress from a passively monitored system to a self-correcting, autonomous one, the CEA platform also needs to address:
How to maintain optimum crop conditions
How the system can learn and adapt to new seed varieties
How to communicate key business drivers such as yield and dry matter percentage
Grōv partnered with AWS Professional Services (AWS ProServe) to build a digital CEA platform addressing the challenges posed above.
Tower automation and edge platform
The Olympus Tower is instrumented for measuring recipe ingredients by combining the mechanical, electrical, and domain expertise of the Grōv team with the IoT edge and sensor expertise of the AWS ProServe team. The teams identified a primary set of features such as height, weight, and evenness of the growth to be measured at multiple stages within the Tower. Sensors were also added to measure secondary features such as water level, water pH, temperature, humidity, and carbon dioxide.
The teams designed and developed a purpose-built modular and industrial sensor station. Each sensor station has sensors for direct measurement of the features identified. The sensor stations are extended to support indirect measurement of features using a combination of Computer Vision and Machine Learning (CV/ML).
The trays with the growing cattle feed circulate through the Olympus Tower. A growth cycle starts on a tray with seeding, circulates through the tower over the cycle, and returns to the starting position to be harvested. The sensor station at the seeding location on the Olympus Tower tags each new growth cycle in a tray with a unique “Grow ID.” As trays pass by, each sensor station in the Tower collects the feature data. The firmware, jointly developed for the sensor station, uses AWS IoT SDK to stream the sensor data along with the Grow ID and metadata that’s specific to the sensor station. This information is sent every five minutes to an on-site edge gateway powered by AWS IoT Greengrass. Dedicated AWS Lambda functions manage the lifecycle of the Grow IDs and the sensor data processing on the edge.
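As an illustration of this data flow, the following sketch publishes a sensor reading tagged with a Grow ID over MQTT using the AWS IoT Device SDK for Python. The endpoint, topic, certificate paths, and field names are hypothetical and are not Grōv’s actual firmware:
import json
import time
from AWSIoTPythonSDK.MQTTLib import AWSIoTMQTTClient

client = AWSIoTMQTTClient("sensor-station-01")
client.configureEndpoint("<iot-or-greengrass-endpoint>", 8883)
client.configureCredentials("rootCA.pem", "station01-private.key", "station01-cert.pem")
client.connect()

# One reading per station, every five minutes, tagged with the tray's Grow ID
reading = {
    "growId": "GROW-2020-0042",
    "stationId": "station-01",
    "timestamp": int(time.time()),
    "heightMm": 118.5,
    "weightKg": 14.2,
    "waterPh": 6.1,
    "temperatureC": 21.4,
    "humidityPct": 68.0,
    "co2Ppm": 820,
}
client.publish("grov/tower/station-01/telemetry", json.dumps(reading), 1)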
The Grōv team developed AWS IoT Greengrass Lambda functions running at the edge to ingest critical metrics from the operation automation software running the Olympus Towers. This information makes it possible not only to monitor operational efficiency, but also to provide the hooks needed to control the feedback loop.
The two sources of data were augmented with site-level data by installing sensor stations at the building level or site level to capture environmental data such as weather and energy consumption of the Towers.
All three sources of data are streamed to AWS IoT Greengrass and are processed by AWS Lambda functions. The edge software also fuses the data and correlates all categories of data together. This enables two major actions for the Grōv team – operational capability in real-time at the edge and enhanced data streamed into the cloud.
Cloud pipeline/platform: analytics and visualization
A ReactJS-based dashboard application, powered by Amazon API Gateway and AWS Lambda functions, reports relevant metrics such as daily yield and machine uptime.
A data pipeline is deployed to analyze data using Amazon QuickSight. AWS Glue is used to create a dataset from the data stored in Amazon S3, and Amazon Athena is used to query the dataset and make it available to Amazon QuickSight. This gives the extended Grōv tech team of research scientists the ability to perform what-if analyses on the data coming in from the Tower systems, beyond what is available in the ReactJS-based dashboard.
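As an example of the kind of what-if analysis this pipeline supports, the following sketch runs an Amazon Athena query from Python with boto3; the database, table, and column names are hypothetical:
import time
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString=(
        "SELECT grow_id, AVG(height_mm) AS avg_height "
        "FROM tower_telemetry "
        "WHERE reading_date > date_add('day', -7, current_date) "
        "GROUP BY grow_id"
    ),
    QueryExecutionContext={"Database": "grov_data_lake"},
    ResultConfiguration={"OutputLocation": "s3://<bucket>/athena-results/"},
)

# Poll until the query finishes, then fetch the result rows
query_id = response["QueryExecutionId"]
while athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
    time.sleep(1)
rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]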
Completing the data-driven loop
Now that the data has been collected from all sources and stored in a data lake architecture, the Grōv CEA platform has established a strong foundation for harnessing insights and delivering customer outcomes using machine learning.
The integrated and fused data from the edge (sourced from the Olympus Tower instrumentation, the Olympus automation software, and site-level sensors) is correlated with the lab analysis performed by the Grōv Research Center (GRC). Harvest samples are routinely collected and sent to the lab, which performs wet chemistry and microbiological analysis. The analysis results for sampled trays are associated with the sensor data through their corresponding Grow IDs. This serves as a mechanism for labeling and correlating the recipe data with the parameters used by dairy and beef producers: dry matter percentage, micro- and macronutrients, and the presence of mycotoxins.
Grōv has chosen Amazon SageMaker to build a machine learning pipeline on its comprehensive dataset, which will enable fine-tuning the growing protocols in near-real time. Historical data collection also unlocks machine learning use cases such as detecting anomalous sensor readings and monitoring sensor health.
Because the solution is flexible, the Grōv team plans to integrate data from animal studies on their health and feed efficiency into the CEA platform. Machine learning on the data from animal studies will enhance the tuning of recipe ingredients that impact the animals’ health. This will give the farmer an unprecedented view of the impact of feed nutrition on the end product and consumer.
Conclusion
Grōv Technologies and AWS ProServe have built a strong foundation for an extensible and scalable CEA platform architecture that will nourish animals for better health and yield, produce healthier foods, and enable continued research into dairy production, rumination, and animal health to empower sustainable farming practices.
You may have already received an email or seen a console notification, but I don’t want you to be taken by surprise!
Rotate Now If you are using Amazon Aurora, Amazon Relational Database Service (RDS), or Amazon DocumentDB and are taking advantage of SSL/TLS certificate validation when you connect to your database instances, you need to download & install a fresh certificate, rotate the certificate authority (CA) for the instances, and then reboot the instances.
If you are not using SSL/TLS connections or certificate validation, you do not need to make any updates, but I recommend that you do so in order to be ready in case you decide to use SSL/TLS connections in the future. In this case, you can use a new CLI option that rotates and stages the new certificates but avoids a restart.
The new certificate (CA-2019) is available as part of a certificate bundle that also includes the old certificate (CA-2015) so that you can make a smooth transition without getting into a chicken and egg situation.
What’s Happening? The SSL/TLS certificates for RDS, Aurora, and DocumentDB expire and are replaced every five years as part of our standard maintenance and security discipline. Here are some important dates to know:
September 19, 2019 – The CA-2019 certificates were made available.
January 14, 2020 – Instances created on or after this date will have the new (CA-2019) certificates. You can temporarily revert to the old certificates if necessary.
February 5 to March 5, 2020 – RDS will stage (install but not activate) new certificates on existing instances. Restarting the instance will activate the certificate.
March 5, 2020 – The CA-2015 certificates will expire. Applications that use certificate validation but have not been updated will lose connectivity.
How to Rotate Earlier this month I created an Amazon RDS for MySQL database instance and set it aside in preparation for this blog post. As you can see from the screen shot above, the RDS console lets me know that I need to perform a Certificate update.
I visit Using SSL/TLS to Encrypt a Connection to a DB Instance and download a new certificate. If my database client knows how to handle certificate chains, I can download the root certificate and use it for all regions. If not, I download a certificate that is specific to the region where my database instance resides. I decide to download a bundle that contains the old and new root certificates:
Next, I update my client applications to use the new certificates. This process is specific to each app and each database client library, so I don’t have any details to share.
Once the client application has been updated, I change the certificate authority (CA) to rds-ca-2019. I can Modify the instance in the console, and select the new CA:
After my instance has been rebooted (either immediately or during the maintenance window), I test my application to ensure that it continues to work as expected.
If I am not using SSL and want to avoid a restart, I use the --no-certificate-rotation-restart option when modifying the instance.
The database engine will pick up the new certificate during the next planned or unplanned restart.
I can also use the RDS ModifyDBInstance API function or a CloudFormation template to change the certificate authority.
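For reference, here is roughly what that API call looks like from Python with boto3; the instance identifier is a placeholder, and the no-restart variant applies only if you aren't relying on SSL/TLS connections yet:
import boto3

rds = boto3.client("rds")

# Rotate the CA; the change takes effect on the next reboot or maintenance window
rds.modify_db_instance(
    DBInstanceIdentifier="<db-instance-identifier>",
    CACertificateIdentifier="rds-ca-2019",
)

# Stage the new certificate without a restart (the --no-certificate-rotation-restart behavior)
rds.modify_db_instance(
    DBInstanceIdentifier="<db-instance-identifier>",
    CACertificateIdentifier="rds-ca-2019",
    CertificateRotationRestart=False,
)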
Once again, all of this must be completed by March 5, 2020 or your applications may be unable to connect to your database instance using SSL or TLS.
Things to Know Here are a couple of important things to know:
Amazon Aurora Serverless – AWS Certificate Manager (ACM) is used to manage certificate rotations for this database engine, and no action is necessary.
Regions – Rotation is needed for database instances in all commercial AWS regions except Asia Pacific (Hong Kong), Middle East (Bahrain), and China (Ningxia).
Cluster Scaling – If you add more nodes to an existing cluster, the new nodes will receive the CA-2019 certificate if one or more of the existing nodes already have it. Otherwise, the CA-2015 certificate will be used.
Learning More Here are some links to additional information:
Learn about AWS Services & Solutions – September AWS Online Tech Talks
Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!
Note – All sessions are free and in Pacific Time.
Tech talks this month:
Compute:
September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.
September 25, 2019 | 1:00 PM – 2:00 PM PT – What’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.
September 23, 2019 | 9:00 AM – 10:00 AM PT – Training Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.
September 30, 2019 | 11:00 AM – 12:00 PM PT – Using Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.
September 24, 2019 | 11:00 AM – 12:00 PM PT – Application Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.
September 30, 2019 | 1:00 PM – 2:00 PM PT – AWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and Lambda@Edge? Get answers directly from our experts during AWS Office Hours.
Robotics:
October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.
October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.
A glance at the AWS Databases page will show you that we offer an incredibly wide variety of databases, each one purpose-built to address a particular need! In order to help you build the coolest and most powerful applications, you can mix and match relational, key-value, in-memory, graph, time series, and ledger databases.
Introducing Amazon DocumentDB (with MongoDB compatibility) Today we are launching Amazon DocumentDB (with MongoDB compatibility), a fast, scalable, and highly available document database that is designed to be compatible with your existing MongoDB applications and tools. Amazon DocumentDB uses a purpose-built SSD-based storage layer, with 6x replication across 3 separate Availability Zones. The storage layer is distributed, fault-tolerant, and self-healing, giving you the performance, scalability, and availability needed to run production-scale MongoDB workloads.
Each MongoDB database contains a set of collections. Each collection (similar to a relational database table) contains a set of documents, each in the JSON-like BSON format. For example:
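The original sample document isn’t reproduced here, but as a hypothetical illustration (field names invented), a document in a profiles collection might look like the following, shown as a Python dict ready to insert with a MongoDB client:
profile_document = {
    "name": "Ana",
    "full_name": {"first": "Ana", "last": "Garcia"},
    "city": "Madrid",
    "interests": ["cycling", "photography"],
    "account_active": True,
}

# With a MongoDB client in hand, this document could be inserted with:
# collection.insert_one(profile_document)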
Each document can have a unique set of field-value pairs and data; there are no fixed or predefined schemas. The MongoDB API includes the usual CRUD (create, read, update, and delete) operations along with a very rich query model. This is just the tip of the iceberg (the MongoDB API is very powerful and flexible), so check out the list of supported MongoDB operations, data types, and functions to learn more.
All About Amazon DocumentDB Here’s what you need to know about Amazon DocumentDB:
Compatibility – Amazon DocumentDB is compatible with version 3.6 of MongoDB.
Scalability – Storage can be scaled from 10 GB up to 64 TB in increments of 10 GB. You don’t need to preallocate storage or monitor free space; Amazon DocumentDB will take care of that for you. You can choose between six instance sizes (15.25 GiB to 488 GiB of memory), and you can create up to 15 read replicas. Storage and compute are decoupled and you can scale each one independently and as-needed.
Performance – Amazon DocumentDB stores database changes as a log stream, allowing you to process millions of reads per second with millisecond latency. The storage model provides a nice performance increase without compromising data durability, and greatly enhances overall scalability.
Reliability – The 6-way storage replication ensures high availability. Amazon DocumentDB can failover from a primary to a replica within 30 seconds, and supports MongoDB replica set emulation so applications can handle failover quickly.
Fully Managed – Like the other AWS database services, Amazon DocumentDB is fully managed, with built-in monitoring, fault detection, and failover. You can set up daily snapshot backups, take manual snapshots, and use either one to create a fresh cluster if necessary. You can also do point-in-time restores (with second-level resolution) to any point within the 1-35 day backup retention period.
Secure – You can choose to encrypt your active data, snapshots, and replicas with the KMS key of your choice when you create each of your Amazon DocumentDB clusters. Authentication is enabled by default, as is encryption of data in transit.
Compatible – As I said earlier, Amazon DocumentDB is designed to work with your existing MongoDB applications and tools. Just be sure to use drivers intended for MongoDB 3.4 or newer. Internally, Amazon DocumentDB implements the MongoDB 3.6 API by emulating the responses that a MongoDB client expects from a MongoDB server.
Creating An Amazon DocumentDB (with MongoDB compatibility) Cluster You can create a cluster from the Console, Command Line, CloudFormation, or by making a call to the CreateDBCluster function. I’ll use the Amazon DocumentDB Console today. I open the console and click Launch Amazon DocumentDB to get started:
I name my cluster, choose the instance class, and specify the number of instances (one is the primary and the rest are replicas). Then I enter a master username and password:
I can use any of the following instance classes for my cluster:
At this point I can click Create cluster to use default settings, or I can click Show advanced settings for additional control. I can choose any desired VPC, subnets, and security group. I can also set the port and parameter group for the cluster:
I can control encryption (enabled by default), set the backup retention period, and establish the backup window for point-in-time restores:
I can also control the maintenance window for my new cluster. Once I am ready I click Create cluster to proceed:
My cluster starts out in creating status, and switches to available very quickly:
As do the instances in the cluster:
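The console flow above can also be expressed as API calls. The following boto3 sketch uses the docdb client with placeholder identifiers, credentials, and instance class:
import boto3

docdb = boto3.client("docdb")

# Create the cluster, then add an instance (the first is the primary; additional ones are replicas)
docdb.create_db_cluster(
    DBClusterIdentifier="<cluster-id>",
    Engine="docdb",
    MasterUsername="<master-username>",
    MasterUserPassword="<master-password>",
)

docdb.create_db_instance(
    DBInstanceIdentifier="<instance-id>",
    DBInstanceClass="<instance-class>",
    Engine="docdb",
    DBClusterIdentifier="<cluster-id>",
)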
Connecting to a Cluster With the cluster up and running, I install the mongo shell on an EC2 instance (details depend on your distribution) and fetch a certificate so that I can make a secure connection:
The console shows me the command that I need to use to make the connection:
I simply customize the command with the password that I specified when I created the cluster:
From there I can use any of the mongo shell commands to insert, query, and examine data. I inserted some very simple documents and then ran an equally simple query (I’m sure you can do a lot better):
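The same connection can be made from application code. Here is a sketch with pymongo; the endpoint and credentials are placeholders, the tls/tlsCAFile options require a reasonably recent pymongo (older versions use ssl/ssl_ca_certs), and retryable writes are disabled because Amazon DocumentDB does not support them:
from pymongo import MongoClient

client = MongoClient(
    "mongodb://<master-username>:<password>@<cluster-endpoint>:27017/",
    tls=True,
    tlsCAFile="rds-combined-ca-bundle.pem",  # the certificate bundle fetched above
    retryWrites=False,  # Amazon DocumentDB does not support retryable writes
)

db = client["test"]
db["profiles"].insert_one({"name": "Ana", "city": "Madrid"})
print(db["profiles"].find_one({"name": "Ana"}))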
Now Available Amazon DocumentDB (with MongoDB compatibility) is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions. Pricing is based on the instance class, storage consumption for current documents and snapshots, I/O operations, and data transfer.