Security updates have been issued by Arch Linux (sthttpd), Debian (clamav, libreoffice, and pound), openSUSE (ipsec-tools and leptonica), SUSE (libreoffice), and Ubuntu (exim4, firefox, php5, puppet, and wavpack).
Security updates have been issued by Arch Linux (clamav), Debian (mailman, mpv, and simplesamlphp), Fedora (tomcat-native), openSUSE (docker, docker-runc, containerd, kernel, mupdf, and python-mistune), Red Hat (kernel), and Ubuntu (mailman and postgresql-9.3, postgresql-9.5, postgresql-9.6).
Post Syndicated from Mahendra Chheda original https://aws.amazon.com/blogs/security/how-to-search-more-efficiently-in-amazon-cloud-directory/
Using Amazon Cloud Directory, you can build flexible, cloud-native directories for organizing hierarchies of data along multiple dimensions. And now, you can search more efficiently by searching across only a subset of objects in your directory. For example, instead of searching through all of the employees in a company directory built using Cloud Directory, you can choose to search only full-time employees or contractors.
To search across such a subset of objects, you must first create a facet-based index. A facet is a set of attributes defined in a schema that is associated with a directory object. Using facets, you can create different object types in your directory. For instance, you can create different facets for full-time employees and contractors in a schema and then create full-time employee objects and contractor objects. You then can create an index of all the objects that include a specific facet and search those objects more efficiently.
In this blog post, I show how you can create a facet-based index in Cloud Directory to more efficiently search for objects in your directory.
Scenario: Searching a company directory for a specific employee type
Let’s say a company called AnyCompany wants to be able to efficiently search in Cloud Directory for information about its full-time employees and contractors. To do this, AnyCompany must create a company directory using Cloud Directory. (If AnyCompany already had a company directory using Cloud Directory, it could use that directory instead.) AnyCompany starts by creating DirectorySchema, a schema that includes three facets, one each for full-time employees, managers, and contractors.
The following diagram is a visual representation of AnyCompany’s company directory, and it includes full-time employees and contractors in a reporting hierarchy. The full-time employees are shown in blue nodes and the contractors are shown in green nodes. The directory’s three facets are shown as they correspond to full-time employees, managers, and contractors.
To more efficiently search your directory, follow these steps:
- Create a facet-based index that includes the facets you want to use when searching.
- Populate the index with the appropriate employee objects.
- List all the objects in the index.
- List objects in the index that include a specific facet.
1. Create a facet-based index that includes the facets you want to use when searching
The following code example creates a facet-based index of the employee objects in the directory. Cloud Directory currently supports only simple indexes, which means that an index object can only store one type of value, such as a facet.
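A sketch of the index creation using the AWS SDK for Python (boto3). The directory and schema ARNs are hypothetical, and keying the index on a “facets” attribute reflects Cloud Directory’s facet-based indexing; treat the exact attribute naming as an assumption:

```python
import boto3

cd = boto3.client("clouddirectory")

# Hypothetical ARNs for AnyCompany's directory and its applied schema
directory_arn = "arn:aws:clouddirectory:us-east-1:111122223333:directory/AbC123"
schema_arn = directory_arn + "/schema/DirectorySchema/1.0"

# Create a simple (non-unique) index keyed on each object's facets,
# attached under the directory root as "facetIndex"
response = cd.create_index(
    DirectoryArn=directory_arn,
    OrderedIndexedAttributeList=[
        {"SchemaArn": schema_arn, "FacetName": "facets", "Name": "facets"}
    ],
    IsUnique=False,
    ParentReference={"Selector": "/"},
    LinkName="facetIndex",
)

# Reference the new index by its object identifier for later calls
facet_index = {"Selector": "$" + response["ObjectIdentifier"]}
```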
2. Populate the index with the appropriate employee objects
Next, I add all the objects that I want to include in the index. The following code example adds objects to facetIndex.
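A sketch, continuing from the previous example; the employee object paths are hypothetical:

```python
# Attach each employee or contractor object to facetIndex
for path in ("/employees/anna", "/employees/bob", "/contractors/carol"):
    cd.attach_to_index(
        DirectoryArn=directory_arn,
        IndexReference=facet_index,
        TargetReference={"Selector": path},
    )
```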
3. List all the objects in the index
Now, I can query my directory efficiently for the set of objects I have in facetIndex. The following code example returns all the objects in the index.
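A sketch of the unfiltered listing, continuing from the examples above:

```python
# List every object attached to facetIndex
# (NextToken pagination omitted for brevity)
resp = cd.list_index(DirectoryArn=directory_arn, IndexReference=facet_index)
for attachment in resp["IndexAttachments"]:
    print(attachment["ObjectIdentifier"])
```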
4. List objects in the index that include a specific facet
I can add a filter for retrieving subsets of objects in the index that contain a specific facet. The following code example shows how to add a filter to the query so that only objects that contain the facet FullTimeEmployeeFacet are returned.
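A sketch of the filtered listing; as above, the attribute key shape is an assumption, and FullTimeEmployeeFacet comes from the scenario:

```python
# Restrict the listing to objects that carry FullTimeEmployeeFacet
resp = cd.list_index(
    DirectoryArn=directory_arn,
    IndexReference=facet_index,
    RangesOnIndexedValues=[
        {
            "AttributeKey": {
                "SchemaArn": schema_arn,
                "FacetName": "facets",
                "Name": "facets",
            },
            "Range": {
                "StartMode": "INCLUSIVE",
                "StartValue": {"StringValue": "FullTimeEmployeeFacet"},
                "EndMode": "INCLUSIVE",
                "EndValue": {"StringValue": "FullTimeEmployeeFacet"},
            },
        }
    ],
)
for attachment in resp["IndexAttachments"]:
    print(attachment["ObjectIdentifier"])
```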
Using this subset of objects, I can now search for a specific employee without searching across all the objects in my directory.
You can use facet-based indexing to search more efficiently across only a subset of the objects in your directory. For more information about this feature, see Indexing and Search.
If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this blog post, start a new thread in the Directory Service forum or contact AWS Support.
Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/all-in-on-unlimited-backup/
The cloud backup industry has seen its share of turmoil. BitCasa, Dell DataSafe, Xdrive, and a dozen others have closed up shop. Mozy, Amazon, and Microsoft offered, but later canceled, their unlimited offerings. Recently, CrashPlan for Home customers were notified that their service was being end-of-lifed. And today we’ve heard from Carbonite customers who are frustrated by this morning’s announcement of a price increase.
We believe that the fundamental goal of a cloud backup is having peace-of-mind: knowing your data — all of it — is safe. For over 10 years Backblaze has been providing that peace-of-mind by offering completely unlimited cloud backup to our customers. And we continue to be committed to that. Knowing that your cloud backup vendor is not going to disappear or fundamentally change their service is an essential element in achieving that peace-of-mind.
Committed to Unlimited Backup
When Mozy discontinued their unlimited backup on Jan 31, 2011, a lot of people asked, “Does this mean Backblaze will discontinue theirs as well?” At that time I wrote the blog post Backblaze is committed to unlimited backup. That was seven years ago. Since then we’ve continued to make Backblaze cloud backup better: dramatically speeding up backups and restores, offering the unique and very popular Restore Return Refund program, enabling direct access and sharing of any file in your backup, and more. We also introduced Backblaze Groups to enable businesses and families to manage backups — all at no additional cost.
How That’s Possible
I’d like to answer the question, “How have you been able to do this when others haven’t?”
First, commitment. It’s not impossible to offer unlimited cloud backup, but it’s not easy. The Backblaze team has been committed to unlimited as a core tenet.
Second, we have pursued the technical, business, and cultural steps required to make it happen. We’ve designed our own servers, written our cloud storage software, run our own operations, and been continually focused on every place we could optimize a penny out of the cost of storage. We’ve built a culture at Backblaze that cares deeply about that.
Price increases and plan changes happen in our industry, but Backblaze has consistently been the low price leader, and continues to stand by the foundational element of our service — truly unlimited backup storage. Carbonite just announced a price increase from $60 to $72/year, and while that’s not an astronomical increase, it’s important to keep in mind the service that they are providing at that rate. The basic Carbonite plan provides a service that doesn’t back up videos or external hard drives by default. We think that’s dangerous. No one wants to discover that their videos weren’t backed up after their computer dies, or have to worry about the safety and durability of their data. That is why we have continued to build on our foundation of unlimited, as well as making our service faster and more accessible. All of these serve the goal of ensuring peace-of-mind for our customers.
3 Months Free For You & A Friend
As part of our commitment to unlimited, refer your friends to receive three months of Backblaze service through March 15, 2018. When you Refer-a-Friend with your personal referral link, and they subscribe, both of you will receive three months of service added to your account. See promotion details on our Refer-a-Friend page.
Want A Reminder When Your Carbonite Subscription Runs Out?
If you’re considering switching from Carbonite, we’d love to be your new backup provider. Enter your email and the date you’d like to be reminded in the form below and you’ll get a friendly reminder email from us to start a new backup plan with Backblaze. Or, you could start a free trial today.
We think you’ll be glad you switched, and you’ll have a chance to experience some of that Backblaze peace-of-mind for your data.
Please Send Me a Reminder When I Need a New Backup Provider
Security updates have been issued by Debian (xen), Fedora (clamav, community-mysql, dnsmasq, flatpak, libtasn1, mupdf, p7zip, rsync, squid, thunderbird, tomcat, unbound, and zziplib), Mageia (clamav, curl, dovecot, ffmpeg, gcab, kernel, libtiff, libvpx, php-smarty, pure-ftpd, redis, and thunderbird), openSUSE (apache-commons-email), Red Hat (rh-mariadb100-mariadb), SUSE (firefox), and Ubuntu (clamav, squid3, and systemd).
Security updates have been issued by Debian (dokuwiki and p7zip), Fedora (kernel, pdns, rsync, and webkitgtk4), openSUSE (chromium and translate-toolkit), Red Hat (jboss-ec2-eap and Red Hat Satellite 6), Slackware (php), and SUSE (bind and firefox).
Security updates have been issued by Debian (chromium-browser, krb5, and smarty3), Fedora (firefox, GraphicsMagick, and moodle), Mageia (rsync), openSUSE (bind, chromium, freeimage, gd, GraphicsMagick, libtasn1, libvirt, nodejs6, php7, systemd, and webkit2gtk3), Red Hat (chromium-browser, systemd, and thunderbird), Scientific Linux (systemd), and Ubuntu (curl, firefox, and ruby2.3).
Post Syndicated from Yev original https://www.backblaze.com/blog/500-petabytes-and-counting/
It seems like only yesterday that we crossed the 350 petabyte mark. It was actually June 2017, but boy have we been growing since. In October 2017 we crossed 400 petabytes. Today, we’re proud to announce we’ve crossed the 500 petabyte mark. That’s a very healthy clip; see for yourself!
Whether you have 50 GB, 500 GB or are just an avid blog reader, thank you for being on this incredible journey with us through the years.
We’re extremely proud of our track record. Throughout these 11 years we’ve striven to be the simplest, fastest, and most affordable online backup (and now cloud storage) solution available. We’re not just focusing on data ingress, but also adhering to our original goal of making sure that “no one ever loses data again.” How quickly are we restoring data? On average, we’re literally moving at 1,000,000 files per hour.
Even after all these years, one of the most frequent questions asked is, “How has Backblaze maintained such affordable pricing, particularly when the industry continues to move away from unlimited data plans?”
The cloud storage industry is very competitive, with cloud sync, storage, and backup providers leaving the unlimited market every single day: OneDrive, Amazon Cloud Storage, and most recently CrashPlan. Other providers either have tiered pricing (iDrive), or charge almost double or even triple for all the features we provide for our unlimited backup service (Carbonite). So how do we do it?
The answer comes down to our relentless pursuit of lowering costs. Our open-source Backblaze Storage Pods make up our Backblaze Vaults, and the less expensive and more performant our Storage Pods are, the better the service and pricing we can offer you.
A key part of our service is to be as open as possible with our costs and structure. After all, you are entrusting us with some of your most valuable assets. Still, it is very difficult to find an apples-to-apples comparison with what our competitors are doing. For example, we can gain some insight from a 2011 interview in which Carbonite’s CEO said Carbonite’s cost of storing a petabyte was $250,000. At the time, our cost to store a petabyte was $76,481 (more on that calculation can be found here and here). If Backblaze’s fundamental cost to store data is one-third that of Carbonite’s, it makes sense that Carbonite’s cost to its customers would be more than Backblaze’s. Today, Backblaze backup is $50/year and Carbonite’s equivalent service is $149.99.
Our continued focus on reducing costs has allowed us to maintain a healthy business. And after accepting customer data for almost 10 years, we sincerely want to thank you all for giving us your trust, and allowing us to protect your important data and memories for you. Here’s to the next 500 petabytes; they’ll be here before we know it.
Since publishing this post, we have posted the latest in our series of Hard Drive Stats, in which we summarize the performance of the hard drives we used in our data centers in 2017 and previously.
“Don’t force it, get a bigger hammer”
I’ve been meaning to put together a debugging workshop for a long time, and while thinking today about how exactly to do it, I reached an interesting conclusion about the tools I use and build for debugging, and about some of my ways of working in general.
The hammer is a wonderful thing. Whatever problem you have, after a blow with the hammer the result looks the same (flattened), and that strikes me as a good metaphor for the way I fix certain problems. It can be described as “the shortest and simplest way to reach the required final state, with the initial state not mattering much.”
For example, these past few days I had to replace a piece of software in about 50 clusters, each with between 3 and 50 machines. Since the tools I have are pssh and pscp, it turned out easiest to copy the needed files to all the servers in one pass, and in a second pass have pssh log in and copy things into place where needed, or otherwise simply delete what I had copied. A tidier approach would have been to extract a list of all the machines that actually needed the change and act only on those, but writing and running that would have taken longer than the rough-and-quick way.
Similarly, for another tool I had written one script that deploys it to a whole cluster and a separate one that updates it. At some point I realized this was silly and made the installer idempotent: it doesn’t care whether something is already installed and can simply install over it (and if I interrupt it and run it again, it still gets the job done). The end result was that the total amount of code shrank.
The same principle seems to apply to my favorite ways of debugging: the tool I use most often is strace, which can safely be described as one of the heaviest debugging hammers. Almost regardless of what I’m debugging (compiled C code, PHP, Python, Perl, Java), I manage to see the symptoms and work out what’s going on, even though each of those languages has a specialized and probably far gentler way to observe what’s happening.
(I should note that there are other heavy-handed cases: I have a colleague who, to evaluate some mathematical expressions, sometimes starts gdb instead of a calculator like bc and types something like “p 1024*1024*231/1.1” into it.)
I had wondered whether this is actually wrong and should be avoided, and I came to the conclusion that I don’t see another workable way. Very often we have to debug someone else’s code (code we’ve linked against, code that sits somewhere below ours, code we depend on, or code that was simply dumped on us), and reading and understanding it is not an option, because these days there are almost no projects that can be read and grasped in under a week or two (the smallest codebase running the core services at one of the companies I’ve worked for was about 20,000 lines, which is roughly within human capacity and would still take quite a while to review, and that company was a serious exception in this respect). This creates the need for all kinds of auxiliary tools so we can cope, since the human head has serious limitations here, and this is where the hammers come to our aid: with them, every problem can be reduced to a nail (or to a cockroach that just has to be whacked hard enough).
(Not to mention that people like to write clever code, and the cleverer the code, the harder it is to debug what they’ve created.)
Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/the-floodgates-are-open-increased-network-bandwidth-for-ec2-instances/
I hope that you have configured your AMIs and your current-generation EC2 instances to use the Elastic Network Adapter (ENA) that I told you about back in mid-2016. The ENA gives you high throughput and low latency, while minimizing the load on the host processor. It is designed to work well in the presence of multiple vCPUs, with intelligent packet routing backed up by multiple transmit and receive queues.
Today we are opening up the floodgates and giving you access to more bandwidth in all AWS Regions. Here are the specifics (in each case, the actual bandwidth is dependent on the instance type and size):
EC2 to S3 – Traffic to and from Amazon Simple Storage Service (S3) can now take advantage of up to 25 Gbps of bandwidth. Previously, traffic of this type had access to 5 Gbps of bandwidth. This will be of benefit to applications that access large amounts of data in S3 or that make use of S3 for backup and restore.
EC2 to EC2 – Traffic to and from EC2 instances in the same or different Availability Zones within a region can now take advantage of up to 5 Gbps of bandwidth for single-flow traffic, or 25 Gbps of bandwidth for multi-flow traffic (a flow represents a single, point-to-point network connection) by using private IPv4 or IPv6 addresses, as described here.
EC2 to EC2 (Cluster Placement Group) – Traffic to and from EC2 instances within a cluster placement group can continue to take advantage of up to 10 Gbps of lower-latency bandwidth for single-flow traffic, or 25 Gbps of lower-latency bandwidth for multi-flow traffic.
To take advantage of this additional bandwidth, make sure that you are using the latest, ENA-enabled AMIs on current-generation EC2 instances. ENA-enabled AMIs are available for Amazon Linux, Ubuntu 14.04 & 16.04, RHEL 7.4, SLES 12, and Windows Server (2008 R2, 2012, 2012 R2, and 2016). The FreeBSD AMI in AWS Marketplace is also ENA-enabled, as is VMware Cloud on AWS.
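If you want to verify ENA before relying on the higher limits, a quick boto3 sketch (the AMI and instance IDs are hypothetical) might look like this:

```python
import boto3

ec2 = boto3.client("ec2")

# Is the AMI ENA-enabled? (AMI ID is hypothetical)
image = ec2.describe_images(ImageIds=["ami-0123456789abcdef0"])["Images"][0]
print("AMI ENA support:", image.get("EnaSupport", False))

# Is ENA enabled on a running instance? (instance ID is hypothetical)
attr = ec2.describe_instance_attribute(
    InstanceId="i-0123456789abcdef0", Attribute="enaSupport"
)
print("Instance ENA support:", attr["EnaSupport"].get("Value", False))
```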
Security updates have been issued by CentOS (389-ds-base, dhcp, kernel, and nautilus), Debian (curl, openssh, and wireshark), Fedora (clamav, firefox, java-9-openjdk, and poco), Gentoo (clamav), openSUSE (curl, libevent, mupdf, mysql-community-server, newsbeuter, php5, redis, and tre), Oracle (389-ds-base, dhcp, kernel, and nautilus), Slackware (mozilla), and Ubuntu (kernel and linux-hwe, linux-azure, linux-gcp, linux-oem).
An ETL (Extract, Transform, Load) process enables you to load data from source systems into your data warehouse. This is typically executed as a batch or near-real-time ingest process to keep the data warehouse current and provide up-to-date analytical data to end users.
Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to make data-driven decisions. With Amazon Redshift, you can get insights into your big data in a cost-effective fashion using standard SQL. You can set up any type of data model, from star and snowflake schemas to simple de-normalized tables, for running any analytical queries.
To operate a robust ETL platform and deliver data to Amazon Redshift in a timely manner, design your ETL processes to take account of Amazon Redshift’s architecture. When migrating from a legacy data warehouse to Amazon Redshift, it is tempting to adopt a lift-and-shift approach, but this can result in performance and scale issues long term. This post guides you through the following best practices for ensuring optimal, consistent runtimes for your ETL processes:
- COPY data from multiple, evenly sized files.
- Use workload management to improve ETL runtimes.
- Perform table maintenance regularly.
- Perform multiple steps in a single transaction.
- Load data in bulk.
- Use UNLOAD to extract large result sets.
- Use Amazon Redshift Spectrum for ad hoc ETL processing.
- Monitor daily ETL health using diagnostic queries.
1. COPY data from multiple, evenly sized files
Amazon Redshift is an MPP (massively parallel processing) database, where all the compute nodes divide and parallelize the work of ingesting data. Each node is further subdivided into slices, with each slice having one or more dedicated cores, equally dividing the processing capacity. The number of slices per node depends on the node type of the cluster. For example, each DS2.XLARGE compute node has two slices, whereas each DS2.8XLARGE compute node has 16 slices.
When you load data into Amazon Redshift, you should aim to have each slice do an equal amount of work. When you load the data from a single large file or from files split into uneven sizes, some slices do more work than others. As a result, the process runs only as fast as the slowest, or most heavily loaded, slice. For example, if a single large file is loaded into a two-node cluster, only one of the nodes, “Compute-0”, performs all the data ingestion.
When splitting your data files, ensure that they are of approximately equal size – between 1 MB and 1 GB after compression. The number of files should be a multiple of the number of slices in your cluster. Also, I strongly recommend that you individually compress the load files using gzip, lzop, or bzip2 to efficiently load large datasets.
When loading multiple files into a single table, use a single COPY command for the table, rather than multiple COPY commands. Amazon Redshift automatically parallelizes the data ingestion. Using a single COPY command to bulk load data into a table ensures optimal use of cluster resources, and quickest possible throughput.
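As a rough illustration of the single-COPY approach, here is a hedged Python sketch using psycopg2; the cluster endpoint, credentials, table, S3 prefix, and IAM role are all hypothetical:

```python
import psycopg2

# Hypothetical cluster endpoint and credentials
conn = psycopg2.connect(
    host="etl-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl_user",
    password="example-password",
)
cur = conn.cursor()

# A single COPY pointed at a common key prefix loads all the split,
# gzipped files in parallel across every slice in the cluster
cur.execute("""
    COPY sales_staging
    FROM 's3://my-etl-bucket/sales/2018-01-15/part_'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
    GZIP DELIMITER '|';
""")
conn.commit()
```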
2. Use workload management to improve ETL runtimes
Use Amazon Redshift’s workload management (WLM) to define multiple queues dedicated to different workloads (for example, ETL versus reporting) and to manage the runtimes of queries. As you migrate more workloads into Amazon Redshift, your ETL runtimes can become inconsistent if WLM is not appropriately set up.
I recommend limiting the overall concurrency of WLM across all queues to around 15 or less. This WLM guide helps you organize and monitor the different queues for your Amazon Redshift cluster.
When managing different workloads on your Amazon Redshift cluster, consider the following for the queue setup:
- Create a queue dedicated to your ETL processes. Configure this queue with a small number of slots (5 or fewer). Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to the commit queue. Because ETL is a commit-intensive process, having a separate queue with a small number of slots helps mitigate this issue.
- Claim extra memory available in a queue. When executing an ETL query, you can take advantage of the wlm_query_slot_count to claim the extra memory available in a particular queue, as shown in the sketch after this list. For example, a typical ETL process might involve COPYing raw data into a staging table so that downstream ETL jobs can run transformations that calculate daily, weekly, and monthly aggregates. To speed up the COPY process (so that the downstream tasks can start in parallel sooner), the wlm_query_slot_count can be increased for this step.
- Create a separate queue for reporting queries. Configure query monitoring rules on this queue to further manage long-running and expensive queries.
- Take advantage of the dynamic memory parameters. They swap the memory from your ETL to your reporting queue after the ETL job has completed.
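Here is the wlm_query_slot_count pattern mentioned above as a minimal sketch, reusing the psycopg2 cursor from the previous example (table and bucket names are hypothetical):

```python
# Claim three slots' worth of memory in the ETL queue for this session,
# run the heavy COPY, then return to the default slot count
cur.execute("SET wlm_query_slot_count TO 3;")
cur.execute("""
    COPY daily_raw
    FROM 's3://my-etl-bucket/daily/part_'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
    GZIP;
""")
conn.commit()
cur.execute("SET wlm_query_slot_count TO 1;")
```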
3. Perform table maintenance regularly
Amazon Redshift is a columnar database, which enables fast transformations for aggregating data. Performing regular table maintenance ensures that transformation ETLs are predictable and performant. To get the best performance from your Amazon Redshift database, you must ensure that database tables are regularly VACUUMed and ANALYZEd. The Analyze & Vacuum schema utility helps you automate the table maintenance task and have VACUUM & ANALYZE executed in a regular fashion.
- Use VACUUM to sort tables and remove deleted blocks
During a typical ETL refresh process, tables receive new incoming records using COPY, and unneeded data (cold data) is removed using DELETE. New rows are added to the unsorted region in a table. Deleted rows are simply marked for deletion.
DELETE does not automatically reclaim the space occupied by the deleted rows. Adding and removing large numbers of rows can therefore cause the unsorted region and the number of deleted blocks to grow. This can degrade the performance of queries executed against these tables.
After an ETL process completes, perform VACUUM to ensure that user queries execute in a consistent manner. The complete list of tables that need VACUUMing can be found using the Amazon Redshift Utils table_info script.
Use the following approaches to ensure that VACUUM is completed in a timely manner:
- Use wlm_query_slot_count to claim all the memory allocated in the ETL WLM queue during the VACUUM process.
- DROP or TRUNCATE intermediate or staging tables, thereby eliminating the need to VACUUM them.
- If your table has a compound sort key with only one sort column, try to load your data in sort key order. This helps reduce or eliminate the need to VACUUM the table.
- Consider using time series tables. This helps reduce the amount of data you need to VACUUM.
- Use ANALYZE to update database statistics
Amazon Redshift uses a cost-based query planner and optimizer that relies on table statistics to make good decisions about the query plan for SQL statements. Collecting statistics regularly after the ETL completes ensures that user queries run fast, and that daily ETL processes are performant. The Amazon Redshift utility table_info script provides insights into the freshness of the statistics. Keeping the percentage of stale statistics (pct_stats_off) below 20% ensures effective query plans for the SQL queries.
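As a minimal sketch of the maintenance step, assuming the psycopg2 connection from the COPY example and hypothetical table names:

```python
# VACUUM can't run inside a transaction block, so use autocommit
conn.autocommit = True
for table in ("sales_staging", "daily_agg", "weekly_agg"):
    cur.execute("VACUUM {};".format(table))
    cur.execute("ANALYZE {};".format(table))
conn.autocommit = False
```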
4. Perform multiple steps in a single transaction
ETL transformation logic often spans multiple steps. Because commits in Amazon Redshift are expensive, if each ETL step performs a commit, multiple concurrent ETL processes can take a long time to execute.
To minimize the number of commits in a process, the steps in an ETL script should be surrounded by a BEGIN…END block so that a single commit is performed only after all the transformation logic has been executed. For example, a multi-step ETL script might stage new data, merge it into the target tables, and commit once at the end, as in the sketch below.
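A minimal sketch, reusing the psycopg2 cursor from the earlier examples; the table and column names are hypothetical:

```python
cur.execute("BEGIN;")
# Stage this week's rows from the raw staging table
cur.execute("""
    CREATE TEMPORARY TABLE stage_weekly AS
    SELECT customer_id,
           DATE_TRUNC('week', sold_at) AS week_start,
           SUM(amount) AS total
    FROM sales_staging
    GROUP BY 1, 2;
""")
# Replace any existing aggregates for the affected weeks
cur.execute("""
    DELETE FROM weekly_agg
    USING stage_weekly
    WHERE weekly_agg.customer_id = stage_weekly.customer_id
      AND weekly_agg.week_start = stage_weekly.week_start;
""")
cur.execute("INSERT INTO weekly_agg SELECT * FROM stage_weekly;")
# One commit for the whole transformation
cur.execute("COMMIT;")
```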
5. Load data in bulk
Amazon Redshift is designed to store and query petabyte-scale datasets. Using Amazon S3 you can stage and accumulate data from multiple source systems before executing a bulk COPY operation. The following methods allow efficient and fast transfer of these bulk datasets into Amazon Redshift:
- Use a manifest file to ingest large datasets that span multiple files. The manifest file is a JSON file that lists all the files to be loaded into Amazon Redshift. Using a manifest file ensures that Amazon Redshift has a consistent view of the data to be loaded from S3, while also ensuring that duplicate files do not result in the same data being loaded more than one time.
- Use temporary staging tables to hold the data for transformation. These tables are automatically dropped after the ETL session is complete. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. Explicitly specifying the CREATE TEMPORARY TABLE statement allows you to control the DISTRIBUTION KEY, SORT KEY, and compression settings to further improve performance.
- Use ALTER TABLE APPEND to swap data from the staging tables to the target table, as sketched after this list. Data in the source table is moved to matching columns in the target table. Column order doesn’t matter. After data is successfully appended to the target table, the source table is empty. ALTER TABLE APPEND is much faster than a similar CREATE TABLE AS or INSERT INTO operation because it doesn’t involve copying or moving data.
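A one-line sketch with hypothetical table names:

```python
# ALTER TABLE APPEND can't run inside a transaction block, so use
# autocommit; the staging table is left empty afterwards
conn.autocommit = True
cur.execute("ALTER TABLE sales APPEND FROM sales_staging;")
conn.autocommit = False
```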
6. Use UNLOAD to extract large result sets
Fetching a large number of rows using SELECT is expensive and takes a long time. When a large amount of data is fetched from the Amazon Redshift cluster, the leader node has to hold the data temporarily until the fetches are complete. Further, data is streamed out sequentially, which results in longer elapsed time. As a result, the leader node can become hot, which not only affects the SELECT that is being executed, but also throttles resources for creating execution plans and managing the overall cluster resources. In a large SELECT, the leader node ends up doing most of the work to stream out the rows.
Use UNLOAD to extract large result sets directly to S3. After it’s in S3, the data can be shared with multiple downstream systems. By default, UNLOAD writes data in parallel to multiple files according to the number of slices in the cluster. All the compute nodes participate to quickly offload the data into S3.
If you are extracting data for use with Amazon Redshift Spectrum, you should make use of the MAXFILESIZE parameter and keep files around 150 MB. Similar to item 1 above, having many evenly sized files ensures that Redshift Spectrum can do the maximum amount of work in parallel.
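A hedged sketch of such an UNLOAD, with a hypothetical bucket and IAM role:

```python
# Unload in parallel to S3, capping each file at 150 MB for Spectrum
cur.execute("""
    UNLOAD ('SELECT * FROM weekly_agg')
    TO 's3://my-etl-bucket/unload/weekly_agg_'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
    GZIP MAXFILESIZE 150 MB;
""")
```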
7. Use Redshift Spectrum for ad hoc ETL processing
Events such as data backfill, promotional activity, and special calendar days can trigger additional data volumes that affect the data refresh times in your Amazon Redshift cluster. To help address these spikes in data volumes and throughput, I recommend staging data in S3. After data is organized in S3, Redshift Spectrum enables you to query it directly using standard SQL. In this way, you gain the benefits of additional capacity without having to resize your cluster.
For tips on getting started with and optimizing the use of Redshift Spectrum, see the previous post, 10 Best Practices for Amazon Redshift Spectrum.
8. Monitor daily ETL health using diagnostic queries
Monitoring the health of your ETL processes on a regular basis helps identify the early onset of performance issues before they have a significant impact on your cluster. The following monitoring scripts can be used to provide insights into the health of your ETL processes:
- commit_stats.sql – Commit queue statistics from past days, showing the largest queue length and queue time first. Symptom: DML statements such as INSERT/UPDATE/COPY/DELETE take several times longer to execute when many of them are in progress. Solution: set up separate WLM queues for the ETL process and limit the concurrency to fewer than 5.
- copy_performance.sql – COPY command statistics for the past days. Symptom: daily COPY operations take longer to execute. Solution: follow the best practices for the COPY command; analyze data growth with the incoming datasets and consider a cluster resize to meet the expected SLA.
- table_info.sql – Table skew and unsorted statistics along with storage and key information. Symptom: transformation steps take longer to execute. Solution: set up regular VACUUM jobs to address unsorted rows and reclaim the deleted blocks so that transformation SQL executes optimally; consider a table redesign to avoid data skew.
- v_check_transaction_locks.sql – Monitor transaction locks. Symptom: INSERT/UPDATE/COPY/DELETE operations on particular tables do not respond in a timely manner, compared to when run after the ETL. Solution: multiple DML statements are operating on the same target table at the same moment from different transactions; set up ETL job dependencies so that they execute serially for the same target table.
- v_get_schema_priv_by_user.sql – Get the schemas that a user has access to. Symptom: reporting users can view intermediate tables. Solution: set up separate database groups for reporting and ETL users, and grant access to objects using GRANT.
- v_generate_tbl_ddl.sql – Get the table DDL. Use case: you need to create an empty table with the same structure as the target table for data backfill. Solution: generate the DDL using this script for the backfill.
- v_space_used_per_tbl.sql – Monitor space used by individual tables. Symptom: Amazon Redshift data warehouse space growth is trending upward more than normal. Solution: analyze the individual tables that are growing at a higher rate than normal; consider data archival using UNLOAD to S3 and Redshift Spectrum for later analysis; use unscanned_table_summary.sql to find unused tables and archive or drop them.
- top_queries.sql – Return the top 50 most time-consuming statements aggregated by their text. Symptom: ETL transformations take longer to execute. Solution: analyze the top transformation SQL and use EXPLAIN to find opportunities for tuning the query plan.
There are several other useful scripts available in the amazon-redshift-utils repository. The AWS Lambda Utility Runner runs a subset of these scripts on a scheduled basis, allowing you to automate much of the monitoring of your ETL processes.
Example ETL process
The following ETL process reinforces some of the best practices discussed in this post. Consider the following four-step daily ETL workflow where data from an RDBMS source system is staged in S3 and then loaded into Amazon Redshift. Amazon Redshift is used to calculate daily, weekly, and monthly aggregations, which are then unloaded to S3, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.
Step 1: Extract from the RDBMS source to an S3 bucket
In this ETL process, the data extract job fetches change data every hour and stages it in S3 as multiple hourly files.
Organizing the data into multiple, evenly sized files enables the COPY command to ingest this data using all available resources in the Amazon Redshift cluster. Further, the files are compressed (gzipped) to further reduce COPY times.
Step 2: Stage data to the Amazon Redshift table for cleansing
Ingesting the data can be accomplished using a JSON-based manifest file. Using the manifest file ensures that S3 eventual consistency issues can be eliminated and also provides an opportunity to dedupe any files if needed.
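A sample manifest20170702.json file might look like the following sketch; the bucket and key names are hypothetical:

```json
{
  "entries": [
    {"url": "s3://my-etl-bucket/sales/2017-07-02/sales_00.gz", "mandatory": true},
    {"url": "s3://my-etl-bucket/sales/2017-07-02/sales_01.gz", "mandatory": true},
    {"url": "s3://my-etl-bucket/sales/2017-07-02/sales_02.gz", "mandatory": true}
  ]
}
```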
The staged data can then be ingested with a manifest-based COPY command.
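A hedged sketch, reusing the psycopg2 connection from the earlier examples (paths and IAM role are hypothetical):

```python
# Claim the whole ETL queue (five slots, per the WLM setup in best
# practice 2), then load everything listed in the manifest
cur.execute("SET wlm_query_slot_count TO 5;")
cur.execute("""
    COPY stage_tbl
    FROM 's3://my-etl-bucket/manifests/manifest20170702.json'
    IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftCopyRole'
    GZIP MANIFEST;
""")
conn.commit()
cur.execute("SET wlm_query_slot_count TO 1;")
```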
Because the downstream ETL processes depend on this COPY command to complete, the wlm_query_slot_count is used to claim all the memory available to the queue. This helps the COPY command complete as quickly as possible.
Step 3: Transform data to create daily, weekly, and monthly datasets and load into target tables
Data is staged in the “stage_tbl”, from where it can be transformed into the daily, weekly, and monthly aggregates and loaded into target tables. A typical weekly job follows the same pattern as the multi-step transaction sketched in best practice 4 above: multiple steps are combined into one transaction that performs a single commit, reducing contention on the commit queue.
Step 4: Unload the daily dataset to populate the S3 data lake bucket
The transformed results are now unloaded into another S3 bucket, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.
Amazon Redshift lets you easily operate petabyte-scale data warehouses in the cloud. This post summarized the best practices for operating scalable ETL natively within Amazon Redshift. I demonstrated efficient ways to ingest and transform data, along with close monitoring, and walked through a sample ETL workload that applies these best practices.
If you have questions or suggestions, please comment below.
About the Author
Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.
Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/digital-media-management/
This post summarizes the responses we received to our November 28 post asking our readers how they handle the challenge of digital asset management (DAM). You can read the previous posts in this series below:
- What’s the Best Solution for Managing Digital Photos and Videos?
- An Introduction to Managing Digital Photos and Videos
This past November, we published a blog post entitled What’s the Best Solution for Managing Digital Photos and Videos? We asked our readers to tell us how they’re currently backing up their digital media assets and what their ideal system might be. We posed these questions:
- How are you currently backing up your digital photos, video files, and/or file libraries/catalogs? Do you have a backup system that uses attached drives, a local network, the cloud, or offline storage media? Does it work well for you?
- Imagine your ideal digital asset backup setup. What would it look like? Don’t be constrained by current products, technologies, brands, or solutions. Invent a technology or product if you wish. Describe an ideal system that would work the way you want it to.
We were thrilled to receive a large number of responses from readers. What was clear from the responses is that there is no consensus on solutions for either amateurs or professionals, and that users had many ideas for how digital media management could be improved to meet their needs.
We asked our readers to contribute to this dialog for a number of reasons. As a cloud backup and cloud storage service provider, we want to understand how our users are working with digital media so we know how to improve our services. Also, we want to participate in the digital media community, and hope that sharing the challenges our readers are facing and the solutions they are using will make a contribution to that community.
The State of Managing Digital Media
While a few readers told us they had settled on a system that worked for them, most said that they were still looking for a better solution. Many expressed frustration with the growing volume of digital photo and video data, which only increases with the rising resolution of still and video cameras. Amateurs are making do with a number of consumer services, while professionals employ a wide range of commercial, open source, or jury-rigged solutions for managing data and maintaining its integrity.
I’ve summarized the responses we received in three sections: 1) what readers are doing today, 2) common wishes they have for improvements, and 3) concerns expressed by a number of respondents.
The Digital Media Workflow
Protecting Media From Camera to Cloud
We heard from a wide range of smartphone users, DSLR and other format photographers, and digital video creators. Speed of operation, the ability to share files with collaborators and clients, and product feature sets were frequently cited as reasons for selecting their particular solution. Also of great importance was protecting the integrity of media through the entire capture, transfer, editing, and backup workflow.
Avid Media Composer
- Many readers said they backed up their camera memory cards as soon as possible to a computer or external drive and erased cards only when they had more than one backup of the media. Some said that they used dual memory cards that are written to simultaneously by the camera for peace-of-mind.
- While some cameras now come equipped with Wi-Fi, no one other than smartphone users said they were using Wi-Fi as part of their workflow. Also, we didn’t receive feedback from any photographers who regularly shoot tethered.
- Some readers said they still use CDs and DVDs for storing media. One user admitted to previously using VHS tape.
- NAS (Network Attached Storage) is in wide use. Synology, Drobo, FreeNAS, and other RAID and non-RAID storage devices were frequently mentioned.
- A number were backing up their NAS to the cloud for archiving. Others said they had duplicate external drives that were stored onsite or offsite, including in a physical safe, other business locations, a bank lock box, and even “mom’s house.”
- Many said they had regular backup practices, including nightly backups, weekly and other regularly scheduled backups, often in non-work hours.
- One reader said that a monthly data scrub was performed on the NAS to ensure data integrity.
- Hardware used for backups included Synology, QNAP, Drobo, and FreeNAS systems.
- Services used by our readers for backing up included Backblaze Backup, Backblaze B2 Cloud Storage, CrashPlan, SmugMug, Amazon Glacier, Google Photos, Amazon Prime Photos, Adobe Creative Cloud, Apple Photos, Lima, DropBox, and Tarsnap. Some readers made a distinction between how they used sync (such as DropBox), backup (such as Backblaze Backup), and storage (such as Backblaze B2), but others did not. (See Sync vs. Backup vs. Storage on our blog for an explanation of the differences.)
- Software used for backups and maintaining file integrity included Arq, Carbon Copy Cloner, ChronoSync, SoftRAID, FreeNAS, corz checksum, rclone, rsync, Apple Time Machine, Capture One, Btrfs, BorgBackup, SuperDuper, restic, Acronis True Image, custom Python scripts, and smartphone apps PhotoTransfer and PhotoSync.
- Cloud torrent services mentioned were Offcloud, Bitport, and Seedr.
- A common practice mentioned is to use SSD (Solid State Drives) in the working computer or attached drives (or both) to improve speed and reliability. Protection from magnetic fields was another reason given to use SSDs.
- Many users copy their media to multiple attached or network drives for redundancy.
- Users of Lightroom reported keeping their Lightroom catalog on a local drive and their photo files on an attached drive. They frequently had different backup schemes for the catalog and the media. Many readers are careful to have multiple backups of their Lightroom catalog. Some expressed the desire to back up both their original raw files and their edited (working) raw files, but limitations in bandwidth and backup media caused some to give priority to good backups of their raw files, since the edited files could be recreated if necessary.
- A number of smartphone users reported using Apple or Google Photos to store their photos and share them.
Digital Editing and Enhancement
Adobe still rules for many users for photo editing. Some expressed interest in alternatives from Phase One, Skylum (formerly Macphun), ON1, and DxO.
- While Adobe Lightroom (and Adobe Photoshop for some) are the foundation of many users’ photo media workflow, others are still looking for something that might better suit their needs. A number of comments were made regarding Adobe’s switch to a subscription model.
- Software used for image and video editing and enhancement included Adobe Lightroom, Adobe Photoshop, Luminar, Affinity Photo, Phase One, DxO, ON1, GoPro Quik, Apple Aperture (discontinued), Avid Media Composer, Adobe Premiere, and Apple Final Cut Studio (discontinued) or Final Cut Pro.
Luminar 2018 DAM preview
Managing, Archiving, Adding Metadata, Searching for Media Files
While some of our respondents are casual or serious amateur digital media users, others make a living from digital photography and videography. A number of our readers report having hundreds of thousands of files and many terabytes of data — even approaching one petabyte of data for one professional who responded. Whether amateur or professional, all shared the desire to preserve their digital media assets for the future. Consequently, they want to be able to attach metadata quickly and easily, and search for and retrieve files from wherever they are stored when necessary.
- It’s not surprising that metadata was of great interest to our readers. Tagging, categorizing, and maintaining searchable records is important to anyone dealing with digital media.
- While Lightroom was frequently used to manage catalogs, metadata, and files, others used spreadsheets to record archive location and grep for searching records.
- Some liked the idea of Adobe’s Creative Cloud but weren’t excited about its cost and lack of choice in cloud providers.
- Others reported using Photo Mechanic, DxO, digiKam, Google Photos, Daminion, Photo Supreme, Phraseanet, Phase One Media Pro, Google Picasa (discontinued), Adobe Bridge, Synology Photo Station, FotoStation, PhotoShelter, Flickr, and SmugMug.
Photo Mechanic 5
Common Wishes For Managing Digital Media in the Future
Our readers came through with numerous suggestions for how digital media management could be improved. There were a number of common themes centered around bigger and better storage, faster broadband or other ways to get data into the cloud, managing metadata, and ensuring integrity of their data.
- Many wished for faster internet speeds that would make transferring and backing up files more efficient. Many said that the sheer volume of digital data they worked with made cloud services and storage impractical.
- A number of readers would like the option to ship files on a physical device to a cloud provider so that the initial large transfer would not take as long. Some wished to be able to send monthly physical transfers with incremental transfers sent over the internet. (Note that Backblaze supports adding data via a hardware drive to B2 Cloud Storage with our Fireball service.)
- Reasonable service cost, not surprisingly, was a desire expressed by just about everyone.
- Many wished for not just backup, but long-term archiving of data. One suggestion was to be able to specify the length-of-term for archiving and pay by that metric for specific sets of files.
- An easy-to-use Windows, Macintosh, or Linux client was a feature that many appreciated. Some were comfortable with using third-party apps for cloud storage and others wanted a vendor-supplied client.
- A number of users like the combination of NAS and cloud. Many backed up their NAS devices to the cloud. Some suggested that the NAS should be the local gateway to unlimited virtual storage in the cloud. (They should read our recent blog post on Morro Data’s CloudNAS solution.)
- Some just wanted the storage problem solved. They would like the computer system to manage storage intelligently so they don’t have to. One reader said that storage should be managed and optimized by the system, as RAM is, and not by the user.
Common Concerns Expressed by our Readers
Over and over again our readers expressed similar concerns about the state of digital asset management.
- Dealing with large volumes of data was a common challenge. As digital media files increase in size, readers struggle to manage the amount of data they have to deal with. As one reader wrote, “Why don’t I have an online backup of my entire library? Because it’s too much damn data!”
- Many said they would back up more often, or back up even more files if they had the bandwidth or storage media to do so.
- The cloud is attractive to many, but some said that they didn’t have the bandwidth to get their data into the cloud in an efficient manner, the cloud is too expensive, or they have other concerns about trusting the cloud with their data.
- Most of our respondents are using Apple computer systems, some Windows, and a few Linux. A lot of the Mac users are using Time Machine. Some liked the concept of Time Machine but said they had experienced corrupted data when using it.
- Visibility into the backup process was mentioned many times. Users want to know what’s happening to their data. A number said they wanted automatic integrity checks of their data and reports sent to them if anything changes.
- A number of readers said they didn’t want to be locked into one vendor’s proprietary solution. They prefer open standards to prevent loss if a vendor leaves the market, changes the product, or makes a turn in strategy that they don’t wish to follow.
- A number of users talked about how their practices differed depending on whether they were working in the field or working in a studio or at home. Access to the internet and data transfer speed was an issue for many.
- It’s clear that people working in high resolution photography and videography are pushing the envelope for moving data between storage devices and the cloud.
- Some readers expressed concern about the integrity of their stored data. They were concerned that over time, files would degrade. Some asked for tools to verify data integrity manually, or that data integrity should be monitored and reported by the storage vendor on a regular basis. The OpenZFS and Btrfs file systems were mentioned by some.
- A few readers mentioned that they preferred redundant data centers for cloud storage.
- Metadata is an important element for many, and making sure that metadata is easily and permanently associated with their files is essential.
- The ability to share working files with collaborators or finished media with clients, friends, and family is also a common requirement.
Thank You for Your Comments and Suggestions
As a cloud backup and storage provider, your contributions were of great interest to us. A number of readers made suggestions for how we can improve or augment our services to increase the options for digital media management. We listened and are considering your comments. They will be included in our discussions and planning for possible future services and offerings from Backblaze. We thank everyone for your contributions.
Let’s Keep the Conversation Going!
Were you surprised by any of the responses? Do you have something further to contribute? This is by no means the end of our exploration of how to better serve media professionals, so let’s keep the lines of communication open.
Bring it on in the comments!
Security updates have been issued by Debian (bind9, couchdb, lucene-solr, mysql-5.5, openocd, and php5), Mageia (gdk-pixbuf2.0, golang, and mariadb), openSUSE (curl, gd, ImageMagick, lxterminal, ncurses, newsbeuter, perl-XML-LibXML, and xmltooling), Oracle (kernel), and SUSE (xmltooling).
This post contributed by AWS Senior Cloud Infrastructure Architect Anabell St Vincent.
Some systems or applications require Transport Layer Security (TLS) traffic from the client all the way through to the Docker container, without offloading or terminating certificates at a load balancer. Some highly time-sensitive services may require communication over TLS without any decryption and re-encryption in the communication path.
There are multiple options for this type of implementation on AWS. One option is to use a service discovery tool to implement the requirements, but that creates overhead from an implementation and management perspective. In this post, I examine the option of using Amazon Elastic Container Service (Amazon ECS) with a Network Load Balancer.
Amazon ECS is a highly scalable, high-performance service for running Docker containers on AWS. It is integrated with each of the Elastic Load Balancing load balancers offered by AWS:
- The Classic Load Balancer supports application or network level traffic and can be used to pass through TLS traffic when configured with TCP listeners. This approach limits the number of containers deployed by Amazon ECS to one per EC2 host, and it is only available for the EC2 launch type, not Fargate. This is because the Classic Load Balancer does not support target groups, so dynamic port mapping can’t be used for this type of implementation.
- The Application Load Balancer functions at the application layer, layer 7 of the Open Systems Interconnection (OSI) model, and supports the HTTP and HTTPS protocols. By operating at layer 7, the Application Load Balancer is able to route traffic based on the request path in the URL. It can also provide SSL offloading, terminating the SSL certificates for applications by hosting the certificate itself. The Application Load Balancer integrates well with ECS by providing target groups for the container instances, which allow for port mapping and targeting container groups. Port mapping allows the containers to run on different ports while still being represented by the one port configured on the load balancer as part of the task’s target group.
- The Network Load Balancer is a high-performance, ultra-low latency load balancer that operates at layer 4. It is designed to handle tens of millions of requests per second while maintaining high throughput at ultra-low latency. Operating at layer 4 means that traffic is passed to the targets based on IP addresses and ports, as opposed to session information or cookies, as is the case with Application Load Balancers. The Network Load Balancer also supports long-lived TCP connections, which are ideal for WebSocket-style applications. Some applications communicate over TCP protocols rather than solely over HTTP and HTTPS, and the Network Load Balancer can load balance services that require protocols besides HTTP and HTTPS.
The Network Load Balancer is the best option for managing secure traffic as it provides support for TCP traffic pass through, without decrypting and then re-encrypting the traffic. Additionally, the Network Load Balancer supports target groups, which means port mapping can be used, as well as configuring the load balancer as part of the task definition for the containers.
The diagram below shows a high-level architecture with TLS traffic coming from the internet that passes through the VPC to the Network Load Balancer, then to a container without offloading or terminating the certificate in the path.
In this diagram, there are three containers running in an ECS cluster. The ECS cluster can be either the EC2 or Fargate launch type.
The following diagram shows an architecture that contains two different services deployed in two different ECS clusters.
In this architecture, the containers in each of the clusters can reference the other containers securely via the Network Load Balancer without terminating or offloading the certificates until it reaches the destination container.
This approach addresses the requirements where containers need to communicate securely whether they are deployed in one VPC, across VPCs, or in separate AWS accounts.
The following diagram shows both TLS connections from the internet, as well as connections from within the same cluster.
The above approach is achieved by using the same Network Load Balancer for the cluster to serve two different services. Implement this by using two different target groups (one for each service) and associating them with the ECS task definition. The containers use the DNS name associated with the Network Load Balancer to access the containers in the other service.
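As a hedged sketch of that wiring with boto3 (the names, VPC ID, and task definition are hypothetical; the second service would be configured the same way with its own target group):

```python
import boto3

elbv2 = boto3.client("elbv2")
ecs = boto3.client("ecs")

# A TCP target group on the shared NLB for one of the two services;
# TCP listeners pass the TLS traffic through untouched
tg_arn = elbv2.create_target_group(
    Name="service-a-tls",
    Protocol="TCP",
    Port=443,
    VpcId="vpc-0123456789abcdef0",
    TargetType="instance",
)["TargetGroups"][0]["TargetGroupArn"]

# Associating the target group with the service lets ECS register each
# task (and its dynamically mapped port) with the NLB automatically
ecs.create_service(
    cluster="tls-cluster",
    serviceName="service-a",
    taskDefinition="service-a-task:1",
    desiredCount=3,
    loadBalancers=[{
        "targetGroupArn": tg_arn,
        "containerName": "service-a",
        "containerPort": 443,
    }],
)
```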
The architectures in this post show how the Network Load Balancer integrates seamlessly with ECS and other AWS services, providing end-to-end TLS communication across services without offloading or terminating the certificates. This gives you the ability to use dynamic ports in ECS containers. It can also handle tens of millions of requests per second while maintaining high throughput at ultra-low latency for applications that require the TCP protocol.
If you have questions or suggestions, please comment below.
Security updates have been issued by Debian (bind9, wordpress, and xbmc), Fedora (awstats, docker, gifsicle, irssi, microcode_ctl, mupdf, nasm, osc, osc-source_validator, and php), Gentoo (newsbeuter, poppler, and rsync), Mageia (gifsicle), Red Hat (linux-firmware and microcode_ctl), Scientific Linux (linux-firmware and microcode_ctl), SUSE (kernel and openssl), and Ubuntu (bind9, eglibc, glibc, and transmission).
This post courtesy of Shane Baldacchino, Solutions Architect at Amazon Web Services.
Many customers ask for guidance on migrating end-to-end solutions running on virtual machines over to AWS. This post provides an overview of moving a common WordPress blog running on a virtualized platform to AWS, including re-pointing the DNS records associated with the website.
AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations.
The key elements of this migration process include the following steps:
- Establish your AWS environment.
- Replicate your database.
- Download the SMS Connector from the AWS Management Console.
- Configure AWS SMS and Hypervisor permissions.
- Install and configure the SMS Connector appliance.
- Import your virtual machine inventory and create a replication job.
- Launch your Amazon EC2 instance.
- Change your DNS records to resolve the WordPress blog to your EC2 instance.
Before you start, ensure that your source system's OS and vCenter version are supported by AWS. For more information, see the Server Migration Service FAQ.
Establish your AWS environment
For this walkthrough, your WordPress blog is currently running as a two-tier LAMP stack in a corporate data center. You have a frontend running Apache and PHP, plus a backend database running on MySQL. All systems are hosted on a virtualized platform.
First, establish your AWS environment. If your organization is new to AWS, this may include account or subaccount creation, a new virtual private cloud (VPC), and associated subnets, route tables, internet gateways, and so on. Think of this phase as setting up your software-defined data center. For more information, see Getting Started with Amazon EC2.
The blog is a two-tier stack, so go with two private subnets. Because you want it to be highly available, use multiple Availability Zones. A zone resides within an AWS Region. Each zone is isolated, but the zones within a region are connected through low-latency links. This allows architects and solution designers to build highly available solutions.
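If you script this phase, a minimal boto3 sketch of the VPC with two private subnets across Availability Zones could look like this; the region, CIDR blocks, and zone names are illustrative assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

# One private subnet per Availability Zone for high availability.
for cidr, az in [("10.0.1.0/24", "us-east-1a"), ("10.0.2.0/24", "us-east-1b")]:
    ec2.create_subnet(VpcId=vpc_id, CidrBlock=cidr, AvailabilityZone=az)
```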
Replicate your database
WordPress uses a MySQL relational database. You could continue to run MySQL yourself and manage the EC2 instances needed to maintain and scale the database. For this walkthrough, however, use the opportunity to migrate to an Amazon RDS instance of Amazon Aurora, which is a MySQL-compatible database. Not only is Amazon Aurora a high-performance database engine, but it also frees you up to focus on application development by taking over time-consuming database administration tasks, including backups, software patching, monitoring, scaling, and replication.
Use AWS Database Migration Service to migrate your MySQL database to Amazon Aurora easily and securely. After a database migration instance has been instantiated, configure the source and destination endpoints and create a replication task.
By attaching to the MySQL binlog, AWS DMS can seed the current contents of the database and then capture all future state changes in near real time. For more information, see Migrating a MySQL-Compatible Database to Amazon Aurora.
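As a rough guide, the endpoint and task creation just described might look like the following boto3 sketch; the hostnames, credentials, replication instance ARN, and schema name are placeholders.

```python
import json
import boto3

dms = boto3.client("dms")

# Source: the on-premises MySQL database behind the blog.
source = dms.create_endpoint(
    EndpointIdentifier="wordpress-mysql-source",
    EndpointType="source", EngineName="mysql",
    ServerName="onprem-mysql.example.com", Port=3306,   # placeholder host
    Username="repl_user", Password="REPLACE_ME",
)["Endpoint"]

# Target: the Amazon Aurora (MySQL-compatible) cluster endpoint.
target = dms.create_endpoint(
    EndpointIdentifier="wordpress-aurora-target",
    EndpointType="target", EngineName="aurora",
    ServerName="wordpress.cluster-xyz.us-east-1.rds.amazonaws.com",  # placeholder
    Port=3306, Username="admin", Password="REPLACE_ME",
)["Endpoint"]

# "full-load-and-cdc" seeds the existing rows, then streams binlog changes.
dms.create_replication_task(
    ReplicationTaskIdentifier="wordpress-migration",
    SourceEndpointArn=source["EndpointArn"],
    TargetEndpointArn=target["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:...:rep:EXAMPLE",  # placeholder
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({"rules": [{
        "rule-type": "selection", "rule-id": "1", "rule-name": "wordpress-db",
        "object-locator": {"schema-name": "wordpress", "table-name": "%"},
        "rule-action": "include",
    }]}),
)
```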
Finally, the task shows that you are replicating current data in your WordPress blog database and future changes from MySQL into Amazon Aurora.
Download the SMS Connector from the AWS Management Console
Now, use AWS SMS to migrate your Apache PHP frontend to EC2. AWS SMS is delivered as an appliance for your hypervisor.
To download the SMS Connector, log in to the console and choose Server Migration Service, Connectors, SMS Connector setup guide.
Configure AWS SMS
Your hypervisor and AWS SMS will need an appropriate user with sufficient privileges to perform migrations.
- AWS SMS – Use the AWS CLI or the IAM console to create an IAM user with the ServerMigrationConnector policy attached; a minimal sketch follows this list.
- Hypervisor – Follow the specific instructions for your hypervisor in the Getting Started with AWS Server Migration Service guide.
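Here is the minimal boto3 sketch referenced in the first list item; the user name is a placeholder, and the access keys it prints are what you later supply to the connector.

```python
import boto3

iam = boto3.client("iam")

# A dedicated user for the connector, with the AWS managed policy attached.
iam.create_user(UserName="sms-connector")  # placeholder name
iam.attach_user_policy(
    UserName="sms-connector",
    PolicyArn="arn:aws:iam::aws:policy/ServerMigrationConnector",
)

# The connector authenticates with access keys rather than a console login.
keys = iam.create_access_key(UserName="sms-connector")["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])  # supply these to the connector
```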
Install and configure the SMS Connector appliance
Launch a new VM based on the SMS Connector that you downloaded. To configure the connector, connect to it via HTTPS. You can obtain the SMS Connector IP address from your hypervisor.
In this example, the connector IP address is 10.0.0.31, so enter https://10.0.0.31 in your browser.
Configure the connector with the IAM and hypervisor credentials that you created earlier.
After it's configured and the associated connectivity and authentication checks have passed, return to the console to view your connector in AWS SMS.
Import your virtual machine inventory and create a replication job
After validating that the SMS Connector is in a “HEALTHY” state, import your server catalog to AWS SMS. This process can take up to a minute.
Select the server to migrate and choose Create replication job. The console guides you through the process. The time that the initial replication takes to complete depends on your available bandwidth and the size of your VM. After the initial seed replication, network usage is minimized because AWS SMS replicates only the incremental changes occurring on the VM.
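If you prefer to drive this step programmatically, a hedged boto3 sketch of the catalog import and replication job could look like this; the VM name used to pick the server is a placeholder.

```python
import datetime
import boto3

sms = boto3.client("sms")

# Pull the VM inventory from the hypervisor via the connector.
sms.import_server_catalog()

servers = sms.get_servers()["serverList"]
frontend = next(s for s in servers
                if s["vmServer"]["vmName"] == "wordpress-web01")  # placeholder

# Replicate every 12 hours; after the initial seed, only deltas are sent.
sms.create_replication_job(
    serverId=frontend["serverId"],
    seedReplicationTime=datetime.datetime.utcnow(),
    frequency=12,
)
```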
Launch your EC2 instance
When your replication task is complete, the artifact created by AWS SMS is a custom AMI that you can use to deploy an EC2 instance. Follow the usual process to launch your EC2 instance, noting that you may need to replace any host-based firewalls with security groups and NACLs.
When you create an EC2 instance, ensure that you pick the most suitable EC2 instance type and size to match your performance requirements while optimizing for cost.
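As a rough illustration, launching from the SMS-produced AMI with boto3 might look like the following sketch; the AMI ID, instance type, subnet, and security group are placeholders to adapt to your own sizing and network layout.

```python
import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0abcdef1234567890",         # the AMI created by the replication job
    InstanceType="t3.medium",                # size for your performance/cost target
    MinCount=1, MaxCount=1,
    SubnetId="subnet-aaaa1111",              # placeholder subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],  # replaces the host-based firewall
)
```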
While your new EC2 instance is a replica of your on-premises VM, you should always validate that applications are functioning. How you do this differs from application to application; you can use a combination of approaches, such as editing a local hosts file to test the application in a browser, or checking connectivity over SSH or Telnet.
From the RDS console, get your connection string details and update your WordPress configuration file to point to the Amazon Aurora database. As WordPress expects a MySQL database and Amazon Aurora is MySQL-compatible, this change of database engine is transparent to WordPress.
Change your DNS records to resolve the WordPress blog to your EC2 instance
At this point, you have validated that your WordPress application is running correctly, and you are still receiving changes from your on-premises data center into your Amazon Aurora database via AWS DMS.
You can now update your DNS zone file using Amazon Route 53. Route 53 can be driven by multiple methods: console, SDK, or AWS CLI.
For this walkthrough, update your DNS zone file via the AWS CLI, using a JSON change batch that upserts the A record in your zone so that it resolves to your EC2 instance.
Use the AWS CLI to execute the request and update the record in your zone file. The cut-over period between the original off-cloud location and AWS is defined by the TTL on the SOA (start of authority) record in your DNS zone. During this period, any requests that still resolve to your off-cloud server and result in database writes are automatically replicated to your Amazon Aurora instance via AWS DMS.
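For readers using the SDK rather than the CLI, here is a boto3 sketch of the same UPSERT; the hosted zone ID, record name, and IP address are placeholders.

```python
import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="Z1EXAMPLE",               # placeholder zone ID
    ChangeBatch={
        "Comment": "Cut WordPress over to EC2",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "blog.example.com.",            # placeholder record
                "Type": "A",
                "TTL": 300,                             # keep low during cut-over
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        }],
    },
)
```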
You have now successfully migrated your WordPress blog to AWS. Based on the TTL of your DNS zone file, end users slowly resolve the WordPress blog to AWS.
After you have validated your successful migration, be sure to delete your AWS DMS task and your AWS SMS replication job.
In this post, you moved a WordPress blog to AWS using AWS SMS and AWS DMS, and re-pointed the associated DNS records with Amazon Route 53.
Many architectures can be extended with little effort to take advantage of the inherent benefits of AWS. For example, by using Amazon CloudWatch metrics to drive Auto Scaling policies, you can use an Application Load Balancer as your frontend. This removes the single point of failure of a single Amazon EC2 instance and ensures that your deployed capacity closely follows customer demand. Think big and get building!
Security updates have been issued by Arch Linux (qtpass), Debian (libkohana2-php, libxml2, transmission, and xmltooling), Fedora (kernel and qpid-cpp), Gentoo (PolarSSL and xen), Mageia (flash-player-plugin, irssi, kernel, kernel-linus, kernel-tmb, libvorbis, microcode, nvidia-current, php & libgd, poppler, webkit2, and wireshark), openSUSE (gifsicle, glibc, GraphicsMagick, gwenhywfar, ImageMagick, libetpan, mariadb, pngcrush, postgresql94, rsync, tiff, and wireshark), and Oracle (kernel).
Security updates have been issued by Arch Linux (graphicsmagick and linux-lts), CentOS (thunderbird), Debian (kernel, opencv, php5, and php7.0), Fedora (electrum), Gentoo (libXfont), openSUSE (gimp, java-1_7_0-openjdk, and libvorbis), Oracle (thunderbird), Slackware (irssi), SUSE (kernel, kernel-firmware, and kvm), and Ubuntu (awstats, nvidia-graphics-drivers-384, python-pysaml2, and tomcat7, tomcat8).
Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/advanced-cloud-storage-tips/
If you’ve been using B2 Cloud Storage for a while, you probably think you know all that you can do with it. But do you?
We’ve put together a list of blazing power tips for experts and developers that will take you to the next level. Take a look below.
1 Manage File Versions
Use Lifecycle Rules on a Bucket to set how many days to keep files that are no longer the current version. This is a great way to manage the amount of space your B2 account is using.
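For example, a hedged sketch using the b2sdk Python library (assuming its v2 API) might set a rule that deletes hidden versions after 30 days; the key ID, application key, and bucket name are placeholders.

```python
from b2sdk.v2 import B2Api, InMemoryAccountInfo

api = B2Api(InMemoryAccountInfo())
api.authorize_account("production", "KEY_ID", "APPLICATION_KEY")  # placeholders

bucket = api.get_bucket_by_name("my-bucket")
bucket.update(lifecycle_rules=[{
    "fileNamePrefix": "",                 # apply to the whole bucket
    "daysFromUploadingToHiding": None,    # never auto-hide current versions
    "daysFromHidingToDeleting": 30,       # purge old versions after 30 days
}])
```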
2 Easily Stay on Top of Your B2 Account Limits
Set usage caps and get text/email alerts for your B2 account when you approach limits that you define.
3 Bring on Your Big Files
You can upload files as large as 10TB to B2.
4 You Can Use FedEx to Get Your Data into B2
If you have over 20TB of data, you can use Backblaze’s Fireball hard disk array to load large volumes of data directly into your B2 account. We ship a Fireball to you and you ship it back.
5 You Have Command-Line Control of All B2 Functions
You have complete control over B2 using our command line tool that is available for Macintosh, Windows, and Linux.
6 You Can Use Your Own Domain Name To Front a Public B2 Bucket
You can create a vanity URL for your B2 account.
7 See What’s Happening in Your Account with Graphical Reports
You can view graphical reports summarizing your B2 usage — transactions, downloads, averages, data stored — in your B2 account dashboard.
8 Create a B2 SDK
You can build your own B2 SDK for JVM-based or JVM-compatible languages using our B2 Java SDK on Github.
9 B2’s API is Easy to Use
B2's API is similar to, but simpler than, Amazon's S3 API, making it super easy for developers to integrate with B2 Cloud Storage.
10 View Code Examples To Get Your B2 Project Started
The B2 API is well documented and has code examples for cURL, Java, Python, Swift, Ruby, C#, and PHP. For example, here’s how to create a B2 Bucket.
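As a rough Python equivalent of that example, using the requests library against the native B2 API, a bucket-creation call might look like this; the account credentials and bucket name are placeholders.

```python
import requests

# Authorize first to obtain the API URL and an authorization token.
auth = requests.get(
    "https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
    auth=("ACCOUNT_ID", "APPLICATION_KEY"),  # placeholders
).json()

resp = requests.post(
    auth["apiUrl"] + "/b2api/v2/b2_create_bucket",
    headers={"Authorization": auth["authorizationToken"]},
    json={
        "accountId": auth["accountId"],
        "bucketName": "my-new-bucket",       # must be globally unique
        "bucketType": "allPrivate",          # or "allPublic"
    },
)
print(resp.json())
```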
11 Developers can set the B2 part size as low as 5 MB
When working with large files, the minimum file part size can be set as low as 5 MB or as high as 5 GB. This gives developers the ability to maximize the throughput of B2 data uploads and downloads. See Large Files and Downloading for more developer tips.
12 Your App or Device Can Work with B2, as well
Your B2 integration can be listed on Backblaze’s website. Visit Submit an Integration to get started.