Strasbourg Observers traditionally announce the best and worst ECtHR judgments of each year. The worst "judgment" of 2017 was declared to be the dissenting opinion in Bayev v. Russia, the case concerning Russia's anti-gay-propaganda law: "The homophobic nature of the Russian judge's dissent on the so-called gay propaganda law was so shocking to our readers that it won the award for worst judgment, even though technically it is not a stand-alone judgment, but only a dissenting opinion."
The case concerns applications by Russian gay-rights activists, each of whom was found guilty of the administrative offence of "public activities aimed at the promotion of homosexuality among minors." The first applicant held a demonstration in front of a secondary school with two banners reading "Homosexuality is normal" and "I am proud of my homosexuality." The second and third applicants demonstrated in front of a children's library with banners stating "Russia has the world's highest rate of teenage suicide, and homosexuals take this step because of the lack of information. The deputies are child-killers. Homosexuality is good!" and "Children have the right to know. Grown-ups are sometimes gay too. Gay people also become great. Homosexuality is natural and normal."
Before the ECtHR, the applicants argued that the Russian legislation violated Article 10 of the ECHR and was discriminatory, since no similar restrictions applied to the heterosexual majority.
The judgment
There was an interference with freedom of expression. Article 10 § 2 of the ECHR allows interference on grounds related to morals and health, so the ECtHR assessed whether the interference in this case pursued a legitimate aim.
The ECtHR sees no reason why the social acceptance of homosexuality should be incompatible with upholding family values. As stated in Kozak v. Poland, there is no single accepted way for individuals to lead their family or private life.
Attempts to draw parallels between homosexuality and paedophilia are unacceptable. Even if a majority of Russians hold a negative opinion of homosexuality, it would be incompatible with the underlying values of the Convention if the exercise of rights by a minority group were made conditional on its acceptance by the majority.
The Government argued that the promotion of same-sex relationships should be banned because such relationships pose a risk to public health and demographic development. The ECtHR fails to see how such a law could help achieve the desired demographic goals, or how its absence would adversely affect them.
Nor has the Government shown how paedophilia and pornography involving minors (irrespective of the sexual orientation of those involved) are related to homosexuality or to this law.
The legal provisions in question do not serve the legitimate aims of protecting morals, protecting health, or protecting the rights of others. By adopting such laws, the authorities reinforce stigma and prejudice and encourage homophobia, which is incompatible with the notions of equality, pluralism, and tolerance inherent in a democratic society. Violation of Article 10 of the ECHR.
The dissenting opinion can be read on the ECtHR website. According to it, children should primarily consult their parents or close family members rather than get information about sex from posters in the street, and it also argues that the ECtHR did not take seriously the fact that the private life of children is more important than the freedom of expression of homosexuals.
Last year, we released Amazon Connect, a cloud-based contact center service that enables any business to deliver better customer service at low cost. This service is built based on the same technology that empowers Amazon customer service associates. Using this system, associates have millions of conversations with customers when they inquire about their shipping or order information. Because we made it available as an AWS service, you can now enable your contact center agents to make or receive calls in a matter of minutes. You can do this without having to provision any kind of hardware.
There are several advantages of building your contact center in the AWS Cloud, as described in our documentation. In addition, customers can extend Amazon Connect capabilities by using AWS products and the breadth of AWS services. In this blog post, we focus on how to get analytics out of the rich set of data published by Amazon Connect. We make use of an Amazon Connect data stream and create an end-to-end workflow to offer an analytical solution that can be customized based on need.
Solution overview
The following diagram illustrates the solution.
In this solution, Amazon Connect exports its contact trace records (CTRs) using Amazon Kinesis. CTRs are emitted as a JSON data stream, and each record contains information about an individual contact. For example, this information might include the start and end time of a call, which agent handled the call, which queue the user chose, queue wait times, number of holds, and so on. You can enable this feature by reviewing our documentation.
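To give a feel for the shape of these records, here is a small, illustrative example; the field names and values are simplified for readability and are not the exact CTR schema.

# Simplified, hypothetical CTR record; the real schema contains many more fields.
sample_ctr = {
    "ContactId": "11111111-2222-3333-4444-555555555555",
    "Agent": {"Username": "jdoe", "NumberOfHolds": 1},
    "Queue": {"Name": "Sales", "Duration": 45},
    "InitiationTimestamp": "2018-05-01T17:00:00Z",
    "DisconnectTimestamp": "2018-05-01T17:06:30Z",
    "Attributes": {"customerTier": "gold"},
}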
In this architecture, we use Kinesis Firehose to capture Amazon Connect CTRs as raw data in an Amazon S3 bucket. We don’t use the recent feature added by Kinesis Firehose to save the data in S3 as Apache Parquet format. We use AWS Glue functionality to automatically detect the schema on the fly from an Amazon Connect data stream.
The primary reason for this approach is that it lets us use attributes and enables an Amazon Connect administrator to add more fields dynamically as needed. Also, by converting the data to Parquet in batches (every couple of hours), we can achieve better compression. However, if your requirement is to ingest the data in Parquet format in real time, we recommend using the recently launched Kinesis Data Firehose feature. You can review this blog post for further information.
By default, Firehose puts these records in time-series format. To make it easy for AWS Glue crawlers to capture information from new records, we use AWS Lambda to move all new records to a single S3 prefix called flatfiles. Our Lambda function is configured using S3 event notification. To comply with AWS Glue and Athena best practices, the Lambda function also converts all column names to lowercase. Finally, we also use the Lambda function to start AWS Glue crawlers. AWS Glue crawlers identify the data schema and update the AWS Glue Data Catalog, which is used by extract, transform, load (ETL) jobs in AWS Glue in the latter half of the workflow.
You can see our approach in the Lambda code following.
from __future__ import print_function

import json
import re
import urllib

import boto3

s3 = boto3.resource('s3')
client = boto3.client('s3')


def convert_columns_to_lowercase(obj):
    # Recursively lowercase all keys and strip non-alphanumeric characters,
    # so that column names comply with AWS Glue and Athena naming rules.
    for key in list(obj.keys()):
        new_key = re.sub(r'[\W]+', '', key.lower())
        value = obj[key]
        if isinstance(value, dict) and len(value) > 0:
            convert_columns_to_lowercase(value)
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj


def lambda_handler(event, context):
    # Triggered by an S3 event notification for each new object written by Firehose.
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    try:
        client.download_file(bucket, key, '/tmp/file.json')
        with open('/tmp/out.json', 'w') as output, open('/tmp/file.json', 'rb') as infile:
            i = 0
            for line in infile:
                # Firehose concatenates JSON records; split them back into individual objects.
                for record_str in line.replace("}{", "}\n{").split("\n"):
                    record = json.loads(record_str, object_hook=convert_columns_to_lowercase)
                    if i != 0:
                        output.write("\n")
                    output.write(json.dumps(record))
                    i += 1
        # Move the normalized records to a single prefix for the AWS Glue crawler.
        newkey = 'flatfiles/' + key.replace("/", "")
        client.upload_file('/tmp/out.json', bucket, newkey)
        s3.Object(bucket, key).delete()
        return "success"
    except Exception as e:
        print(e)
        print('Error copying object {} from bucket {}'.format(key, bucket))
        raise e
We trigger AWS Glue crawlers based on events because this approach lets us capture any new fields that appear in the data. CTR attributes are designed to offer multiple custom options based on a particular call flow. Attributes are essentially key-value pairs in nested JSON format. With the help of event-based AWS Glue crawlers, you can easily identify newer attributes automatically.
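The Lambda code shown above handles only the record normalization; the crawler start mentioned earlier is a separate call. A minimal boto3 sketch of that step, with an assumed crawler name, might look like this:

import boto3

glue = boto3.client('glue')

def start_ctr_crawler(crawler_name='connect-ctr-crawler'):  # hypothetical crawler name
    # Start the crawler unless a run is already in progress.
    try:
        glue.start_crawler(Name=crawler_name)
    except glue.exceptions.CrawlerRunningException:
        pass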
We recommend setting up an S3 lifecycle policy on the flatfiles folder that keeps records only for 24 hours. Doing this optimizes AWS Glue ETL jobs to process a subset of files rather than the entire set of records.
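As a rough boto3 sketch of that lifecycle rule (the bucket name is a placeholder), note that S3 expiration is expressed in whole days, so one day is the closest match to 24 hours:

import boto3

s3 = boto3.client('s3')

# Expire objects under the flatfiles/ prefix after one day so that Glue ETL
# jobs only ever process a recent subset of the raw records.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-connect-analytics-bucket',  # placeholder bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-flatfiles',
            'Filter': {'Prefix': 'flatfiles/'},
            'Status': 'Enabled',
            'Expiration': {'Days': 1},
        }]
    }
)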
After we have data in the flatfiles folder, we use AWS Glue to catalog the data and transform it into Parquet format inside a folder called parquet/ctr/. The AWS Glue job performs the ETL that transforms the data from JSON to Parquet format. We use AWS Glue crawlers to capture any new fields that appear inside the JSON data, so the schema stays dynamic. What this means is that when you add new attributes to an Amazon Connect instance, the solution automatically recognizes them and incorporates them in the schema of the results.
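The ETL script itself is generated for you by AWS Glue, but conceptually it reads the cataloged JSON table and writes Parquet. The following simplified sketch assumes placeholder database, table, and bucket names:

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args['JOB_NAME'], args)

# Read the raw CTR records cataloged by the crawler (names are placeholders).
ctr_frame = glue_context.create_dynamic_frame.from_catalog(
    database='connectctr', table_name='flatfiles')

# Write the records back to S3 as Parquet for efficient querying.
glue_context.write_dynamic_frame.from_options(
    frame=ctr_frame,
    connection_type='s3',
    connection_options={'path': 's3://my-connect-analytics-bucket/parquet/ctr/'},
    format='parquet')

job.commit()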
After AWS Glue stores the results in Parquet format, you can perform analytics using Amazon Redshift Spectrum, Amazon Athena, or any third-party data warehouse platform. To keep this solution simple, we have used Amazon Athena for analytics. Amazon Athena allows us to query data without having to set up and manage any servers or data warehouse platforms. Additionally, we only pay for the queries that are executed.
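For example, once the ctr table exists in the Data Catalog, a query can also be submitted programmatically with boto3; the database name and output location below are assumptions:

import boto3

athena = boto3.client('athena')

# Simple sanity-check query against the Parquet table.
response = athena.start_query_execution(
    QueryString='SELECT COUNT(*) AS total_contacts FROM ctr',
    QueryExecutionContext={'Database': 'connectctr'},  # placeholder database name
    ResultConfiguration={'OutputLocation': 's3://my-connect-analytics-bucket/athena-results/'}
)
print(response['QueryExecutionId'])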
Try it out!
You can get started with our sample AWS CloudFormation template. This template creates the components starting from the Kinesis stream and finishes up with S3 buckets, the AWS Glue job, and crawlers. To deploy the template, open the AWS Management Console by clicking the following link.
In the console, specify the following parameters:
BucketName: The name for the bucket to store all the solution files. This name must be unique; if it’s not, template creation fails.
etlJobSchedule: The schedule in cron format indicating how often the AWS Glue job runs. The default value is every hour.
KinesisStreamName: The name of the Kinesis stream to receive data from Amazon Connect. This name must be different from any other Kinesis stream created in your AWS account.
s3interval: The interval in seconds for Kinesis Firehose to save data inside the flatfiles folder on S3. The value must be between 60 and 900 seconds.
sampledata: When this parameter is set to true, sample CTR records are used. Doing this lets you try this solution without setting up an Amazon Connect instance. All examples in this walkthrough use this sample data.
Select the “I acknowledge that AWS CloudFormation might create IAM resources.” check box, and then choose Create. After the template finishes creating resources, you can see the stream name on the stack Outputs tab.
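If you prefer to script the deployment instead of using the console, a rough boto3 equivalent looks like the following; the template URL and all parameter values are placeholders:

import boto3

cloudformation = boto3.client('cloudformation')

cloudformation.create_stack(
    StackName='connect-ctr-analytics',
    TemplateURL='https://s3.amazonaws.com/my-templates/connect-ctr-analytics.yaml',  # placeholder
    Parameters=[
        {'ParameterKey': 'BucketName', 'ParameterValue': 'my-connect-analytics-bucket'},
        {'ParameterKey': 'KinesisStreamName', 'ParameterValue': 'connect-ctr-stream'},
        {'ParameterKey': 's3interval', 'ParameterValue': '60'},
        {'ParameterKey': 'sampledata', 'ParameterValue': 'true'},
    ],
    # Equivalent to checking "I acknowledge that AWS CloudFormation might create IAM resources."
    Capabilities=['CAPABILITY_IAM'],
)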
If you haven’t created your Amazon Connect instance, you can do so by following the Getting Started Guide. When you are done creating it, choose your Amazon Connect instance in the console, which takes you to instance settings. Choose Data streaming to enable streaming for CTR records. Here, you can choose the Kinesis stream (defined in the KinesisStreamName parameter) that was created by the CloudFormation template.
Now it’s time to generate the data by making or receiving calls using Amazon Connect. You can go to the Amazon Connect Contact Control Panel (CCP) to make or receive calls using a software phone or desktop phone. After a few minutes, we should see data inside the flatfiles folder. To make it easier to try this solution, we provide sample data that you can enable by setting the sampledata parameter to true in your CloudFormation template.
You can navigate to the AWS Glue console by choosing Jobs on the left navigation pane of the console. We can select our job here. In my case, the job created by CloudFormation is called glueJob-i3TULzVtP1W0; yours should be similar. You run the job by choosing Run job for Action.
After that, we wait for the AWS Glue job to run and to finish successfully. We can track the status of the job by checking the History tab.
When the job finishes running, we can check the Database section. There should be a new table created called ctr in Parquet format.
To query the data with Athena, we can select the ctr table, and for Action choose View data.
Doing this takes us to the Athena console. If you run a query, Athena shows a preview of the data.
When we can query the data using Athena, we can visualize it using Amazon QuickSight. Before connecting Amazon QuickSight to Athena, we must make sure to grant Amazon QuickSight access to Athena and the associated S3 buckets in the account. For more information on doing this, see Managing Amazon QuickSight Permissions to AWS Resources in the Amazon QuickSight User Guide. We can then create a new data set in Amazon QuickSight based on the Athena table that was created.
After setting up permissions, we can create a new analysis in Amazon QuickSight by choosing New analysis.
Then we add a new data set.
We choose Athena as the source and give the data source a name (in this case, I named it connectctr).
Choose the name of the database and the table referencing the Parquet results.
Then choose Visualize.
After that, we should see the following screen.
Now we can create some visualizations. First, search for the agent.username column, and drag it to the AutoGraph section.
We can see the agents and the number of calls for each, so we can easily see which agents have taken the largest number of calls. If we want to see from which queues the calls came for each agent, we can add the queue.arn column to the visual.
After following all these steps, you can use Amazon QuickSight to add different columns from the call records and perform different types of visualizations. You can build dashboards that continuously monitor your connect instance. You can share those dashboards with others in your organization who might need to see this data.
Conclusion
In this post, you see how you can use services like AWS Lambda, AWS Glue, and Amazon Athena to process Amazon Connect call records. The post also demonstrates how to use AWS Lambda to preprocess files in Amazon S3 and transform them into a format that is recognized by AWS Glue crawlers. Finally, the post shows how to use Amazon QuickSight to perform visualizations.
You can use the provided template to analyze your own contact center instance. Or you can take the CloudFormation template and modify it to process other data streams that can be ingested using Amazon Kinesis or stored on Amazon S3.
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.
Peter Dalbhanjan is a Solutions Architect for AWS based in Herndon, VA. Peter has a keen interest in evangelizing AWS solutions and has written multiple blog posts that focus on simplifying complex use cases. At AWS, Peter helps with designing and architecting a variety of customer workloads.
Businesses and organizations that rely on macOS Server for essential office and data services are facing some decisions about the future of their IT services.
Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:
macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.
The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.
Apple assures users who have already configured any of the listed services that they will be able to keep using them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements for the features being removed. Many of these alternatives are free and open-source software (FOSS).
As difficult as this could be for organizations that use macOS Server, it is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster; the entire category of SMB office and business servers has been undergoing a gradual change in recent years.
Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.
What Comes After macOS Server?
Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.
Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.
The Times They Are A-Changin’
For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.
In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using, both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (SaaS, PaaS, IaaS) options.
IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.
Synology NAS as a macOS Server Alternative
One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.
Synology NAS
Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux as it comes from BSD Unix.
Synology DiskStation Manager Package Center
According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.
Business continuity and backup are simplified by the flexible storage capacity of the NAS. Synology has built-in backup to Backblaze B2 Cloud Storage with Synology’s Cloud Sync, as well as a choice of a number of other B2-compatible applications, such as Cloudberry, Comet, and Arq.
Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).
Are You Sticking with macOS Server or Moving to Another Platform?
If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.
I’m in danger of contradicting myself, after previously pointing out that x86 machine code is a high-level language, but this article claiming C is not a low-level language is bunk. C certainly has some problems, but it’s still the closest language to assembly. This is obvious from the fact that it’s still the fastest compiled language. What we see is a typical academic out of touch with the real world.
The author makes the (wrong) observation that we’ve been stuck emulating the PDP-11 for the past 40 years. C was written for the PDP-11, and since then CPUs have been designed to make C run faster. The author imagines a different world, such as one where CPU designers instead target something like LISP or Erlang as their preferred language. This misunderstands the state of the market. CPUs do indeed support lots of different abstractions, and C has evolved to accommodate this.
The author criticizes things like “out-of-order” execution, which has led to the Spectre side-channel vulnerabilities. Out-of-order execution is necessary to make C run faster. The author claims instead that those resources should be spent on having more, slower CPUs with more threads. This sacrifices single-threaded performance in exchange for a lot more threads executing in parallel. The author cites Sparc Tx CPUs as his ideal processor.
But here’s the thing, the Sparc Tx was a failure. To be fair, it’s mostly a failure because most of the time, people wanted to run old C code instead of new Erlang code. But it was still a failure at running Erlang.
Time after time, engineers keep finding that “out-of-order”, single-threaded performance is still the winner. A good example is ARM processors for both mobile phones and servers. All the theory points to in-order CPUs as being better, but all the products are out-of-order, because this theory is wrong. The custom ARM cores from Apple and Qualcomm used in most high-end phones are so deeply out-of-order they give Intel CPUs competition. The same is true on the server front with the latest Qualcomm Centriq and Cavium ThunderX2 processors, deeply out of order supporting more than 100 instructions in flight.
The Cavium is especially telling. Its ThunderX CPU had 48 simple cores; it was replaced with the ThunderX2, which has 32 complex, deeply out-of-order cores. The performance increase was massive, even on multithread-friendly workloads. Every competitor to Intel’s dominance in the server space has learned the lesson from Sparc Tx: many wimpy cores is a failure; you need fewer beefy cores. Yes, they don’t need to be as beefy as Intel’s processors, but they need to be close.
Even Intel’s “Xeon Phi” custom chip learned this lesson. This is their GPU-like chip, running 60 cores with 512-bit wide “vector” (sic) instructions, designed for supercomputer applications. Its first version was purely in-order. Its current version is slightly out-of-order. It supports four threads per core and focuses on basic number crunching, so in-order cores seem to be the right approach, but Intel found that even in this case out-of-order processing still provided a benefit. Practice is different from theory.
As an academic, the author of the above article focuses on abstractions. The criticism of C is that it has the wrong abstractions which are hard to optimize, and that if we instead expressed things in the right abstractions, it would be easier to optimize.
This is an intellectually compelling argument, but so far bunk.
The reason is that while the theoretical base language has issues, everyone programs using extensions to the language, like “intrinsics” (C ‘functions’ that map to assembly instructions). Programmers write libraries using these intrinsics, which then the rest of the normal programmers use. In other words, if your criticism is that C is not itself low level enough, it still provides the best access to low level capabilities.
Given that C can access new functionality in CPUs, CPU designers add new paradigms, from SIMD to transaction processing. In other words, while in the 1980s CPUs were designed to optimize C (stacks, scaled pointers), these days CPUs are designed to optimize tasks regardless of language.
The author of that article criticizes the memory/cache hierarchy, claiming it has problems. Yes, it has problems, but only compared to how well it normally works. The author praises the many simple cores/threads idea as hiding memory latency with little caching, but misses the point that caches also dramatically increase memory bandwidth. Intel processors are optimized to read a whopping 256 bits every clock cycle from L1 cache. Main memory bandwidth is orders of magnitude slower.
The author goes on to criticize cache coherency as a problem. C uses it, but other languages like Erlang don’t need it. But that’s largely due to the problems each language solves. Erlang solves the problem where a large number of threads work on largely independent tasks, needing to send only small messages to each other across threads. The problem C solves is when you need many threads working on a huge, common set of data.
For example, consider the “intrusion prevention system”. Any thread can process any incoming packet that corresponds to any region of memory. There’s no practical way of solving this problem without a huge coherent cache. It doesn’t matter which language or abstractions you use, it’s the fundamental constraint of the problem being solved. RDMA is an important concept that’s moved from supercomputer applications to the data center, such as with memcached. Again, we have the problem of huge quantities (terabytes worth) shared among threads rather than small quantities (kilobytes).
The fundamental issue the author of the paper is ignoring is decreasing marginal returns. Moore’s Law has gifted us more transistors than we can usefully use. We can’t apply those additional transistors to just one thing, because the useful returns we get diminish.
For example, Intel CPUs have two hardware threads per core. That’s because there are good returns by adding a single additional thread. However, the usefulness of adding a third or fourth thread decreases. That’s why many CPUs have only two threads, or sometimes four threads, but no CPU has 16 threads per core.
You can apply the same discussion to any aspect of the CPU, from register count, to SIMD width, to cache size, to out-of-order depth, and so on. Rather than focusing on one of these things and increasing it to the extreme, CPU designers make each a bit larger every process tick that adds more transistors to the chip.
The same applies to cores. It’s why the “more simpler cores” strategy fails, because more cores have their own decreasing marginal returns. Instead of adding cores tied to limited memory bandwidth, it’s better to add more cache. Such cache already increases the size of the cores, so at some point it’s more effective to add a few out-of-order features to each core rather than more cores. And so on.
The question isn’t whether we can change this paradigm and radically redesign CPUs to match some academic’s view of the perfect abstraction. Instead, the goal is to find new uses for those additional transistors. For example, “message passing” is a useful abstraction in languages like Go and Erlang that’s often more useful than sharing memory. It’s implemented with shared memory and atomic instructions, but I can’t help thinking it could be done better with direct hardware support.
Of course, as soon as they do that, it’ll become an intrinsic in C, then added to languages like Go and Erlang.
Summary Academics live in an ideal world of abstractions; the rest of us live in practical reality. The reality is that the vast majority of programmers work with the C family of languages (JavaScript, Go, etc.), whereas academics love the epiphanies they learned using other languages, especially functional languages. CPUs are only superficially designed to run C and maintain “PDP-11 compatibility”. Instead, they keep adding features to support other abstractions, abstractions that are also available to C. They are driven by decreasing marginal returns: designers would love to add new abstractions to the hardware because it’s a cheap way to make use of additional transistors. Academics are wrong to believe that the entire system needs to be redesigned from scratch. Instead, they just need to come up with new abstractions that CPU designers can add.
Brandon Williams writes about the new Git remote protocol that will debut in the 2.18 release. “We recently rolled out support for protocol version 2 at Google and have seen a performance improvement of 3x for no-op fetches of a single branch on repositories containing 500k references. Protocol v2 has also enabled a reduction of 8x of the overhead bytes (non-packfile) sent from googlesource.com servers. A majority of this improvement is due to filtering references advertised by the server to the refs the client has expressed interest in.”
The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in context of the GDPR.
How does the AWS Shared Responsibility Model change under GDPR? The short answer – it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided; while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure is our number one priority, and we invest heavily in third-party auditors to test our security controls and make any issues they find available to our customer base through AWS Artifact. Our ISO 27018 report is a good example, as it tests security controls that focus on protection of personal data in particular.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend that you protect AWS account credentials and set up individual user accounts with AWS Identity and Access Management (IAM) so that each user is only given the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API/user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS Services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
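As an illustration only, a few of these controls can be enabled programmatically; the user, bucket, and trail names are placeholders, and the managed policy is just an example of scoping permissions narrowly:

import boto3

iam = boto3.client('iam')
s3 = boto3.client('s3')
cloudtrail = boto3.client('cloudtrail')

# Individual user with only the permissions needed for the job (example policy).
iam.create_user(UserName='analyst-jane')
iam.attach_user_policy(
    UserName='analyst-jane',
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess')

# Default encryption for a bucket that may hold personal data.
s3.put_bucket_encryption(
    Bucket='my-gdpr-data-bucket',  # placeholder bucket name
    ServerSideEncryptionConfiguration={
        'Rules': [{'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'aws:kms'}}]})

# API/user activity logging with AWS CloudTrail.
cloudtrail.create_trail(Name='org-activity-trail',
                        S3BucketName='my-cloudtrail-logs-bucket',
                        IsMultiRegionTrail=True)
cloudtrail.start_logging(Name='org-activity-trail')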
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
A new PGP vulnerability was announced today. Basically, the vulnerability makes use of the fact that modern e-mail programs allow for embedded HTML objects. Essentially, if an attacker can intercept and modify a message in transit, he can insert code that sends the plaintext in a URL to a remote website. Very clever.
The EFAIL attacks exploit vulnerabilities in the OpenPGP and S/MIME standards to reveal the plaintext of encrypted emails. In a nutshell, EFAIL abuses active content of HTML emails, for example externally loaded images or styles, to exfiltrate plaintext through requested URLs. To create these exfiltration channels, the attacker first needs access to the encrypted emails, for example, by eavesdropping on network traffic, compromising email accounts, email servers, backup systems or client computers. The emails could even have been collected years ago.
The attacker changes an encrypted email in a particular way and sends this changed encrypted email to the victim. The victim’s email client decrypts the email and loads any external content, thus exfiltrating the plaintext to the attacker.
A few initial comments:
1. Being able to intercept and modify e-mails in transit is the sort of thing the NSA can do, but is hard for the average hacker. That being said, there are circumstances where someone can modify e-mails. I don’t mean to minimize the seriousness of this attack, but that is a consideration.
2. The vulnerability isn’t with PGP or S/MIME itself, but in the way they interact with modern e-mail programs. You can see this in the two suggested short-term mitigations: “No decryption in the e-mail client,” and “disable HTML rendering.”
3. I’ve been getting some weird press calls from reporters wanting to know if this demonstrates that e-mail encryption is impossible. No, this just demonstrates that programmers are human and vulnerabilities are inevitable. PGP almost certainly has fewer bugs than your average piece of software, but it’s not bug free.
4. Why is anyone using encrypted e-mail anymore, anyway? Reliably and easily encrypting e-mail is an insurmountably hard problem for reasons having nothing to do with today’s announcement. If you need to communicate securely, use Signal. If having Signal on your phone will arouse suspicion, use WhatsApp.
I’ll post other commentaries and analyses as I find them.
Today, we’re excited to announce local build support in AWS CodeBuild.
AWS CodeBuild is a fully managed build service. There are no servers to provision and scale, or software to install, configure, and operate. You just specify the location of your source code, choose your build settings, and CodeBuild runs build scripts for compiling, testing, and packaging your code.
In this blog post, I’ll show you how to set up CodeBuild locally to build and test a sample Java application.
By building an application on a local machine you can:
Test the integrity and contents of a buildspec file locally.
Test and build an application locally before committing.
Identify and fix errors quickly from your local development environment.
Prerequisites
In this post, I am using AWS Cloud9 IDE as my development environment.
If you would like to use AWS Cloud9 as your IDE, follow the express setup steps in the AWS Cloud9 User Guide.
The AWS Cloud9 IDE comes with Docker and Git already installed. If you are going to use your laptop or desktop machine as your development environment, install Docker and Git before you start.
Note: We need to provide three environment variables, namely IMAGE_NAME, SOURCE, and ARTIFACTS.
IMAGE_NAME: The name of your build environment image.
SOURCE: The absolute path to your source code directory.
ARTIFACTS: The absolute path to your artifact output folder.
When you run the sample project, you get a runtime error that says the YAML file does not exist. This is because a buildspec.yml file is not included in the sample web project. AWS CodeBuild requires a buildspec.yml to run a build. For more information about buildspec.yml, see Build Spec Example in the AWS CodeBuild User Guide.
Let’s add a buildspec.yml file with the following content to the sample-web-app folder and then rebuild the project.
version: 0.2

phases:
  build:
    commands:
      - echo Build started on `date`
      - mvn install
artifacts:
  files:
    - target/javawebdemo.war
This time your build should be successful. Upon successful execution, look in the /artifacts folder for the final built artifacts.zip file to validate.
Conclusion
In this blog post, I showed you how to quickly set up the CodeBuild local agent to build projects right from your local desktop machine or laptop. As you see, local builds can improve developer productivity by helping you identify and fix errors quickly.
I hope you found this post useful. Feel free to leave your feedback or suggestions in the comments.
EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.
EC2 Fleet Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.
You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.
I think you’ll find a number of ways that this feature makes managing a fleet of instances easier, and I believe that you will also find the team’s near-term feature roadmap of interest (more on that in a bit).
Using EC2 Fleet There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster, or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing, or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day; to process that data into meaningful information in a timely fashion, you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.
Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.
You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:
m4.16xlarge – 64 vCPUs, weight = 64
m5.24xlarge – 96 vCPUs, weight = 96
This is expressed in a template that looks like this:
By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.
Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:
The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand capacity and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarge instances, delivering 1920 vCPUs either way.
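For readers who prefer the API view, here is a hedged boto3 sketch of an equivalent request; the launch template ID is a placeholder, while the weights and capacity numbers come from the example above:

import boto3

ec2 = boto3.client('ec2')

response = ec2.create_fleet(
    LaunchTemplateConfigs=[{
        'LaunchTemplateSpecification': {
            'LaunchTemplateId': 'lt-0123456789abcdef0',  # placeholder launch template
            'Version': '1',
        },
        # Weight each instance type by its vCPU count.
        'Overrides': [
            {'InstanceType': 'm4.16xlarge', 'WeightedCapacity': 64},
            {'InstanceType': 'm5.24xlarge', 'WeightedCapacity': 96},
        ],
    }],
    TargetCapacitySpecification={
        'TotalTargetCapacity': 2880,
        'OnDemandTargetCapacity': 960,
        'SpotTargetCapacity': 1920,
        'DefaultTargetCapacityType': 'spot',
    },
    Type='maintain',
)
print(response['FleetId'])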
Putting it all together, I have a single file (fl1.json) that describes my fleet:
My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.
Now let’s imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:
{
"TotalTargetCapacity": 960,
}
Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:
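Programmatically, the same change can be made with a single call; the fleet ID below is a placeholder:

import boto3

ec2 = boto3.client('ec2')

# Shrink the fleet back to the On-Demand portion only.
ec2.modify_fleet(
    FleetId='fleet-0123456789abcdef0',  # placeholder fleet ID
    TargetCapacitySpecification={'TotalTargetCapacity': 960},
)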
Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.
In the Works We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixes instance types and Spot, Reserved, and On-Demand Instances, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.
Available Now You can create and make use of EC2 Fleets today in all public AWS Regions!
Researchers at Princeton University have released IoT Inspector, a tool that analyzes the security and privacy of IoT devices by examining the data they send across the Internet. They’ve already used the tool to study a bunch of different IoT devices. From their blog post:
Finding #3: Many IoT Devices Contact a Large and Diverse Set of Third Parties
In many cases, consumers expect that their devices contact manufacturers’ servers, but communication with other third-party destinations may not be a behavior that consumers expect.
We have found that many IoT devices communicate with third-party services, of which consumers are typically unaware. We have found many instances of third-party communications in our analyses of IoT device network traffic. Some examples include:
Samsung Smart TV. During the first minute after power-on, the TV talks to Google Play, DoubleClick, Netflix, FandangoNOW, Spotify, CBS, MSNBC, NFL, Deezer, and Facebook, even though we did not sign in or create accounts with any of them.
Amcrest WiFi Security Camera. The camera actively communicates with cellphonepush.quickddns.com using HTTPS. QuickDDNS is a Dynamic DNS service provider operated by Dahua. Dahua is also a security camera manufacturer, although Amcrest’s website makes no references to Dahua. Amcrest customer service informed us that Dahua was the original equipment manufacturer.
Halo Smoke Detector. The smart smoke detector communicates with broker.xively.com. Xively offers an MQTT service, which allows manufacturers to communicate with their devices.
Geeni Light Bulb. The Geeni smart bulb communicates with gw.tuyaus.com, which is operated by TuYa, a China-based company that also offers an MQTT service.
We also looked at a number of other devices, such as Samsung Smart Camera and TP-Link Smart Plug, and found communications with third parties ranging from NTP pools (time servers) to video storage services.
Their first two findings are that “Many IoT devices lack basic encryption and authentication” and that “User behavior can be inferred from encrypted IoT device traffic.” No surprises there.
Enterprises adopt containers because they recognize the benefits: speed, agility, portability, and high compute density. They understand how accelerating application delivery and deployment pipelines makes it possible to rapidly slipstream new features to customers. Although the benefits are indisputable, this acceleration raises concerns about security and corporate compliance with software governance. In this blog post, I provide a solution that shows how Layered Insight, the pioneer and global leader in container-native application protection, can be used with seamless application build and delivery pipelines like those available in AWS CodeBuild to address these concerns.
Layered Insight solutions
Layered Insight enables organizations to unify DevOps and SecOps by providing complete visibility and control of containerized applications. Using the industry’s first embedded security approach, Layered Insight solves the challenges of container performance and protection by providing accurate insight into container images, adaptive analysis of running containers, and automated enforcement of container behavior.
AWS CodeBuild
AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can get started quickly by using prepackaged build environments, or you can create custom build environments that use your own build tools.
Problem Definition
Security and compliance concerns span the lifecycle of application containers. Common concerns include:
Visibility into the container images. You need to verify the software composition information of the container image to determine whether known vulnerabilities associated with any of the software packages and libraries are included in the container image.
Governance of container images is critical because only certain open source packages/libraries, of specific versions, should be included in the container images. You need mechanisms for blacklisting all container images that include a certain version of a software package/library, or for only allowing open source software that comes with a specific type of license (such as Apache, MIT, GPL, and so on). You need to be able to address challenges such as:
· Defining the process for image compliance policies at the enterprise, department, and group levels.
· Preventing the images that fail the compliance checks from being deployed in critical environments, such as staging, pre-prod, and production.
Visibility into running container instances is critical, including:
· CPU and memory utilization.
· Security of the build environment.
· All activities (system, network, storage, and application layer) of the application code running in each container instance.
Protection of running container instances that is:
· Zero-touch to the developers (not an SDK-based approach).
· Zero touch to the DevOps team and doesn’t limit the portability of the containerized application.
· This protection must retain the option to switch to a different container stack or orchestration layer, or even to a different Container as a Service (CaaS).
· And it must be a fully automated solution to SecOps, so that the SecOps team doesn’t have to manually analyze and define detailed blacklist and whitelist policies.
Solution Details
In AWS CodeCommit, we have three projects:
● “Democode” is a simple Java application, with one buildspec to build the app into a Docker container (run by the build-demo-image CodeBuild project), and another to instrument that container (the instrument-image CodeBuild project). The resulting container is stored in the ECR repo javatest as javatest:20180415-layered. This instrumented container is running in the AWS Fargate cluster demo-java-app and can be seen in the Layered Insight runtime console as the javatest application in us-east-1.
● aws-codebuild-docker-images is a clone of the official aws-codebuild-docker-images repo on GitHub. This CodeCommit project is used by the build-python-builder CodeBuild project to build the Python 3.3.6 CodeBuild image, which is stored in the codebuild-python ECR repo. We then manually instructed the Layered Insight console to instrument the image.
● scan-java-image contains just a buildspec.yml file. This file is used by the scan-java-image CodeBuild project to instruct Layered Assessment to perform a vulnerability scan of the javatest container image built previously, and then run the scan results through a compliance policy that states there should be no medium vulnerabilities. This build fails, but in this case that is a success: the scan completes successfully, yet compliance fails because medium-level issues are found in the scan.
This build is performed using the instrumented version of the Python 3.3.6 CodeBuild image, so the activity of the processes running within the build is recorded each time in the LI console.
Build container image
Create or use a CodeCommit project with your application. To build this image and store it in Amazon Elastic Container Registry (Amazon ECR), add a buildspec file to the project, create a CodeBuild project, and build the container image.
Scan container image
Once the image is built, create a new buildspec in the same project or a new one that looks similar to below (update ECR URL as necessary):
version: 0.2

phases:
  pre_build:
    commands:
      - echo Pulling down LI Scan API client scripts
      - git clone https://github.com/LayeredInsight/scan-api-example-python.git
      - echo Setting up LI Scan API client
      - cd scan-api-example-python
      - pip install layint_scan_api
      - pip install -r requirements.txt
  build:
    commands:
      - echo Scanning container started on `date`
      - IMAGEID=$(./li_add_image --name <aws-region>.amazonaws.com/javatest:20180415)
      - ./li_wait_for_scan -v --imageid $IMAGEID
      - ./li_run_image_compliance -v --imageid $IMAGEID --policyid PB15260f1acb6b2aa5b597e9d22feffb538256a01fbb4e5a95
Add the buildspec file to the git repo, push it, and then build a CodeBuild project using the instrumented Python 3.3.6 CodeBuild image at <aws-region>.amazonaws.com/codebuild-python:3.3.6-layered. Set the following environment variables in the CodeBuild project:
● LI_APPLICATIONNAME – name of the build to display
● LI_LOCATION – location of the build project to display
● LI_API_KEY – ApiKey:<key-name>:<api-key>
● LI_API_HOST – location of the Layered Insight API service
Instrument container image
Next, to instrument the new container image:
In the Layered Insight runtime console, ensure that the ECR registry and credentials are defined (click the Setup icon and the ‘+’ sign on the top right of the screen to add a new container registry). Note the name given to the registry in the console, as this needs to be referenced in the li_add_image command in the script below.
Next, add a new buildspec (with a new name) to the CodeCommit project, such as the one shown below. This code will download the Layered Insight runtime client, and use it to instruct the Layered Insight service to instrument the image that was just built:
version: 0.2

phases:
  pre_build:
    commands:
      - echo Pulling down LI API Runtime client scripts
      - git clone https://github.com/LayeredInsight/runtime-api-example-python
      - echo Setting up LI API client
      - cd runtime-api-example-python
      - pip install layint-runtime-api
      - pip install -r requirements.txt
  build:
    commands:
      - echo Instrumentation started on `date`
      - ./li_add_image --registry "Javatest ECR" --name IMAGE_NAME:TAG --description "IMAGE DESCRIPTION" --policy "Default Policy" --instrument --wait --verbose
Commit and push the new buildspec file.
Going back to CodeBuild, create a new project, with the same CodeCommit repo, but this time select the new buildspec file. Use a Python 3.3.6 builder – either the AWS or LI Instrumented version.
Click Continue
Click Save
Run the build, again on the master branch.
If everything runs successfully, a new image should appear in the ECR registry with a -layered suffix. This is the instrumented image.
Run instrumented container image
When the instrumented container is run — in ECS, Fargate, or elsewhere — it will log data back to the Layered Insight runtime console. Its appearance in the console can be modified by setting the LI_APPLICATIONNAME and LI_LOCATION environment variables when running the container.
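For example, if the instrumented image runs on Fargate, those variables can be set in the task definition. The following is only a rough boto3 sketch; the family name, image URI, and execution role ARN are assumptions to replace with your own values.

import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Hedged sketch: register a Fargate task definition for the instrumented image
# and set the Layered Insight display variables. All names and ARNs are placeholders.
ecs.register_task_definition(
    family="javatest-layered",  # hypothetical family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::<account-id>:role/ecsTaskExecutionRole",
    containerDefinitions=[{
        "name": "javatest",
        "image": "<aws-region>.amazonaws.com/javatest:20180415-layered",
        "essential": True,
        "environment": [
            {"name": "LI_APPLICATIONNAME", "value": "javatest"},
            {"name": "LI_LOCATION", "value": "us-east-1"},
        ],
    }],
)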
Conclusion
In this post, we provided the steps needed to embed governance and runtime security into build pipelines running on AWS CodeBuild using Layered Insight.
In a multi-account environment where you require connectivity between accounts, and perhaps connectivity between cloud and on-premises workloads, the demand for a robust Domain Name Service (DNS) that’s capable of name resolution across all connected environments will be high.
The most common solution is to implement local DNS in each account and use conditional forwarders for DNS resolutions outside of this account. While this solution might be efficient for a single-account environment, it becomes complex in a multi-account environment.
In this post, I will provide a solution to implement central DNS for multiple accounts. This solution reduces the number of DNS servers and forwarders needed to implement cross-account domain resolution. I will show you how to configure this solution in four steps:
Set up your Central DNS account.
Set up each participating account.
Create Route53 associations.
Configure on-premises DNS (if applicable).
Solution overview
In this solution, you use AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) as a DNS service in a dedicated account in a Virtual Private Cloud (DNS-VPC).
The DNS service included in AWS Managed Microsoft AD uses conditional forwarders to forward domain resolution to either Amazon Route 53 (for domains in the awscloud.com zone) or to on-premises DNS servers (for domains in the example.com zone). You’ll use AWS Managed Microsoft AD as the primary DNS server for other application accounts in the multi-account environment (participating accounts).
A participating account is any application account that hosts a VPC and uses the centralized AWS Managed Microsoft AD as the primary DNS server for that VPC. Each participating account has a private hosted zone with a unique zone name that represents the account (for example, business_unit.awscloud.com).
You associate DNS-VPC with the unique hosted zone in each of the participating accounts. This allows AWS Managed Microsoft AD to use Route 53 to resolve all registered domains in the private hosted zones of participating accounts.
The following diagram shows how the various services work together:
Figure 1: Diagram showing the relationship between all the various services
In this diagram, all VPCs in participating accounts use Dynamic Host Configuration Protocol (DHCP) option sets. The option sets configure EC2 instances to use the centralized AWS Managed Microsoft AD in DNS-VPC as their default DNS Server. You also configure AWS Managed Microsoft AD to use conditional forwarders to send domain queries to Route53 or on-premises DNS servers based on query zone. For domain resolution across accounts to work, we associate DNS-VPC with each hosted zone in participating accounts.
If, for example, server.pa1.awscloud.com needs to resolve addresses in the pa3.awscloud.com domain, the sequence shown in the following diagram happens:
Figure 2: How domain resolution across accounts works
1.1: server.pa1.awscloud.com sends a domain name lookup for server.pa3.awscloud.com to its default DNS server, which is the server defined in the DHCP options set (AWS Managed Microsoft AD in DNS-VPC).
1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.
Similarly, if server.example.com needs to resolve server.pa3.awscloud.com, the following happens:
2.1: server.example.com sends a domain name lookup for server.pa3.awscloud.com to the on-premises DNS server.
2.2: The on-premises DNS server uses a conditional forwarder to forward the lookup to AWS Managed Microsoft AD in DNS-VPC.
The remaining steps follow the same path as 1.2 and 1.3 above: AWS Managed Microsoft AD forwards the resolution to Route 53, which resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.
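To sanity-check the resolution path end to end, a quick lookup from an EC2 instance in a participating account is enough. The hostname below is just the example name used in this post.

import socket

# Hedged sketch: run this from an instance whose VPC uses the DHCP options set
# pointing at AWS Managed Microsoft AD in DNS-VPC. It resolves a record that
# lives in another participating account's private hosted zone.
hostname = "server.pa3.awscloud.com"  # example name from the diagrams above
try:
    print(hostname, "->", socket.gethostbyname(hostname))
except socket.gaierror as err:
    print("Resolution failed:", err)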
Step 1: Set up a centralized DNS account
In previous AWS Security Blog posts, Drew Dennis covered a couple of options for establishing DNS resolution between on-premises networks and Amazon VPC. In one of those posts, he showed how you can use AWS Managed Microsoft AD (provisioned with AWS Directory Service) to provide DNS resolution with forwarding capabilities.
To set up a centralized DNS account, you can follow the same steps in Drew’s post to create AWS Managed Microsoft AD and configure the forwarders to send DNS queries for awscloud.com to the default, VPC-provided DNS and to forward example.com queries to the on-premises DNS server.
Here are a few considerations while setting up central DNS:
The VPC that hosts AWS Managed Microsoft AD (DNS-VPC) will be associated with all private hosted zones in participating accounts.
To be able to resolve domain names across AWS and on-premises, connectivity through Direct Connect or VPN must be in place.
Step 2: Set up participating accounts
The steps I suggest in this section should be applied individually in each application account that’s participating in central DNS resolution.
Create the VPC(s) that will host your resources in each participating account.
Create VPC Peering between local VPC(s) in each participating account and DNS-VPC.
Create a private hosted zone in Route 53. Hosted zone domain names must be unique across all accounts. In the diagram above, we used pa1.awscloud.com / pa2.awscloud.com / pa3.awscloud.com. You could also use a combination of environment and business unit: for example, you could use pa1.dev.awscloud.com to achieve uniqueness.
Associate VPC(s) in each participating account with the local private hosted zone.
The next step is to change the default DNS servers on each VPC by using a DHCP options set:
Follow these steps to create a new DHCP options set. In the DNS Servers field, make sure to put the private IP addresses of the two AWS Managed Microsoft AD servers that were created in DNS-VPC:
Figure 3: The “Create DHCP options set” dialog box
Follow these steps to assign the DHCP options set to your VPC(s) in each participating account.
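If you would rather script these two console steps, a rough boto3 equivalent is shown below; the two IP addresses stand in for your AWS Managed Microsoft AD DNS addresses, and the VPC ID is a placeholder.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hedged sketch: create a DHCP options set that points DNS at the two
# AWS Managed Microsoft AD addresses in DNS-VPC, then attach it to a VPC
# in the participating account. IPs and the VPC ID are placeholders.
options = ec2.create_dhcp_options(
    DhcpConfigurations=[
        {"Key": "domain-name-servers", "Values": ["10.0.0.10", "10.0.0.11"]},
    ]
)
dhcp_options_id = options["DhcpOptions"]["DhcpOptionsId"]

ec2.associate_dhcp_options(DhcpOptionsId=dhcp_options_id, VpcId="vpc-xxxxxxxx")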
Step 3: Associate DNS-VPC with private hosted zones in each participating account
The next steps will associate DNS-VPC with the private hosted zone in each participating account. This allows instances in DNS-VPC to resolve domain records created in these hosted zones. If you need them, here are more details on associating a private hosted zone with a VPC in a different account.
In each participating account, create the authorization using the private hosted zone ID from the previous step, the region, and the VPC ID that you want to associate (DNS-VPC).
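As a rough scripted illustration of the authorization and association pair (the zone ID, region, profile names, and VPC ID are placeholders), the first call runs with credentials for the participating account and the second with credentials for the central DNS account.

import boto3

HOSTED_ZONE_ID = "Z1234567890EXAMPLE"  # private hosted zone in the participating account
DNS_VPC = {"VPCRegion": "us-east-1", "VPCId": "vpc-xxxxxxxx"}  # DNS-VPC in the central account

# Run with participating-account credentials: authorize DNS-VPC to associate with the zone.
participating = boto3.Session(profile_name="participating-account").client("route53")
participating.create_vpc_association_authorization(HostedZoneId=HOSTED_ZONE_ID, VPC=DNS_VPC)

# Run with central-DNS-account credentials: perform the association from the DNS-VPC side.
central = boto3.Session(profile_name="central-dns-account").client("route53")
central.associate_vpc_with_hosted_zone(HostedZoneId=HOSTED_ZONE_ID, VPC=DNS_VPC)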
After completing these steps, AWS Managed Microsoft AD in the centralized DNS account should be able to resolve domain records in the private hosted zone of each participating account.
Step 4: Setting up on-premises DNS servers
This step is necessary if you would like to resolve AWS private domains from on-premises servers. The task comes down to configuring on-premises forwarders to send DNS queries to AWS Managed Microsoft AD in DNS-VPC for all domains in the awscloud.com zone.
The steps to implement conditional forwarders vary by DNS product. Follow your product’s documentation to complete this configuration.
Summary
I introduced a simplified solution to implement central DNS resolution in a multi-account environment that can also be extended to support DNS resolution between on-premises resources and AWS. This can help reduce operational effort and the number of resources needed to implement cross-account domain resolution.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Directory Service forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Today, I’m excited to announce the launch of .BOT, a new generic top-level domain (gTLD) from Amazon. Customers can use .BOT domains to provide an identity and portal for their bots. Fitness bots, slack bots, e-commerce bots, and more can all benefit from an easy-to-access .BOT domain. The phrase “bot” was the 4th most registered domain keyword within the .COM TLD in 2016 with more than 6000 domains per month. A .BOT domain allows customers to provide a definitive internet identity for their bots as well as enhancing SEO performance.
Below, I’ll walk through the experience of registering and provisioning a domain for my bot, whereml.bot. Then we’ll look at setting up the domain as a hosted zone in Amazon Route 53. Let’s get started.
Registering a .BOT domain
First, I’ll head over to https://amazonregistry.com/bot, type in a new domain, and click the magnifying glass to make sure my domain is available and get taken to the registration wizard.
Next, I have the opportunity to choose how I want to verify my bot. I build all of my bots with Amazon Lex so I’ll select that in the drop down and get prompted for instructions specific to AWS. If I had my bot hosted somewhere else I would need to follow the unique verification instructions for that particular framework.
To verify my Lex bot I need to give the Amazon Registry permission to invoke the bot and verify its existence. I’ll do this by creating an AWS Identity and Access Management (IAM) cross-account role and attaching the AmazonLexReadOnly policy to that role. This is easily accomplished in the AWS Console. Be sure to provide the account number and external ID shown on the registration page.
Now I’ll add read only permissions to our Amazon Lex bots.
I’ll give my role a fancy name like DotBotCrossAccountVerifyRole and a description so it’s easy to remember why I made it, then I’ll click Create to create the role and be taken to the role summary page.
Finally, I’ll copy the ARN from the created role and save it for my next step.
Here I’ll add all the details of my Amazon Lex bot. If you haven’t made a bot yet you can follow the tutorial to build a basic bot. I can refer to any alias I’ve deployed but if I just want to grab the latest published bot I can pass in $LATEST as the alias. Finally I’ll click Validate and proceed to registering my domain.
Amazon Registry works with a partner, EnCirca, to register our domains, so we’ll select them and optionally grab Site Builder. I know how to sling some HTML and JavaScript together, so I’ll pass on the Site Builder side of things.
After I click continue we’re taken to EnCirca’s website to finalize the registration and with any luck within a few minutes of purchasing and completing the registration we should receive an email with some good news:
Alright, now that we have a domain name let’s find out how to host things on it.
Using Amazon Route53 with a .BOT domain
Amazon Route 53 is a highly available and scalable DNS service with robust APIs, health checks, service discovery, and many other features. I definitely want to use this to host my new domain. The first thing I’ll do is navigate to the Route 53 console and create a hosted zone with the same name as my domain.
Great! Now, I need to take the Name Server (NS) records that Route53 created for me and use EnCirca’s portal to add these as the authoritative nameservers on the domain.
Now I just add my records to my hosted zone and I should be able to serve traffic! Way cool, I’ve got my very own .bot domain for @WhereML.
Next Steps
I could and should add to the security of my site by creating TLS certificates for people who intend to access my domain over TLS. Luckily, with AWS Certificate Manager (ACM) this is extremely straightforward, and I’ve got my subdomains and root domain verified in just a few clicks.
I could create a CloudFront distribution to front an S3 static single-page application to host my entire chatbot and invoke Amazon Lex with a Cognito identity right from the browser.
As ransomware attacks have grown in number in recent months, the tactics and attack vectors also have evolved. While the primary method of attack used to be to target individual computer users within organizations with phishing emails and infected attachments, we’re increasingly seeing attacks that target weaknesses in businesses’ IT infrastructure.
How Ransomware Attacks Typically Work
In our previous posts on ransomware, we described the common vehicles used by hackers to infect organizations with ransomware. Most often, Trojan horse downloaders are distributed through malicious downloads and spam emails. The emails contain a variety of file attachments which, if opened, will download and run one of the many ransomware variants. Once a user’s computer is infected with a malicious downloader, it will retrieve additional malware, which frequently includes crypto-ransomware. After the files have been encrypted, a ransom payment is demanded of the victim in order to decrypt the files.
What’s Changed With the Latest Ransomware Attacks?
In 2016, a customized ransomware strain called SamSam began attacking servers, primarily in health care institutions. SamSam, unlike more conventional ransomware, is not delivered through downloads or phishing emails. Instead, the attackers behind SamSam use tools to identify unpatched servers running Red Hat’s JBoss enterprise products. Once the attackers have successfully gained entry into one of these servers by exploiting vulnerabilities in JBoss, they use other freely available tools and scripts to collect credentials and gather information on networked computers. Then they deploy their ransomware to encrypt files on these systems before demanding a ransom. Gaining entry to an organization through its IT center rather than its endpoints makes this approach scalable and especially unsettling.
SamSam’s methodology is to scour the Internet searching for accessible and vulnerable JBoss application servers, especially ones used by hospitals. It’s not unlike a burglar rattling doorknobs in a neighborhood to find unlocked homes. When SamSam finds an unlocked home (unpatched server), the software infiltrates the system. It is then free to spread across the company’s network by stealing passwords. As it traverses the network and systems, it encrypts files, preventing access until the victims pay the hackers a ransom, typically between $10,000 and $15,000. The low ransom amount has encouraged some victimized organizations to pay the ransom rather than incur the downtime required to wipe and reinitialize their IT systems.
The success of SamSam is due to its effectiveness rather than its sophistication. SamSam can enter and traverse a network without human intervention. Some organizations are learning too late that securing internet-facing services in their data center from attack is just as important as securing endpoints.
The typical steps in a SamSam ransomware attack are:
1 Attackers gain access to vulnerable server
Attackers exploit vulnerable software or weak/stolen credentials.
2 Attack spreads via remote access tools
Attackers harvest credentials, create SOCKS proxies to tunnel traffic, and abuse RDP to install SamSam on more computers in the network.
3 Ransomware payload deployed
Attackers run batch scripts to execute ransomware on compromised machines.
4 Ransomware demand delivered requiring payment to decrypt files
Demand amounts vary from victim to victim. Relatively low ransom amounts appear to be designed to encourage quick payment decisions.
What all the organizations successfully exploited by SamSam have in common is that they were running unpatched servers that made them vulnerable to SamSam. Some organizations had their endpoints and servers backed up, while others did not. Some of those without backups they could use to recover their systems chose to pay the ransom money.
Timeline of SamSam History and Exploits
Since its appearance in 2016, SamSam has been in the news with many successful incursions into healthcare, business, and government institutions.
March 2016 SamSam appears
The SamSam campaign targets vulnerable JBoss servers. Attackers home in on healthcare organizations specifically, as they’re more likely to have unpatched JBoss machines.
April 2016 SamSam finds new targets
SamSam begins targeting schools and government. After initial success targeting healthcare, attackers branch out to other sectors.
April 2017 New tactics include RDP
Attackers shift to targeting organizations with exposed RDP connections, and maintain focus on healthcare. An attack on Erie County Medical Center costs the hospital $10 million over three months of recovery.
January 2018 Municipalities attacked
• Attack on Municipality of Farmington, NM.
• Attack on Hancock Health.
• Attack on Adams Memorial Hospital.
• Attack on Allscripts (Electronic Health Records), which includes 180,000 physicians, 2,500 hospitals, and 7.2 million patients’ health records.
February 2018 Attack volume increases
• Attack on Davidson County, NC.
• Attack on Colorado Department of Transportation.
March 2018 SamSam shuts down Atlanta
• Second attack on Colorado Department of Transportation.
• City of Atlanta suffers a devastating attack by SamSam. The attack has far-reaching impacts — crippling the court system, keeping residents from paying their water bills, limiting vital communications like sewer infrastructure requests, and pushing the Atlanta Police Department to file paper reports.
• SamSam campaign nets $325,000 in 4 weeks. Infections spike as attackers launch new campaigns. Healthcare and government organizations are once again the primary targets.
How to Defend Against SamSam and Other Ransomware Attacks
The best way to respond to a ransomware attack is to avoid having one in the first place. Beyond that, making sure your valuable data is backed up and unreachable by ransomware infection will ensure that your downtime and data loss are minimal or none if you ever suffer an attack.
In our previous post, How to Recover From Ransomware, we listed the ten ways to protect your organization from ransomware.
Use anti-virus and anti-malware software or other security policies to block known payloads from launching.
Make frequent, comprehensive backups of all important files and isolate them from local and open networks. Cybersecurity professionals view data backup and recovery (74% in a recent survey) by far as the most effective solution to respond to a successful ransomware attack.
Keep offline backups of data stored in locations inaccessible from any potentially infected computer, such as disconnected external storage drives or the cloud, which prevents them from being accessed by the ransomware.
Install the latest security updates issued by software vendors of your OS and applications. Remember to patch early and patch often to close known vulnerabilities in operating systems, server software, browsers, and web plugins.
Consider deploying security software to protect endpoints, email servers, and network systems from infection.
Exercise cyber hygiene, such as using caution when opening email attachments and links.
Segment your networks to keep critical computers isolated and to prevent the spread of malware in case of attack. Turn off unneeded network shares.
Turn off admin rights for users who don’t require them. Give users the lowest system permissions they need to do their work.
Restrict write permissions on file servers as much as possible.
Educate yourself, your employees, and your family in best practices to keep malware out of your systems. Update everyone on the latest email phishing scams and human engineering aimed at turning victims into abettors.
Please Tell Us About Your Experiences with Ransomware
Have you endured a ransomware attack or have a strategy to avoid becoming a victim? Please tell us of your experiences in the comments.
In December 2016, we launched the Outbound Network Access functionality for Amazon RDS for Oracle, enabling customers to use their RDS for Oracle database instances to communicate with external web endpoints using the utl_http and utl_tcp packages, and to send emails through utl_smtp. We extended the functionality by adding the option of using custom DNS servers, allowing such outbound network access to use any DNS server a customer chooses. These releases enabled HTTP, TCP, and SMTP communication originating from RDS for Oracle instances – limited to non-secure (non-SSL) mediums.
To overcome this limitation for SSL connections, we recently published a whitepaper that guides you through the process of creating customized Oracle wallet bundles on your RDS for Oracle instances. By making use of such wallets, you can now extend the Outbound Network Access capability to have external communications happen over secure (SSL/TLS) connections. This opens up new use cases for your RDS for Oracle instances.
With the right set of certificates imported into your RDS for Oracle instances (through Oracle wallets), your database instances can now:
Communicate with an HTTPS endpoint: Using utl_http, access a resource such as https://status.aws.amazon.com/robots.txt
Download files from Amazon S3 securely: Using a presigned URL from Amazon S3, you can now download any file over SSL (a short sketch of generating such a URL follows this list)
Extending Oracle Database links to use SSL: Database links between RDS for Oracle instances can now use SSL as long as the instances have the SSL option installed
Sending email over SMTPS: You can now integrate with Amazon SES to send emails from your database instances, or with any other generic SMTPS provider
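Generating the presigned URL for the second use case happens outside the database. As a hedged illustration (the bucket and key names are placeholders), a short-lived URL that the instance could then fetch with utl_http can be produced like this:

import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Hedged sketch: create a presigned GET URL, valid for 15 minutes, for an object
# the RDS for Oracle instance should download over HTTPS. Bucket/key are placeholders.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-example-bucket", "Key": "exports/data.csv"},
    ExpiresIn=900,
)
print(url)  # pass this URL to utl_http in the database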
These are just a few high-level examples of new use cases that have opened up with the whitepaper. As a reminder, always ensure you have security best practices in place when making use of Outbound Network Access (detailed in the whitepaper).
About the Author
Surya Nallu is a Software Development Engineer on the Amazon RDS for Oracle team.
Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.
Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.
Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper, multi, ultra, super, and other prefixes onto the beginnings of words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.
Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.
So, What is the Hybrid Cloud?
To get beyond the hype, let’s start with Forrester Research‘s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”
To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.
To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud[1] resources combined with third-party public cloud resources that use some kind of orchestration[2] between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.
In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!
Private Cloud vs. Public Cloud
The cloud is really just a collection of purpose-built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site (off-premises), while a private cloud can be on-site (on-premises) or off-site.
As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.
The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.
Hybrid Cloud Benefits
If we accept that the hybrid cloud combines the best elements of private and public clouds, then we can identify the two primary benefits that result from that blending.
Benefit 1: Flexibility and Scalability
Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure, and adding capacity requires advance planning.
The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.
For a data storage user, on-premises private cloud storage provides, among other benefits, the highest speed of access. For data that is not frequently accessed, or that doesn’t need the absolute lowest levels of latency, it makes sense for the organization to move it to a location that is secure but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or the general public.
Benefit 2: Cost Savings
The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.
Comparing Private vs Hybrid Cloud Storage Costs
To get an idea of the difference in storage costs between a purely on-premises solution and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table uses the same format; all we’ve done is change how the data is distributed between the private (on-premises) cloud and the public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.
Scenario 1: 100% of data in on-premises storage

Data stored on-premises (100%):      100 TB      1,000 TB      2,000 TB

On-premises monthly cost:
Low ($12/TB/month):                  $1,200      $12,000       $24,000
High ($20/TB/month):                 $2,000      $20,000       $40,000

Scenario 2: 20% of data on-premises, 80% in public cloud storage (B2)

Data stored on-premises (20%):       20 TB       200 TB        400 TB
Data stored in cloud (80%):          80 TB       800 TB        1,600 TB

On-premises monthly cost:
Low ($12/TB/month):                  $240        $2,400        $4,800
High ($20/TB/month):                 $400        $4,000        $8,000

Public cloud monthly cost:
Low ($5/TB/month, B2):               $400        $4,000        $8,000
High ($20/TB/month):                 $1,600      $16,000       $32,000

On-premises + public cloud monthly cost:
Low:                                 $640        $6,400        $12,800
High:                                $2,000      $20,000       $40,000
As can be seen in the numbers above, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises. For other cost scenarios, see the B2 Cost Calculator.
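The tables above can be reproduced for any storage mix with a few lines of arithmetic. The sketch below uses the same per-terabyte rates as the tables in this post; the rates and the 80% cloud split are the assumptions to adjust for your own scenario.

# Hedged sketch: recompute the hybrid-storage cost comparison above for any
# data volume and on-premises/cloud split. Rates are $/TB/month figures taken
# from the tables in this post.
ON_PREM_LOW, ON_PREM_HIGH = 12, 20   # $/TB/month, on-premises
CLOUD_LOW, CLOUD_HIGH = 5, 20        # $/TB/month, public cloud (B2 at the low end)

def hybrid_monthly_cost(total_tb, cloud_fraction):
    on_prem_tb = total_tb * (1 - cloud_fraction)
    cloud_tb = total_tb * cloud_fraction
    low = on_prem_tb * ON_PREM_LOW + cloud_tb * CLOUD_LOW
    high = on_prem_tb * ON_PREM_HIGH + cloud_tb * CLOUD_HIGH
    return low, high

for tb in (100, 1000, 2000):
    low, high = hybrid_monthly_cost(tb, cloud_fraction=0.8)
    print(f"{tb:>5} TB, 80% in cloud: ${low:,.0f} to ${high:,.0f} per month")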
When Hybrid Might Not Always Be the Right Fit
There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.
An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency does play a factor in data storage for some users, it is less of a factor for uploading and downloading data than it is for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low-latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.
It is essential to have a good understanding of workloads and their essential characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.
The Hybrid Cloud Can Be a Win-Win Solution
From the high altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.
Should You Go Hybrid?
We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?
According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.
Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.
If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.
As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.
Tell Us What You’re Doing with the Hybrid Cloud
Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.
[1] Private cloud can be on-premises or a dedicated off-premises facility.
[2] Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.
…we present Oblivious DNS (ODNS), which is a new design of the DNS ecosystem that allows current DNS servers to remain unchanged and increases privacy for data in motion and at rest. In the ODNS system, both the client is modified with a local resolver, and there is a new authoritative name server for .odns. To prevent an eavesdropper from learning information, the DNS query must be encrypted; the client generates a request for www.foo.com, generates a session key k, encrypts the requested domain, and appends the TLD domain .odns, resulting in {www.foo.com}k.odns. The client forwards this, with the session key encrypted under the .odns authoritative server’s public key ({k}PK) in the “Additional Information” record of the DNS query to the recursive resolver, which then forwards it to the authoritative name server for .odns. The authoritative server decrypts the session key with his private key, and then subsequently decrypts the requested domain with the session key. The authoritative server then forwards the DNS request to the appropriate name server, acting as a recursive resolver. While the name servers see incoming DNS requests, they do not know which clients they are coming from; additionally, an eavesdropper cannot connect a client with her corresponding DNS queries.
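To make that construction concrete, here is a toy sketch of the client-side step only; it is not the authors' implementation, and it assumes the pyca/cryptography package plus a PEM-encoded RSA public key for the .odns authoritative server. The requested domain is encrypted under a fresh session key, the ciphertext is encoded into a .odns query name, and the session key is encrypted under the server's public key for the Additional Information record.

import base64
import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def build_odns_query(domain: str, odns_public_key_pem: bytes):
    """Toy client-side construction of an ODNS query, per the description above."""
    # 1. Generate a fresh session key k.
    session_key = AESGCM.generate_key(bit_length=128)

    # 2. Encrypt the requested domain under k: {www.foo.com}k
    aesgcm = AESGCM(session_key)
    nonce = os.urandom(12)
    encrypted_domain = nonce + aesgcm.encrypt(nonce, domain.encode(), None)

    # 3. Encode the ciphertext into a DNS-safe label and append .odns
    #    (a real implementation must also respect the 63-byte DNS label limit).
    label = base64.b32encode(encrypted_domain).decode().rstrip("=").lower()
    qname = f"{label}.odns"

    # 4. Encrypt k under the .odns authoritative server's public key: {k}PK
    public_key = serialization.load_pem_public_key(odns_public_key_pem)
    encrypted_key = public_key.encrypt(
        session_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )

    # qname goes in the DNS query; encrypted_key would travel in the
    # "Additional Information" record of the request.
    return qname, encrypted_key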
Thanks to Raja Mani, AWS Solutions Architect, for this great blog.
—
In this blog post, I’ll walk you through the steps for setting up continuous replication of an AWS CodeCommit repository from one AWS region to another AWS region using a serverless architecture. CodeCommit is a fully-managed, highly scalable source control service that stores anything from source code to binaries. It works seamlessly with your existing Git tools and eliminates the need to operate your own source control system. Replicating an AWS CodeCommit repository from one AWS region to another AWS region enables you to achieve lower latency pulls for global developers. This same approach can also be used to automatically back up repositories currently hosted on other services (for example, GitHub or BitBucket) to AWS CodeCommit.
This solution uses AWS Lambda and AWS Fargate for continuous replication. Benefits of this approach include:
The replication process can be easily set up to trigger based on events, such as commits made to the repository.
Setting up a serverless architecture means you don’t need to provision, maintain, or administer servers.
Note: AWS Fargate has a limit of 10 GB for storage and is available in the US East (N. Virginia) Region. A similar solution that uses Amazon EC2 instances to replicate the repositories on a schedule was published in a previous blog post and can be used if your repository does not meet these conditions.
Replication using Fargate
As you follow this blog post, you’ll set up an architecture that looks like this:
Any change in the AWS CodeCommit repository will trigger a Lambda function. The Lambda function will call the Fargate task that replicates the repository using a Git command line tool.
Let us assume a user wants to replicate a repository (Source) from US East (N. Virginia/us-east-1) region to a repository (Destination) in US West (Oregon/us-west-2) region. I’ll walk you through the steps for it:
Prerequisites
Create an AWS service IAM role for Amazon EC2 that has permissions for both the source and destination repositories, along with IAM CreateRole, AttachRolePolicy, and Amazon ECR privileges. Here is the EC2 role policy I used:
You need a Docker environment to build this solution. You can launch an EC2 instance and install Docker, or you can use AWS Cloud9, which comes with Docker and Git preinstalled. I used an EC2 instance and installed Docker on it. Use the IAM role created in the previous step when creating the EC2 instance. I’ll refer to this environment as the “Docker environment” in the following steps.
You need to install the AWS CLI on the Docker environment. For AWS CLI installation, refer to this page.
You need to install Git, including a Git command line on the Docker environment.
Step 1: Create the Docker image
To create the Docker image, first it needs a Dockerfile. A Dockerfile is a manifest that describes the base image to use for your Docker image and what you want installed and running on it. For more information about Dockerfiles, go to the Dockerfile Reference.
1. Choose a directory in the Docker environment and perform the following steps in that directory. I used /home/ec2-user directory to perform the following steps.
2. Clone the AWS CodeCommit repository in the Docker environment. Open the terminal to the Docker environment and run the following commands to clone your source AWS CodeCommit repository (I ran the commands from /home/ec2-user directory):
Note: Change the repository URLs in the clone commands to your own source and destination repository URLs.
3. Create a file called Dockerfile (case sensitive) with the following content (I created it in /home/ec2-user directory):
# Pull the Amazon Linux latest base image
FROM amazonlinux:latest
#Install aws-cli and git command line tools
RUN yum -y install unzip aws-cli
RUN yum -y install git
WORKDIR /home/ec2-user
RUN mkdir LocalRepository
WORKDIR /home/ec2-user/LocalRepository
#Copy Cloned CodeCommit repository to Docker container
COPY ./LocalRepository /home/ec2-user/LocalRepository
#Copy shell script that does the replication
COPY ./repl_repository.bash /home/ec2-user/LocalRepository
RUN chmod ugo+rwx /home/ec2-user/LocalRepository/repl_repository.bash
WORKDIR /home/ec2-user/LocalRepository
#Call this script when Docker starts the container
ENTRYPOINT ["/home/ec2-user/LocalRepository/repl_repository.bash"]
4. Copy the following shell script into a file called repl_repository.bash in the same directory as the Dockerfile in the Docker environment (I created it in the /home/ec2-user directory).
6. Verify that the replication is working by running the repl_repository.bash script from the LocalRepository directory. Go to the LocalRepository directory and run the command ". ../repl_repository.bash". If it is successful, the last line of the output will be "Everything up-to-date", like this:
$ . ../repl_repository.bash
Everything up-to-date
Step 2: Build the Docker Image
1. Build the Docker image by running this command from the directory where you created the Dockerfile in the Docker environment in the previous step (I ran it from the /home/ec2-user directory):
$ docker build . -t ccrepl
Output: It installs various packages and sets environment variables as part of steps 1 to 3 from the Dockerfile. Steps 4 to 11 from the Dockerfile should produce output similar to the following:
2. Run the following command to verify that the image was created successfully. It will display “Everything up-to-date” at the end if it is successful.
$ docker run ccrepl
Everything up-to-date
Step 3: Push the Docker Image to Amazon Elastic Container Registry (ECR)
Perform the following steps in the Docker Environment.
1. Run the AWS CLI configure command and set the default region to your source repository region (I used us-east-1).
$ aws configure set default.region <Source Repository Region>
2. Create an Amazon ECR repository named ccrepl to store your ccrepl image, and note the repositoryUri in the output.
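For illustration, an equivalent way to create the repository from Python with boto3 (assuming the repository name ccrepl used throughout this post) looks like this:

import boto3

ecr = boto3.client("ecr", region_name="us-east-1")

# Hedged sketch: create the ECR repository for the ccrepl image and print its URI.
response = ecr.create_repository(repositoryName="ccrepl")
print(response["repository"]["repositoryUri"])  # note this URI for later steps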
Step 4: Create the ECS Fargate Task
2. Create a role called AccessRoleForCCfromFG using the following command in the Docker environment:
$ aws iam create-role --role-name AccessRoleForCCfromFG --assume-role-policy-document file://trustpolicyforecs.json
3. Assign CodeCommit service full access to the above role using the following command in the Docker environment:
$ aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSCodeCommitFullAccess --role-name AccessRoleForCCfromFG
4. In the Amazon ECS Console, choose Repositories and select the ccrepl repository that was created in the previous step. Copy the Repository URI.
5. In the Amazon ECS Console, choose Task Definitions and click Create New Task Definition.
6. Select launch type compatibility as FARGATE and click Next Step.
7. In the create task definition screen, do the following:
In Task Definition Name, type ccrepl
In Task Role, choose AccessRoleForCCfromFG
In Task Memory, choose 2GB
In Task CPU, choose 1 vCPU
Click Add Container under Container Definitions in the same screen. In the Add Container screen, do the following:
Enter Container name as ccreplcont
Enter Image URL copied from step 4
Enter Memory Limits as 128 and click Add.
Note: Select TaskExecutionRole as “ecsTaskExecutionRole” if it already exists. If not, select create new role and it will create “ecsTaskExecutionRole” for you.
8. Click the Create button in the task definition screen to create the task. It will successfully create the task, execution role and AWS CloudWatch Log groups.
9. In the Amazon ECS Console, click Clusters and then Create Cluster. Select the “Networking only, Powered by AWS Fargate” template and click Next Step.
10. Enter the cluster name ccreplcluster and click Create.
Step 5: Create the Lambda Function
In this section, I used the Amazon Elastic Container Service (ECS) RunTask API from Lambda to invoke the Fargate task.
1. In the IAM Console, create a new role called ECSLambdaRole with the permissions to AWS CodeCommit, Amazon ECS as well as pass roles privileges needed to run the ECS task. Your statement should look similar to the following (replace <your account id>):
2. In the AWS Management Console, select the VPC service and click Subnets in the left navigation pane. Note down the subnet IDs that you want to run the Fargate task in.
3. Create a new Lambda Node.js function called FargateTaskExecutionFunc and assign it the role ECSLambdaRole. The function calls the ECS RunTask API to start the Fargate replication task.
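The original post implements this function in Node.js; purely as an illustration of the logic the handler needs, an equivalent Python/boto3 sketch is shown below, with the cluster name, task definition, and subnet IDs as placeholders.

import boto3

ecs = boto3.client("ecs")

def lambda_handler(event, context):
    # Hedged sketch: start the Fargate replication task whenever CodeCommit
    # invokes this function. Cluster, task definition, and subnets are placeholders.
    response = ecs.run_task(
        cluster="ccreplcluster",
        taskDefinition="ccrepl",
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-aaaaaaaa", "subnet-bbbbbbbb"],
                "assignPublicIp": "ENABLED",
            }
        },
    )
    return str(response)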
Note: Replace the subnets values with the subnet IDs you identified in Step 2 of this section as the subnets in which to run the Fargate task.
1. In the Lambda Console, click FargateTaskExecutionFunc under functions.
2. Under Add triggers in the Designer, select CodeCommit
3. In the Configure triggers screen, do the following:
Enter Repository name as Source (your source repository name)
Enter trigger name as LambdaTrigger
Leave the Events as “All repository events”
Leave the Branch names as “All branches”
Click the Add button
Click the Save button to save the changes
Step 6: Verification
To test the application, make a commit and push the changes to the source repository in AWS CodeCommit. That should automatically trigger the Lambda function and replicate the changes in the destination repository. You can verify this by checking CloudWatch Logs for Lambda and ECS, or simply going to the destination repository and verifying the change appears.
Conclusion
Congratulations! You have successfully configured repository replication of an AWS CodeCommit repository using AWS Lambda and AWS Fargate. You can use this technique in a deployment pipeline. You can also tweak the trigger configuration in AWS CodeCommit to call the Lambda function in response to any supported trigger event in AWS CodeCommit.
We have several upcoming tech talks in the month of April and early May. Come join us to learn about AWS services and solution offerings. We’ll have AWS experts online to help answer questions in real time. Sign up now to learn more; we look forward to seeing you.
Note – All sessions are free and in Pacific Time.
April & early May — 2018 Schedule
Compute
April 30, 2018 | 01:00 PM – 01:45 PM PT – Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR (300) – Learn about the best practices for scaling big data workloads as well as process, store, and analyze big data securely and cost effectively with Amazon EMR and Amazon EC2 Spot Instances.
May 1, 2018 | 01:00 PM – 01:45 PM PT – How to Bring Microsoft Apps to AWS (300) – Learn more about how to save significant money by bringing your Microsoft workloads to AWS.
May 2, 2018 | 01:00 PM – 01:45 PM PT – Deep Dive on Amazon EC2 Accelerated Computing (300) – Get a technical deep dive on how AWS’ GPU and FPGA-based compute services can help you to optimize and accelerate your ML/DL and HPC workloads in the cloud.
April 25, 2018 | 01:00 PM – 01:45 PM PT – Intro to Open Source Databases on AWS (200) – Learn how to tap the benefits of open source databases on AWS without the administrative hassle.
April 24, 2018 | 11:00 AM – 11:45 AM PT – Deploy your Desktops and Apps on AWS (300) – Learn how to deploy your desktops and apps on AWS with Amazon WorkSpaces and Amazon AppStream 2.0
IoT
May 2, 2018 | 11:00 AM – 11:45 AM PT – How to Easily and Securely Connect Devices to AWS IoT (200) – Learn how to easily and securely connect devices to the cloud and reliably scale to billions of devices and trillions of messages with AWS IoT.
April 30, 2018 | 11:00 AM – 11:45 AM PT – Offline GraphQL Apps with AWS AppSync (300) – Come learn how to enable real-time and offline data in your applications with GraphQL using AWS AppSync.
Networking
May 2, 2018 | 09:00 AM – 09:45 AM PT – Taking Serverless to the Edge (300) – Learn how to run your code closer to your end users in a serverless fashion. Also, David Von Lehman from Aerobatic will discuss how they used Lambda@Edge to reduce latency and cloud costs for their customers’ websites.
May 3, 2018 | 09:00 AM – 09:45 AM PT – Protect Your Game Servers from DDoS Attacks (200) – Learn how to use the new AWS Shield Advanced for EC2 to protect your internet-facing game servers against network layer DDoS attacks and application layer attacks of all kinds.
May 1, 2018 | 11:00 AM – 11:45 AM PT – Building Data Lakes That Cost Less and Deliver Results Faster (300) – Learn how Amazon S3 Select And Amazon Glacier Select increase application performance by up to 400% and reduce total cost of ownership by extending your data lake into cost-effective archive storage.
May 3, 2018 | 11:00 AM – 11:45 AM PT – Integrating On-Premises Vendors with AWS for Backup (300) – Learn how to work with AWS and technology partners to build backup & restore solutions for your on-premises, hybrid, and cloud native environments.
American Public Television was like many organizations that have been around for a while. They were entrenched using an older technology — in their case, tape storage and distribution — that once met their needs but was limiting their productivity and preventing them from effectively collaborating with their many media partners. APT’s VP of Technology knew that he needed to move into the future and embrace cloud storage to keep APT ahead of the game.
Since 1961, American Public Television (APT) has been a leading distributor of groundbreaking, high-quality, top-rated programming to the nation’s public television stations. Gerry Field is the Vice President of Technology at APT and is responsible for delivering their extensive program catalog to 350+ public television stations nationwide.
In the time since Gerry joined APT in 2007, the industry has been in digital overdrive. During that time APT has continued to acquire and distribute the best in public television programming to their technically diverse subscribers.
This created two challenges for Gerry. First, new technology and format proliferation were driving dramatic increases in digital storage. Second, many of APT’s subscribers struggled to keep up with the rapidly changing industry. While some subscribers had state-of-the-art satellite systems to receive programming, others had to wait for the post office to drop off programs recorded on tape weeks earlier. With no slowdown on the horizon of innovation in the industry, Gerry knew that his storage and distribution systems would reach a crossroads in no time at all.
Living the tape paradigm
The digital media industry is only a few years removed from its film, and later videotape, roots. Tape was the input and the output of the industry for many years. As a consequence, the tools and workflows used by the industry were built and designed to work with tape. Over time, the “file” slowly replaced the tape as the object to be captured, edited, stored and distributed. Trouble was, many of the systems and more importantly workflows were based on processing tape, and these have proven to be hard to change.
At APT, Gerry realized the limits of the tape paradigm and began looking for technologies and solutions that enabled workflows based on file and object based storage and distribution.
Thinking file based storage and distribution
For data (digital media) storage, APT, like everyone else, started by installing onsite storage servers. As the amount of digital data grew, more storage was added. In addition, APT was expanding its distribution footprint by creating or partnering with distribution channels such as CreateTV and APT Worldwide. This dramatically increased the number of programming formats and the amount of data that had to be stored. As a consequence, updating, maintaining, and managing the APT storage systems was becoming a major challenge and a major resource hog.
Knowing that his in-house storage system was only going to cost more time and money, Gerry decided it was time to look at cloud storage. But that wasn’t the only reason he looked at the cloud. While most people consider cloud storage as just a place to back up and archive files, Gerry was envisioning how the ubiquity of the cloud could help solve his distribution challenges. The trouble was that the price of cloud storage from vendors such as Amazon (S3) and Microsoft (Azure) was a non-starter, especially for a non-profit. Then Gerry came across Backblaze. The B2 Cloud Storage service met all of his performance requirements, and at $0.005/GB/month for storage and $0.01/GB for downloads it was nearly 75% less than S3 or Azure.
Gerry did the math and found that he could economically incorporate B2 Cloud Storage into his IT portfolio, using it for both program submission and for active storage and archiving of the APT programs. In addition, B2 now gives him the foundation necessary to receive and distribute programming content over the Internet. This is especially useful for organizations that can’t conveniently access satellite distribution systems. Not to mention downloading from the cloud is much faster than sending a tape through the mail.
Adding B2 Cloud Storage to their infrastructure has helped American Public Television address two key challenges. First, they now have “unlimited” storage in the cloud without having to add any hardware. In addition, with B2, they only pay for the storage they use. That means they don’t have to buy storage upfront trying to match the maximum amount of storage they’ll ever need. Second, by using B2 as a distribution source for their programming APT subscribers, especially the smaller and remote ones, can get content faster and more reliably without having to perform costly upgrades to their infrastructure.
The road ahead
As APT gets used to their file-based infrastructure and workflow, there are a number of cost-saving and income-generating ideas that are now worth considering. Here are a few:
Program Submissions — New content can be uploaded from anywhere using a web browser, an Internet connection, and a login. For example, a producer in Cambodia can upload their film to B2. From there the film is downloaded to an in-house system where it is processed and transcoded using compute. The finished film is added to the APT catalog and added to B2. Once there, the program is instantly available for subscribers to order and download.
“The affordability and performance of Backblaze B2 is what allowed us to make the B2 cloud part of the APT data storage and distribution strategy into the future.” — Gerry Field
Easier Previews — At any time, work in process or finished programs can be made available for download from the B2 cloud. One place this could be useful is where a subscriber needs to review a program to comply with local policies and practices before airing. In the old system, each “one-off” was a time consuming manual process.
Instant Subscriptions — There are many organizations such as schools and businesses that want to use just one episode of a desired show. With an e-commerce based website, current or even archived programming kept in B2 could be available to download or stream for a minimal charge.
At APT there were multiple technologies needed to make their file-based infrastructure work, but as Gerry notes, having an affordable, trustworthy, cloud storage service like B2 is one of the critical building blocks needed to make everything work together.