Tag Archives: versioning

Storing Encrypted Credentials In Git

Post Syndicated from Bozho original https://techblog.bozho.net/storing-encrypted-credentials-in-git/

We all know that we should not commit any passwords or keys to the repo with our code (no matter if public or private). Yet, thousands of production passwords can be found on GitHub (and probably thousands more in internal company repositories). Some have tried to fix that by removing the passwords (once they learned it’s not a good idea to store them publicly), but passwords have remained in the git history.

Knowing what not to do is the first and very important step. But how do we store production credentials. Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn’t seem to be an agreed upon solution.

I’ve previously argued with the 12-factor app recommendation to use environment variables – if you have a few that might be okay, but when the number of variables grow (as in any real application), it becomes impractical. And you can set environment variables via a bash script, but you’d have to store it somewhere. And in fact, even separate environment variables should be stored somewhere.

This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.

Such a repo would look like this:

project
└─── production
|   |   application.properites
|   |   keystore.jks
└─── staging
|   |   application.properites
|   |   keystore.jks
└─── on-premise-client1
|   |   application.properites
|   |   keystore.jks
└─── on-premise-client2
|   |   application.properites
|   |   keystore.jks

Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. It is “transparent” encryption because it supports diff and encryption and decryption on the fly. Once you set it up, you continue working with the repo as if it’s not encrypted. There’s even a fork that works on Windows.

You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:

secretfile filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
*.properties filter=git-crypt diff=git-crypt
*.jks filter=git-crypt diff=git-crypt

And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.

You’re almost done. We should somehow share and backup the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to backup your secret key (that’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.

git-crypt authors claim to shine when it comes to encrypting just a few files in an otherwise public repo. And recommend looking at git-remote-gcrypt. But as often there are non-sensitive parts of environment-specific configurations, you may not want to encrypt everything. And I think it’s perfectly fine to use git-crypt even in a separate repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo. Especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.

The outstanding questions in this case is – how do you sync the properties with code changes. Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments, but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.

The whole process of having versioned environment-speific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.

The post Storing Encrypted Credentials In Git appeared first on Bozho's tech blog.

Securing Your Cryptocurrency

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-up-your-cryptocurrency/

Securing Your Cryptocurrency

In our blog post on Tuesday, Cryptocurrency Security Challenges, we wrote about the two primary challenges faced by anyone interested in safely and profitably participating in the cryptocurrency economy: 1) make sure you’re dealing with reputable and ethical companies and services, and, 2) keep your cryptocurrency holdings safe and secure.

In this post, we’re going to focus on how to make sure you don’t lose any of your cryptocurrency holdings through accident, theft, or carelessness. You do that by backing up the keys needed to sell or trade your currencies.

$34 Billion in Lost Value

Of the 16.4 million bitcoins said to be in circulation in the middle of 2017, close to 3.8 million may have been lost because their owners no longer are able to claim their holdings. Based on today’s valuation, that could total as much as $34 billion dollars in lost value. And that’s just bitcoins. There are now over 1,500 different cryptocurrencies, and we don’t know how many of those have been misplaced or lost.



Now that some cryptocurrencies have reached (at least for now) staggering heights in value, it’s likely that owners will be more careful in keeping track of the keys needed to use their cryptocurrencies. For the ones already lost, however, the owners have been separated from their currencies just as surely as if they had thrown Benjamin Franklins and Grover Clevelands over the railing of a ship.

The Basics of Securing Your Cryptocurrencies

In our previous post, we reviewed how cryptocurrency keys work, and the common ways owners can keep track of them. A cryptocurrency owner needs two keys to use their currencies: a public key that can be shared with others is used to receive currency, and a private key that must be kept secure is used to spend or trade currency.

Many wallets and applications allow the user to require extra security to access them, such as a password, or iris, face, or thumb print scan. If one of these options is available in your wallets, take advantage of it. Beyond that, it’s essential to back up your wallet, either using the backup feature built into some applications and wallets, or manually backing up the data used by the wallet. When backing up, it’s a good idea to back up the entire wallet, as some wallets require additional private data to operate that might not be apparent.

No matter which backup method you use, it is important to back up often and have multiple backups, preferable in different locations. As with any valuable data, a 3-2-1 backup strategy is good to follow, which ensures that you’ll have a good backup copy if anything goes wrong with one or more copies of your data.

One more caveat, don’t reuse passwords. This applies to all of your accounts, but is especially important for something as critical as your finances. Don’t ever use the same password for more than one account. If security is breached on one of your accounts, someone could connect your name or ID with other accounts, and will attempt to use the password there, as well. Consider using a password manager such as LastPass or 1Password, which make creating and using complex and unique passwords easy no matter where you’re trying to sign in.

Approaches to Backing Up Your Cryptocurrency Keys

There are numerous ways to be sure your keys are backed up. Let’s take them one by one.

1. Automatic backups using a backup program

If you’re using a wallet program on your computer, for example, Bitcoin Core, it will store your keys, along with other information, in a file. For Bitcoin Core, that file is wallet.dat. Other currencies will use the same or a different file name and some give you the option to select a name for the wallet file.

To back up the wallet.dat or other wallet file, you might need to tell your backup program to explicitly back up that file. Users of Backblaze Backup don’t have to worry about configuring this, since by default, Backblaze Backup will back up all data files. You should determine where your particular cryptocurrency, wallet, or application stores your keys, and make sure the necessary file(s) are backed up if your backup program requires you to select which files are included in the backup.

Backblaze B2 is an option for those interested in low-cost and high security cloud storage of their cryptocurrency keys. Backblaze B2 supports 2-factor verification for account access, works with a number of apps that support automatic backups with encryption, error-recovery, and versioning, and offers an API and command-line interface (CLI), as well. The first 10GB of storage is free, which could be all one needs to store encrypted cryptocurrency keys.

2. Backing up by exporting keys to a file

Apps and wallets will let you export your keys from your app or wallet to a file. Once exported, your keys can be stored on a local drive, USB thumb drive, DAS, NAS, or in the cloud with any cloud storage or sync service you wish. Encrypting the file is strongly encouraged — more on that later. If you use 1Password or LastPass, or other secure notes program, you also could store your keys there.

3. Backing up by saving a mnemonic recovery seed

A mnemonic phrase, mnemonic recovery phrase, or mnemonic seed is a list of words that stores all the information needed to recover a cryptocurrency wallet. Many wallets will have the option to generate a mnemonic backup phrase, which can be written down on paper. If the user’s computer no longer works or their hard drive becomes corrupted, they can download the same wallet software again and use the mnemonic recovery phrase to restore their keys.

The phrase can be used by anyone to recover the keys, so it must be kept safe. Mnemonic phrases are an excellent way of backing up and storing cryptocurrency and so they are used by almost all wallets.

A mnemonic recovery seed is represented by a group of easy to remember words. For example:

eye female unfair moon genius pipe nuclear width dizzy forum cricket know expire purse laptop scale identify cube pause crucial day cigar noise receive

The above words represent the following seed:

0a5b25e1dab6039d22cd57469744499863962daba9d2844243fec 9c0313c1448d1a0b2cd9e230a78775556f9b514a8be45802c2808e fd449a20234e9262dfa69

These words have certain properties:

  • The first four letters are enough to unambiguously identify the word.
  • Similar words are avoided (such as: build and built).

Bitcoin and most other cryptocurrencies such as Litecoin, Ethereum, and others use mnemonic seeds that are 12 to 24 words long. Other currencies might use different length seeds.

4. Physical backups — Paper, Metal

Some cryptocurrency holders believe that their backup, or even all their cryptocurrency account information, should be stored entirely separately from the internet to avoid any risk of their information being compromised through hacks, exploits, or leaks. This type of storage is called “cold storage.” One method of cold storage involves printing out the keys to a piece of paper and then erasing any record of the keys from all computer systems. The keys can be entered into a program from the paper when needed, or scanned from a QR code printed on the paper.

Printed public and private keys

Printed public and private keys

Some who go to extremes suggest separating the mnemonic needed to access an account into individual pieces of paper and storing those pieces in different locations in the home or office, or even different geographical locations. Some say this is a bad idea since it could be possible to reconstruct the mnemonic from one or more pieces. How diligent you wish to be in protecting these codes is up to you.

Mnemonic recovery phrase booklet

Mnemonic recovery phrase booklet

There’s another option that could make you the envy of your friends. That’s the CryptoSteel wallet, which is a stainless steel metal case that comes with more than 250 stainless steel letter tiles engraved on each side. Codes and passwords are assembled manually from the supplied part-randomized set of tiles. Users are able to store up to 96 characters worth of confidential information. Cryptosteel claims to be fireproof, waterproof, and shock-proof.

image of a Cryptosteel cold storage device

Cryptosteel cold wallet

Of course, if you leave your Cryptosteel wallet in the pocket of a pair of ripped jeans that gets thrown out by the housekeeper, as happened to the character Russ Hanneman on the TV show Silicon Valley in last Sunday’s episode, then you’re out of luck. That fictional billionaire investor lost a USB drive with $300 million in cryptocoins. Let’s hope that doesn’t happen to you.

Encryption & Security

Whether you store your keys on your computer, an external disk, a USB drive, DAS, NAS, or in the cloud, you want to make sure that no one else can use those keys. The best way to handle that is to encrypt the backup.

With Backblaze Backup for Windows and Macintosh, your backups are encrypted in transmission to the cloud and on the backup server. Users have the option to add an additional level of security by adding a Personal Encryption Key (PEK), which secures their private key. Your cryptocurrency backup files are secure in the cloud. Using our web or mobile interface, previous versions of files can be accessed, as well.

Our object storage cloud offering, Backblaze B2, can be used with a variety of applications for Windows, Macintosh, and Linux. With B2, cryptocurrency users can choose whichever method of encryption they wish to use on their local computers and then upload their encrypted currency keys to the cloud. Depending on the client used, versioning and life-cycle rules can be applied to the stored files.

Other backup programs and systems provide some or all of these capabilities, as well. If you are backing up to a local drive, it is a good idea to encrypt the local backup, which is an option in some backup programs.

Address Security

Some experts recommend using a different address for each cryptocurrency transaction. Since the address is not the same as your wallet, this means that you are not creating a new wallet, but simply using a new identifier for people sending you cryptocurrency. Creating a new address is usually as easy as clicking a button in the wallet.

One of the chief advantages of using a different address for each transaction is anonymity. Each time you use an address, you put more information into the public ledger (blockchain) about where the currency came from or where it went. That means that over time, using the same address repeatedly could mean that someone could map your relationships, transactions, and incoming funds. The more you use that address, the more information someone can learn about you. For more on this topic, refer to Address reuse.

Note that a downside of using a paper wallet with a single key pair (type-0 non-deterministic wallet) is that it has the vulnerabilities listed above. Each transaction using that paper wallet will add to the public record of transactions associated with that address. Newer wallets, i.e. “deterministic” or those using mnemonic code words support multiple addresses and are now recommended.

There are other approaches to keeping your cryptocurrency transaction secure. Here are a couple of them.

Multi-signature

Multi-signature refers to requiring more than one key to authorize a transaction, much like requiring more than one key to open a safe. It is generally used to divide up responsibility for possession of cryptocurrency. Standard transactions could be called “single-signature transactions” because transfers require only one signature — from the owner of the private key associated with the currency address (public key). Some wallets and apps can be configured to require more than one signature, which means that a group of people, businesses, or other entities all must agree to trade in the cryptocurrencies.

Deep Cold Storage

Deep cold storage ensures the entire transaction process happens in an offline environment. There are typically three elements to deep cold storage.

First, the wallet and private key are generated offline, and the signing of transactions happens on a system not connected to the internet in any manner. This ensures it’s never exposed to a potentially compromised system or connection.

Second, details are secured with encryption to ensure that even if the wallet file ends up in the wrong hands, the information is protected.

Third, storage of the encrypted wallet file or paper wallet is generally at a location or facility that has restricted access, such as a safety deposit box at a bank.

Deep cold storage is used to safeguard a large individual cryptocurrency portfolio held for the long term, or for trustees holding cryptocurrency on behalf of others, and is possibly the safest method to ensure a crypto investment remains secure.

Keep Your Software Up to Date

You should always make sure that you are using the latest version of your app or wallet software, which includes important stability and security fixes. Installing updates for all other software on your computer or mobile device is also important to keep your wallet environment safer.

One Last Thing: Think About Your Testament

Your cryptocurrency funds can be lost forever if you don’t have a backup plan for your peers and family. If the location of your wallets or your passwords is not known by anyone when you are gone, there is no hope that your funds will ever be recovered. Taking a bit of time on these matters can make a huge difference.

To the Moon*

Are you comfortable with how you’re managing and backing up your cryptocurrency wallets and keys? Do you have a suggestion for keeping your cryptocurrencies safe that we missed above? Please let us know in the comments.


*To the Moon — Crypto slang for a currency that reaches an optimistic price projection.

The post Securing Your Cryptocurrency appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

How to Easily Apply Amazon Cloud Directory Schema Changes with In-Place Schema Upgrades

Post Syndicated from Mahendra Chheda original https://aws.amazon.com/blogs/security/how-to-easily-apply-amazon-cloud-directory-schema-changes-with-in-place-schema-upgrades/

Now, Amazon Cloud Directory makes it easier for you to apply schema changes across your directories with in-place schema upgrades. Your directory now remains available while Cloud Directory applies backward-compatible schema changes such as the addition of new fields. Without migrating data between directories or applying code changes to your applications, you can upgrade your schemas. You also can view the history of your schema changes in Cloud Directory by using version identifiers, which help you track and audit schema versions across directories. If you have multiple instances of a directory with the same schema, you can view the version history of schema changes to manage your directory fleet and ensure that all directories are running with the same schema version.

In this blog post, I demonstrate how to perform an in-place schema upgrade and use schema versions in Cloud Directory. I add additional attributes to an existing facet and add a new facet to a schema. I then publish the new schema and apply it to running directories, upgrading the schema in place. I also show how to view the version history of a directory schema, which helps me to ensure my directory fleet is running the same version of the schema and has the correct history of schema changes applied to it.

Note: I share Java code examples in this post. I assume that you are familiar with the AWS SDK and can use Java-based code to build a Cloud Directory code example. You can apply the concepts I cover in this post to other programming languages such as Python and Ruby.

Cloud Directory fundamentals

I will start by covering a few Cloud Directory fundamentals. If you are already familiar with the concepts behind Cloud Directory facets, schemas, and schema lifecycles, you can skip to the next section.

Facets: Groups of attributes. You use facets to define object types. For example, you can define a device schema by adding facets such as computers, phones, and tablets. A computer facet can track attributes such as serial number, make, and model. You can then use the facets to create computer objects, phone objects, and tablet objects in the directory to which the schema applies.

Schemas: Collections of facets. Schemas define which types of objects can be created in a directory (such as users, devices, and organizations) and enforce validation of data for each object class. All data within a directory must conform to the applied schema. As a result, the schema definition is essentially a blueprint to construct a directory with an applied schema.

Schema lifecycle: The four distinct states of a schema: Development, Published, Applied, and Deleted. Schemas in the Published and Applied states have version identifiers and cannot be changed. Schemas in the Applied state are used by directories for validation as applications insert or update data. You can change schemas in the Development state as many times as you need them to. In-place schema upgrades allow you to apply schema changes to an existing Applied schema in a production directory without the need to export and import the data populated in the directory.

How to add attributes to a computer inventory application schema and perform an in-place schema upgrade

To demonstrate how to set up schema versioning and perform an in-place schema upgrade, I will use an example of a computer inventory application that uses Cloud Directory to store relationship data. Let’s say that at my company, AnyCompany, we use this computer inventory application to track all computers we give to our employees for work use. I previously created a ComputerSchema and assigned its version identifier as 1. This schema contains one facet called ComputerInfo that includes attributes for SerialNumber, Make, and Model, as shown in the following schema details.

Schema: ComputerSchema
Version: 1

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer
Attribute: Make, type: String
Attribute: Model, type: String

AnyCompany has offices in Seattle, Portland, and San Francisco. I have deployed the computer inventory application for each of these three locations. As shown in the lower left part of the following diagram, ComputerSchema is in the Published state with a version of 1. The Published schema is applied to SeattleDirectory, PortlandDirectory, and SanFranciscoDirectory for AnyCompany’s three locations. Implementing separate directories for different geographic locations when you don’t have any queries that cross location boundaries is a good data partitioning strategy and gives your application better response times with lower latency.

Diagram of ComputerSchema in Published state and applied to three directories

Legend for the diagrams in this post

The following code example creates the schema in the Development state by using a JSON file, publishes the schema, and then creates directories for the Seattle, Portland, and San Francisco locations. For this example, I assume the schema has been defined in the JSON file. The createSchema API creates a schema Amazon Resource Name (ARN) with the name defined in the variable, SCHEMA_NAME. I can use the putSchemaFromJson API to add specific schema definitions from the JSON file.

// The utility method to get valid Cloud Directory schema JSON
String validJson = getJsonFile("ComputerSchema_version_1.json")

String SCHEMA_NAME = "ComputerSchema";

String developmentSchemaArn = client.createSchema(new CreateSchemaRequest()
        .withName(SCHEMA_NAME))
        .getSchemaArn();

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(validJson));

The following code example takes the schema that is currently in the Development state and publishes the schema, changing its state to Published.

String SCHEMA_VERSION = "1";
String publishedSchemaArn = client.publishSchema(
        new PublishSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withVersion(SCHEMA_VERSION))
        .getPublishedSchemaArn();

// Our Published schema ARN is as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1

The following code example creates a directory named SeattleDirectory and applies the published schema. The createDirectory API call creates a directory by using the published schema provided in the API parameters. Note that Cloud Directory stores a version of the schema in the directory in the Applied state. I will use similar code to create directories for PortlandDirectory and SanFranciscoDirectory.

String DIRECTORY_NAME = "SeattleDirectory"; 

CreateDirectoryResult directory = client.createDirectory(
        new CreateDirectoryRequest()
        .withName(DIRECTORY_NAME)
        .withSchemaArn(publishedSchemaArn));

String directoryArn = directory.getDirectoryArn();
String appliedSchemaArn = directory.getAppliedSchemaArn();

// This code section can be reused to create directories for Portland and San Francisco locations with the appropriate directory names

// Our directory ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX

// Our applied schema ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

Revising a schema

Now let’s say my company, AnyCompany, wants to add more information for computers and to track which employees have been assigned a computer for work use. I modify the schema to add two attributes to the ComputerInfo facet: Description and OSVersion (operating system version). I make Description optional because it is not important for me to track this attribute for the computer objects I create. I make OSVersion mandatory because it is critical for me to track it for all computer objects so that I can make changes such as applying security patches or making upgrades. Because I make OSVersion mandatory, I must provide a default value that Cloud Directory will apply to objects that were created before the schema revision, in order to handle backward compatibility. Note that you can replace the value in any object with a different value.

I also add a new facet to track computer assignment information, shown in the following updated schema as the ComputerAssignment facet. This facet tracks these additional attributes: Name (the name of the person to whom the computer is assigned), EMail (the email address of the assignee), Department, and department CostCenter. Note that Cloud Directory refers to the previously available version identifier as the Major Version. Because I can now add a minor version to a schema, I also denote the changed schema as Minor Version A.

Schema: ComputerSchema
Major Version: 1
Minor Version: A 

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer 
Attribute: Make, type: String
Attribute: Model, type: Integer
Attribute: Description, type: String, required: NOT_REQUIRED
Attribute: OSVersion, type: String, required: REQUIRED_ALWAYS, default: "Windows 7"

Facet: ComputerAssignment
Attribute: Name, type: String
Attribute: EMail, type: String
Attribute: Department, type: String
Attribute: CostCenter, type: Integer

The following diagram shows the changes that were made when I added another facet to the schema and attributes to the existing facet. The highlighted area of the diagram (bottom left) shows that the schema changes were published.

Diagram showing that schema changes were published

The following code example revises the existing Development schema by adding the new attributes to the ComputerInfo facet and by adding the ComputerAssignment facet. I use a new JSON file for the schema revision, and for the purposes of this example, I am assuming the JSON file has the full schema including planned revisions.

// The utility method to get a valid CloudDirectory schema JSON
String schemaJson = getJsonFile("ComputerSchema_version_1_A.json")

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(
        new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(schemaJson));

Upgrading the Published schema

The following code example performs an in-place schema upgrade of the Published schema with schema revisions (it adds new attributes to the existing facet and another facet to the schema). The upgradePublishedSchema API upgrades the Published schema with backward-compatible changes from the Development schema.

// From an earlier code example, I know the publishedSchemaArn has this value: "arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1"

// Upgrade publishedSchemaArn to minorVersion A. The Development schema must be backward compatible with 
// the existing publishedSchemaArn. 

String minorVersion = "A"

UpgradePublishedSchemaResult upgradePublishedSchemaResult = client.upgradePublishedSchema(new UpgradePublishedSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withPublishedSchemaArn(publishedSchemaArn)
        .withMinorVersion(minorVersion));

String upgradedPublishedSchemaArn = upgradePublishedSchemaResult.getUpgradedSchemaArn();

// The Published schema ARN after the upgrade shows a minor version as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1/A

Upgrading the Applied schema

The following diagram shows the in-place schema upgrade for the SeattleDirectory directory. I am performing the schema upgrade so that I can reflect the new schemas in all three directories. As a reminder, I added new attributes to the ComputerInfo facet and also added the ComputerAssignment facet. After the schema and directory upgrade, I can create objects for the ComputerInfo and ComputerAssignment facets in the SeattleDirectory. Any objects that were created with the old facet definition for ComputerInfo will now use the default values for any additional attributes defined in the new schema.

Diagram of the in-place schema upgrade for the SeattleDirectory directory

I use the following code example to perform an in-place upgrade of the SeattleDirectory to a Major Version of 1 and a Minor Version of A. Note that you should change a Major Version identifier in a schema to make backward-incompatible changes such as changing the data type of an existing attribute or dropping a mandatory attribute from your schema. Backward-incompatible changes require directory data migration from a previous version to the new version. You should change a Minor Version identifier in a schema to make backward-compatible upgrades such as adding additional attributes or adding facets, which in turn may contain one or more attributes. The upgradeAppliedSchema API lets me upgrade an existing directory with a different version of a schema.

// This upgrades ComputerSchema version 1 of the Applied schema in SeattleDirectory to Major Version 1 and Minor Version A
// The schema must be backward compatible or the API will fail with IncompatibleSchemaException

UpgradeAppliedSchemaResult upgradeAppliedSchemaResult = client.upgradeAppliedSchema(new UpgradeAppliedSchemaRequest()
        .withDirectoryArn(directoryArn)
        .withPublishedSchemaArn(upgradedPublishedSchemaArn));

String upgradedAppliedSchemaArn = upgradeAppliedSchemaResult.getUpgradedSchemaArn();

// The Applied schema ARN after the in-place schema upgrade will appear as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

// This code section can be reused to upgrade directories for the Portland and San Francisco locations with the appropriate directory ARN

Note: Cloud Directory has excluded returning the Minor Version identifier in the Applied schema ARN for backward compatibility and to enable the application to work across older and newer versions of the directory.

The following diagram shows the changes that are made when I perform an in-place schema upgrade in the two remaining directories, PortlandDirectory and SanFranciscoDirectory. I make these calls sequentially, upgrading PortlandDirectory first and then upgrading SanFranciscoDirectory. I use the same code example that I used earlier to upgrade SeattleDirectory. Now, all my directories are running the most current version of the schema. Also, I made these schema changes without having to migrate data and while maintaining my application’s high availability.

Diagram showing the changes that are made with an in-place schema upgrade in the two remaining directories

Schema revision history

I can now view the schema revision history for any of AnyCompany’s directories by using the listAppliedSchemaArns API. Cloud Directory maintains the five most recent versions of applied schema changes. Similarly, to inspect the current Minor Version that was applied to my schema, I use the getAppliedSchemaVersion API. The listAppliedSchemaArns API returns the schema ARNs based on my schema filter as defined in withSchemaArn.

I use the following code example to query an Applied schema for its version history.

// This returns the five most recent Minor Versions associated with a Major Version
ListAppliedSchemaArnsResult listAppliedSchemaArnsResult = client.listAppliedSchemaArns(new ListAppliedSchemaArnsRequest()
        .withDirectoryArn(directoryArn)
        .withSchemaArn(upgradedAppliedSchemaArn));

// Note: The listAppliedSchemaArns API without the SchemaArn filter returns all the Major Versions in a directory

The listAppliedSchemaArns API returns the two ARNs as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

The following code example queries an Applied schema for current Minor Version by using the getAppliedSchemaVersion API.

// This returns the current Applied schema's Minor Version ARN 

GetAppliedSchemaVersion getAppliedSchemaVersionResult = client.getAppliedSchemaVersion(new GetAppliedSchemaVersionRequest()
	.withSchemaArn(upgradedAppliedSchemaArn));

The getAppliedSchemaVersion API returns the current Applied schema ARN with a Minor Version, as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

If you have a lot of directories, schema revision API calls can help you audit your directory fleet and ensure that all directories are running the same version of a schema. Such auditing can help you ensure high integrity of directories across your fleet.

Summary

You can use in-place schema upgrades to make changes to your directory schema as you evolve your data set to match the needs of your application. An in-place schema upgrade allows you to maintain high availability for your directory and applications while the upgrade takes place. For more information about in-place schema upgrades, see the in-place schema upgrade documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread in the Directory Service forum or contact AWS Support.

– Mahendra

 

SecureLogin For Java Web Applications

Post Syndicated from Bozho original https://techblog.bozho.net/securelogin-java-web-applications/

No, there is not a missing whitespace in the title. It’s not about any secure login, it’s about the SecureLogin protocol developed by Egor Homakov, a security consultant, who became famous for committing to master in the Rails project without having permissions.

The SecureLogin protocol is very interesting, as it does not rely on any central party (e.g. OAuth providers like Facebook and Twitter), thus avoiding all the pitfalls of OAuth (which Homakov has often criticized). It is not a password manager either. It is just a client-side software that performs a bit of crypto in order to prove to the server that it is indeed the right user. For that to work, two parts are key:

  • Using a master password to generate a private key. It uses a key-derivation function, which guarantees that the produced private key has sufficient entropy. That way, using the same master password and the same email, you will get the same private key everytime you use the password, and therefore the same public key. And you are the only one who can prove this public key is yours, by signing a message with your private key.
  • Service providers (websites) identify you by your public key by storing it in the database when you register and then looking it up on each subsequent login

The client-side part is performed ideally by a native client – a browser plugin (one is available for Chrome) or a OS-specific application (including mobile ones). That may sound tedious, but it’s actually quick and easy and a one-time event (and is easier than password managers).

I have to admit – I like it, because I’ve been having a similar idea for a while. In my “biometric identification” presentation (where I discuss the pitfalls of using biometrics-only identification schemes), I proposed (slide 23) an identification scheme that uses biometrics (e.g. scanned with your phone) + a password to produce a private key (using a key-derivation function). And the biometric can easily be added to SecureLogin in the future.

It’s not all roses, of course, as one issue isn’t fully resolved yet – revocation. In case someone steals your master password (or you suspect it might be stolen), you may want to change it and notify all service providers of that change so that they can replace your old public key with a new one. That has two implications – first, you may not have a full list of sites that you registered on, and since you may have changed devices, or used multiple devices, there may be websites that never get to know about your password change. There are proposed solutions (points 3 and 4), but they are not intrinsic to the protocol and rely on centralized services. The second issue is – what if the attacker changes your password first? To prevent that, service providers should probably rely on email verification, which is neither part of the protocol, nor is encouraged by it. But you may have to do it anyway, as a safeguard.

Homakov has not only defined a protocol, but also provided implementations of the native clients, so that anyone can start using it. So I decided to add it to a project I’m currently working on (the login page is here). For that I needed a java implementation of the server verification, and since no such implementation existed (only ruby and node.js are provided for now), I implemented it myself. So if you are going to use SecureLogin with a Java web application, you can use that instead of rolling out your own. While implementing it, I hit a few minor issues that may lead to protocol changes, so I guess backward compatibility should also be somehow included in the protocol (through versioning).

So, how does the code look like? On the client side you have a button and a little javascript:

<!-- get the latest sdk.js from the GitHub repo of securelogin
   or include it from https://securelogin.pw/sdk.js -->
<script src="js/securelogin/sdk.js"></script>
....
<p class="slbutton" id="securelogin">&#9889; SecureLogin</p>
$("#securelogin").click(function() {
  SecureLogin(function(sltoken){
	// TODO: consider adding csrf protection as in the demo applications
        // Note - pass as request body, not as param, as the token relies 
        // on url-encoding which some frameworks mess with
	$.post('/app/user/securelogin', sltoken, function(result) {
            if(result == 'ok') {
		 window.location = "/app/";
            } else {
                 $.notify("Login failed, try again later", "error");
            }
	});
  });
  return false;
});

A single button can be used for both login and signup, or you can have a separate signup form, if it has to include additional details rather than just an email. Since I added SecureLogin in addition to my password-based login, I kept the two forms.

On the server, you simply do the following:

@RequestMapping(value = "/securelogin/register", method = RequestMethod.POST)
@ResponseBody
public String secureloginRegister(@RequestBody String token, HttpServletResponse response) {
    try {
        SecureLogin login = SecureLogin.verify(request.getSecureLoginToken(), Options.create(websiteRootUrl));
        UserDetails details = userService.getUserDetailsByEmail(login.getEmail());
        if (details == null || !login.getRawPublicKey().equals(details.getSecureLoginPublicKey())) {
            return "failure";
        }
        // sets the proper cookies to the response
        TokenAuthenticationService.addAuthentication(response, login.getEmail(), secure));
        return "ok";
    } catch (SecureLoginVerificationException e) {
        return "failure";
    }
}

This is spring-mvc, but it can be any web framework. You can also incorporate that into a spring-security flow somehow. I’ve never liked spring-security’s complexity, so I did it manually. Also, instead of strings, you can return proper status codes. Note that I’m doing a lookup by email and only then checking the public key (as if it’s a password). You can do the other way around if you have the proper index on the public key column.

I wouldn’t suggest having a SecureLogin-only system, as the project is still in an early stage and users may not be comfortable with it. But certainly adding it as an option is a good idea.

The post SecureLogin For Java Web Applications appeared first on Bozho's tech blog.

Catching Up on Some Recent AWS Launches and Publications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/catching-up-on-some-recent-aws-launches-and-publications/

As I have noted in the past, the AWS Blog Team is working hard to make sure that you know about as many AWS launches and publications as possible, without totally burying you in content! As part of our balancing act, we will occasionally publish catch-up posts to clear our queues and to bring more information to your attention. Here’s what I have in store for you today:

  • Monitoring for Cross-Region Replication of S3 Objects
  • Tags for Spot Fleet Instances
  • PCI DSS Compliance for 12 More Services
  • HIPAA Eligibility for WorkDocs
  • VPC Resizing
  • AppStream 2.0 Graphics Design Instances
  • AMS Connector App for ServiceNow
  • Regtech in the Cloud
  • New & Revised Quick Starts

Let’s jump right in!

Monitoring for Cross-Region Replication of S3 Objects
I told you about cross-region replication for S3 a couple of years ago. As I showed you at the time, you simply enable versioning for the source bucket and then choose a destination region and bucket. You can check the replication status manually, or you can create an inventory (daily or weekly) of the source and destination buckets.

The Cross-Region Replication Monitor (CRR Monitor for short) solution checks the replication status of objects across regions and gives you metrics and failure notifications in near real-time.

To learn more, read the CRR Monitor Implementation Guide and then use the AWS CloudFormation template to Deploy the CRR Monitor.

Tags for Spot Instances
Spot Instances and Spot Fleets (collections of Spot Instances) give you access to spare compute capacity. We recently gave you the ability to enter tags (key/value pairs) as part of your spot requests and to have those tags applied to the EC2 instances launched to fulfill the request:

To learn more, read Tag Your Spot Fleet EC2 Instances.

PCI DSS Compliance for 12 More Services
As first announced on the AWS Security Blog, we recently added 12 more services to our PCI DSS compliance program, raising the total number of in-scope services to 42. To learn more, check out our Compliance Resources.

HIPAA Eligibility for WorkDocs
In other compliance news, we announced that Amazon WorkDocs has achieved HIPAA eligibility and PCI DSS compliance in all AWS Regions where WorkDocs is available.

VPC Resizing
This feature allows you to extend an existing Virtual Private Cloud (VPC) by adding additional blocks of addresses. This gives you more flexibility and should help you to deal with growth. You can add up to four secondary /16 CIDRs per VPC. You can also edit the secondary CIDRs by deleting them and adding new ones. Simply select the VPC and choose Edit CIDRs from the menu:

Then add or remove CIDR blocks as desired:

To learn more, read about VPCs and Subnets.

AppStream 2.0 Graphics Design Instances
Powered by AMD FirePro S7150x2 Server GPUs and equipped with AMD Multiuser GPU technology, the new Graphics Design instances for Amazon AppStream 2.0 will let you run and stream graphics applications more cost-effectively than ever. The instances are available in four sizes, with 2-16 vCPUs and 7.5 GB to 61 GB of memory.

To learn more, read Introducing Amazon AppStream 2.0 Graphics Design, a New Lower Costs Instance Type for Streaming Graphics Applications.

AMS Connector App for ServiceNow
AWS Managed Services (AMS) provides Infrastructure Operations Management for the Enterprise. Designed to accelerate cloud adoption, it automates common operations such as change requests, patch management, security and backup.

The new AMS integration App for ServiceNow lets you interact with AMS from within ServiceNow, with no need for any custom development or API integration.

To learn more, read Cloud Management Made Easier: AWS Managed Services Now Integrates with ServiceNow.

Regtech in the Cloud
Regtech (as I learned while writing this), is short for regulatory technology, and is all about using innovative technology such as cloud computing, analytics, and machine learning to address regulatory challenges.

Working together with APN Consulting Partner Cognizant, TABB Group recently published a thought leadership paper that explains why regulations and compliance pose huge challenges for our customers in the financial services, and shows how AWS can help!

New & Revised Quick Starts
Our Quick Starts team has been cranking out new solutions and making significant updates to the existing ones. Here’s a roster:

Alfresco Content Services (v2) Atlassian Confluence Confluent Platform Data Lake
Datastax Enterprise GitHub Enterprise Hashicorp Nomad HIPAA
Hybrid Data Lake with Wandisco Fusion IBM MQ IBM Spectrum Scale Informatica EIC
Magento (v2) Linux Bastion (v2) Modern Data Warehouse with Tableau MongoDB (v2)
NetApp ONTAP NGINX (v2) RD Gateway Red Hat Openshift
SAS Grid SIOS Datakeeper StorReduce SQL Server (v2)

And that’s all I have for today!

Jeff;

Launch – AWS Glue Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/launch-aws-glue-now-generally-available/

Today we’re excited to announce the general availability of AWS Glue. Glue is a fully managed, serverless, and cloud-optimized extract, transform and load (ETL) service. Glue is different from other ETL services and platforms in a few very important ways.

First, Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running. Second, Glue provides crawlers that can automatically detect and infer schemas from many data sources, data types, and across various types of partitions. It stores these generated schemas in a centralized Data Catalog for editing, versioning, querying, and analysis. Third, Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats. Finally, Glue allows you to create development endpoints that allow your developers to use their favorite toolchains to construct their ETL scripts. Ok, let’s dive deep with an example.

In my job as a Developer Evangelist I spend a lot of time traveling and I thought it would be cool to play with some flight data. The Bureau of Transportations Statistics is kind enough to share all of this data for anyone to use here. We can easily download this data and put it in an Amazon Simple Storage Service (S3) bucket. This data will be the basis of our work today.

Crawlers

First, we need to create a Crawler for our flights data from S3. We’ll select Crawlers in the Glue console and follow the on screen prompts from there. I’ll specify s3://crawler-public-us-east-1/flight/2016/csv/ as my first datasource (we can add more later if needed). Next, we’ll create a database called flights and give our tables a prefix of flights as well.

The Crawler will go over our dataset, detect partitions through various folders – in this case months of the year, detect the schema, and build a table. We could add additonal data sources and jobs into our crawler or create separate crawlers that push data into the same database but for now let’s look at the autogenerated schema.

I’m going to make a quick schema change to year, moving it from BIGINT to INT. Then I can compare the two versions of the schema if needed.

Now that we know how to correctly parse this data let’s go ahead and do some transforms.

ETL Jobs

Now we’ll navigate to the Jobs subconsole and click Add Job. Will follow the prompts from there giving our job a name, selecting a datasource, and an S3 location for temporary files. Next we add our target by specifying “Create tables in your data target” and we’ll specify an S3 location in Parquet format as our target.

After clicking next, we’re at screen showing our various mappings proposed by Glue. Now we can make manual column adjustments as needed – in this case we’re just going to use the X button to remove a few columns that we don’t need.

This brings us to my favorite part. This is what I absolutely love about Glue.

Glue generated a PySpark script to transform our data based on the information we’ve given it so far. On the left hand side we can see a diagram documenting the flow of the ETL job. On the top right we see a series of buttons that we can use to add annotated data sources and targets, transforms, spigots, and other features. This is the interface I get if I click on transform.

If we add any of these transforms or additional data sources, Glue will update the diagram on the left giving us a useful visualization of the flow of our data. We can also just write our own code into the console and have it run. We can add triggers to this job that fire on completion of another job, a schedule, or on demand. That way if we add more flight data we can reload this same data back into S3 in the format we need.

I could spend all day writing about the power and versatility of the jobs console but Glue still has more features I want to cover. So, while I might love the script editing console, I know many people prefer their own development environments, tools, and IDEs. Let’s figure out how we can use those with Glue.

Development Endpoints and Notebooks

A Development Endpoint is an environment used to develop and test our Glue scripts. If we navigate to “Dev endpoints” in the Glue console we can click “Add endpoint” in the top right to get started. Next we’ll select a VPC, a security group that references itself and then we wait for it to provision.


Once it’s provisioned we can create an Apache Zeppelin notebook server by going to actions and clicking create notebook server. We give our instance an IAM role and make sure it has permissions to talk to our data sources. Then we can either SSH into the server or connect to the notebook to interactively develop our script.

Pricing and Documentation

You can see detailed pricing information here. Glue crawlers, ETL jobs, and development endpoints are all billed in Data Processing Unit Hours (DPU) (billed by minute). Each DPU-Hour costs $0.44 in us-east-1. A single DPU provides 4vCPU and 16GB of memory.

We’ve only covered about half of the features that Glue has so I want to encourage everyone who made it this far into the post to go read the documentation and service FAQs. Glue also has a rich and powerful API that allows you to do anything console can do and more.

We’re also releasing two new projects today. The aws-glue-libs provide a set of utilities for connecting, and talking with Glue. The aws-glue-samples repo contains a set of example jobs.

I hope you find that using Glue reduces the time it takes to start doing things with your data. Look for another post from me on AWS Glue soon because I can’t stop playing with this new service.
Randall

Automating Blue/Green Deployments of Infrastructure and Application Code using AMIs, AWS Developer Tools, & Amazon EC2 Systems Manager

Post Syndicated from Ramesh Adabala original https://aws.amazon.com/blogs/devops/bluegreen-infrastructure-application-deployment-blog/

Previous DevOps blog posts have covered the following use cases for infrastructure and application deployment automation:

An AMI provides the information required to launch an instance, which is a virtual server in the cloud. You can use one AMI to launch as many instances as you need. It is security best practice to customize and harden your base AMI with required operating system updates and, if you are using AWS native services for continuous security monitoring and operations, you are strongly encouraged to bake into the base AMI agents such as those for Amazon EC2 Systems Manager (SSM), Amazon Inspector, CodeDeploy, and CloudWatch Logs. A customized and hardened AMI is often referred to as a “golden AMI.” The use of golden AMIs to create EC2 instances in your AWS environment allows for fast and stable application deployment and scaling, secure application stack upgrades, and versioning.

In this post, using the DevOps automation capabilities of Systems Manager, AWS developer tools (CodePipeLine, CodeDeploy, CodeCommit, CodeBuild), I will show you how to use AWS CodePipeline to orchestrate the end-to-end blue/green deployments of a golden AMI and application code. Systems Manager Automation is a powerful security feature for enterprises that want to mature their DevSecOps practices.

Here are the high-level phases and primary services covered in this use case:

 

You can access the source code for the sample used in this post here: https://github.com/awslabs/automating-governance-sample/tree/master/Bluegreen-AMI-Application-Deployment-blog.

This sample will create a pipeline in AWS CodePipeline with the building blocks to support the blue/green deployments of infrastructure and application. The sample includes a custom Lambda step in the pipeline to execute Systems Manager Automation to build a golden AMI and update the Auto Scaling group with the golden AMI ID for every rollout of new application code. This guarantees that every new application deployment is on a fully patched and customized AMI in a continuous integration and deployment model. This enables the automation of hardened AMI deployment with every new version of application deployment.

 

 

We will build and run this sample in three parts.

Part 1: Setting up the AWS developer tools and deploying a base web application

Part 1 of the AWS CloudFormation template creates the initial Java-based web application environment in a VPC. It also creates all the required components of Systems Manager Automation, CodeCommit, CodeBuild, and CodeDeploy to support the blue/green deployments of the infrastructure and application resulting from ongoing code releases.

Part 1 of the AWS CloudFormation stack creates these resources:

After Part 1 of the AWS CloudFormation stack creation is complete, go to the Outputs tab and click the Elastic Load Balancing link. You will see the following home page for the base web application:

Make sure you have all the outputs from the Part 1 stack handy. You need to supply them as parameters in Part 3 of the stack.

Part 2: Setting up your CodeCommit repository

In this part, you will commit and push your sample application code into the CodeCommit repository created in Part 1. To access the initial git commands to clone the empty repository to your local machine, click Connect to go to the AWS CodeCommit console. Make sure you have the IAM permissions required to access AWS CodeCommit from command line interface (CLI).

After you’ve cloned the repository locally, download the sample application files from the part2 folder of the Git repository and place the files directly into your local repository. Do not include the aws-codedeploy-sample-tomcat folder. Go to the local directory and type the following commands to commit and push the files to the CodeCommit repository:

git add .
git commit -a -m "add all files from the AWS Java Tomcat CodeDeploy application"
git push

After all the files are pushed successfully, the repository should look like this:

 

Part 3: Setting up CodePipeline to enable blue/green deployments     

Part 3 of the AWS CloudFormation template creates the pipeline in AWS CodePipeline and all the required components.

a) Source: The pipeline is triggered by any change to the CodeCommit repository.

b) BuildGoldenAMI: This Lambda step executes the Systems Manager Automation document to build the golden AMI. After the golden AMI is successfully created, a new launch configuration with the new AMI details will be updated into the Auto Scaling group of the application deployment group. You can watch the progress of the automation in the EC2 console from the Systems Manager –> Automations menu.

c) Build: This step uses the application build spec file to build the application build artifact. Here are the CodeBuild execution steps and their status:

d) Deploy: This step clones the Auto Scaling group, launches the new instances with the new AMI, deploys the application changes, reroutes the traffic from the elastic load balancer to the new instances and terminates the old Auto Scaling group. You can see the execution steps and their status in the CodeDeploy console.

After the CodePipeline execution is complete, you can access the application by clicking the Elastic Load Balancing link. You can find it in the output of Part 1 of the AWS CloudFormation template. Any consecutive commits to the application code in the CodeCommit repository trigger the pipelines and deploy the infrastructure and code with an updated AMI and code.

 

If you have feedback about this post, add it to the Comments section below. If you have questions about implementing the example used in this post, open a thread on the Developer Tools forum.


About the author

 

Ramesh Adabala is a Solutions Architect in Southeast Enterprise Solution Architecture team at Amazon Web Services.

[email protected] – Intelligent Processing of HTTP Requests at the Edge

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/lambdaedge-intelligent-processing-of-http-requests-at-the-edge/

Late last year I announced a preview of [email protected] and talked about how you could use it to intelligently process HTTP requests at locations that are close (latency-wise) to your customers. Developers who applied and gained access to the preview have been making good use of it, and have provided us with plenty of very helpful feedback. During the preview we added the ability to generate HTTP responses and support for CloudWatch Logs, and also updated our roadmap based on the feedback.

Now Generally Available
Today I am happy to announce that [email protected] is now generally available! You can use it to:

  • Inspect cookies and rewrite URLs to perform A/B testing.
  • Send specific objects to your users based on the User-Agent header.
  • Implement access control by looking for specific headers before passing requests to the origin.
  • Add, drop, or modify headers to direct users to different cached objects.
  • Generate new HTTP responses.
  • Cleanly support legacy URLs.
  • Modify or condense headers or URLs to improve cache utilization.
  • Make HTTP requests to other Internet resources and use the results to customize responses.

[email protected] allows you to create web-based user experiences that are rich and personal. As is rapidly becoming the norm in today’s world, you don’t need to provision or manage any servers. You simply upload your code (Lambda functions written in Node.js) and pick one of the CloudFront behaviors that you have created for the distribution, along with the desired CloudFront event:

In this case, my function (the imaginatively named EdgeFunc1) would run in response to origin requests for image/* within the indicated distribution. As you can see, you can run code in response to four different CloudFront events:

Viewer Request – This event is triggered when an event arrives from a viewer (an HTTP client, generally a web browser or a mobile app), and has access to the incoming HTTP request. As you know, each CloudFront edge location maintains a large cache of objects so that it can efficiently respond to repeated requests. This particular event is triggered regardless of whether the requested object is already cached.

Origin Request – This event is triggered when the edge location is about to make a request back to the origin, due to the fact that the requested object is not cached at the edge location. It has access to the request that will be made to the origin (often an S3 bucket or code running on an EC2 instance).

Origin Response – This event is triggered after the origin returns a response to a request. It has access to the response from the origin.

Viewer Response – This is event is triggered before the edge location returns a response to the viewer. It has access to the response.

Functions are globally replicated and requests are automatically routed to the optimal location for execution. You can write your code once and with no overt action on your part, have it be available at low latency to users all over the world.

Your code has full access to requests and responses, including headers, cookies, the HTTP method (GET, HEAD, and so forth), and the URI. Subject to a few restrictions, it can modify existing headers and insert new ones.

[email protected] in Action
Let’s create a simple function that runs in response to the Viewer Request event. I open up the Lambda Console and create a new function. I choose the Node.js 6.10 runtime and search for cloudfront blueprints:

I choose cloudfront-response-generation and configure a trigger to invoke the function:

The Lambda Console provides me with some information about the operating environment for my function:

I enter a name and a description for my function, as usual:

The blueprint includes a fully operational function. It generates a “200” HTTP response and a very simple body:

I used this as the starting point for my own code, which pulls some interesting values from the request and displays them in a table:

'use strict';
exports.handler = (event, context, callback) => {

    /* Set table row style */
    const rs = '"border-bottom:1px solid black;vertical-align:top;"';
    /* Get request */
    const request = event.Records[0].cf.request;
   
    /* Get values from request */ 
    const httpVersion = request.httpVersion;
    const clientIp    = request.clientIp;
    const method      = request.method;
    const uri         = request.uri;
    const headers     = request.headers;
    const host        = headers['host'][0].value;
    const agent       = headers['user-agent'][0].value;
    
    var sreq = JSON.stringify(event.Records[0].cf.request, null, '&nbsp;');
    sreq = sreq.replace(/\n/g, '<br/>');

    /* Generate body for response */
    const body = 
     '<html>\n'
     + '<head><title>Hello From [email protected]</title></head>\n'
     + '<body>\n'
     + '<table style="border:1px solid black;background-color:#e0e0e0;border-collapse:collapse;" cellpadding=4 cellspacing=4>\n'
     + '<tr style=' + rs + '><td>Host</td><td>'        + host     + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Agent</td><td>'       + agent    + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Client IP</td><td>'   + clientIp + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Method</td><td>'      + method   + '</td></tr>\n'
     + '<tr style=' + rs + '><td>URI</td><td>'         + uri      + '</td></tr>\n'
     + '<tr style=' + rs + '><td>Raw Request</td><td>' + sreq     + '</td></tr>\n'
     + '</table>\n'
     + '</body>\n'
     + '</html>'

    /* Generate HTTP response */
    const response = {
        status: '200',
        statusDescription: 'HTTP OK',
        httpVersion: httpVersion,
        body: body,
        headers: {
            'vary':          [{key: 'Vary',          value: '*'}],
            'last-modified': [{key: 'Last-Modified', value:'2017-01-13'}]
        },
    };

    callback(null, response);
};

I configure my handler, and request the creation of a new IAM Role with Basic Edge Lambda permissions:

On the next page I confirm my settings (as I would do for a regular Lambda function), and click on Create function:

This creates the function, attaches the trigger to the distribution, and also initiates global replication of the function. The status of my distribution changes to In Progress for the duration of the replication (typically 5 to 8 minutes):

The status changes back to Deployed as soon as the replication completes:

Then I access the root of my distribution (https://dogy9dy9kvj6w.cloudfront.net/), the function runs, and this is what I see:

Feel free to click on the image (it is linked to the root of my distribution) to run my code!

As usual, this is a very simple example and I am sure that you can do a lot better. Here are a few ideas to get you started:

Site Management – You can take an entire dynamic website offline and replace critical pages with [email protected] functions for maintenance or during a disaster recovery operation.

High Volume Content – You can create scoreboards, weather reports, or public safety pages and make them available at the edge, both quickly and cost-effectively.

Create something cool and share it in the comments or in a blog post, and I’ll take a look.

Things to Know
Here are a couple of things to keep in mind as you start to think about how to put [email protected] to use in your application:

Timeouts – Functions that handle Origin Request and Origin Response events must complete within 3 seconds. Functions that handle Viewer Request and Viewer Response events must complete within 1 second.

Versioning – After you update your code in the Lambda Console, you must publish a new version and set up a fresh set of triggers for it, and then wait for the replication to complete. You must always refer to your code using a version number; $LATEST and aliases do not apply.

Headers – As you can see from my code, the HTTP request headers are accessible as an array. The headers fall in to four categories:

  • Accessible – Can be read, written, deleted, or modified.
  • Restricted – Must be passed on to the origin.
  • Read-only – Can be read, but not modified in any way.
  • Blacklisted – Not seen by code, and cannot be added.

Runtime Environment – The runtime environment provides each function with 128 MB of memory, but no builtin libraries or access to /tmp.

Web Service Access – Functions that handle Origin Request and Origin Response events must complete within 3 seconds can access the AWS APIs and fetch content via HTTP. These requests are always made synchronously with request to the original request or response.

Function Replication – As I mentioned earlier, your functions will be globally replicated. The replicas are visible in the “other” regions from the Lambda Console:

CloudFront – Everything that you already know about CloudFront and CloudFront behaviors is relevant to [email protected]. You can use multiple behaviors (each with up to four [email protected] functions) from each behavior, customize header & cookie forwarding, and so forth. You can also make the association between events and functions (via ARNs that include function versions) while you are editing a behavior:

Available Now
[email protected] is available now and you can start using it today. Pricing is based on the number of times that your functions are invoked and the amount of time that they run (see the [email protected] Pricing page for more info).

Jeff;

 

Manage Kubernetes Clusters on AWS Using Kops

Post Syndicated from Arun Gupta original https://aws.amazon.com/blogs/compute/kubernetes-clusters-aws-kops/

Any containerized application typically consists of multiple containers. There is a container for the application itself, one for database, possibly another for web server, and so on. During development, its normal to build and test this multi-container application on a single host. This approach works fine during early dev and test cycles but becomes a single point of failure for production where the availability of the application is critical. In such cases, this multi-container application is deployed on multiple hosts. There is a need for an external tool to manage such a multi-container multi-host deployment. Container orchestration frameworks provides the capability of cluster management, scheduling containers on different hosts, service discovery and load balancing, crash recovery and other related functionalities. There are multiple options for container orchestration on Amazon Web Services: Amazon ECS, Docker for AWS, and DC/OS.

Another popular option for container orchestration on AWS is Kubernetes. There are multiple ways to run a Kubernetes cluster on AWS. This multi-part blog series provides a brief overview and explains some of these approaches in detail. This first post explains how to create a Kubernetes cluster on AWS using kops.

Kubernetes and Kops overview

Kubernetes is an open source, container orchestration platform. Applications packaged as Docker images can be easily deployed, scaled, and managed in a Kubernetes cluster. Some of the key features of Kubernetes are:

  • Self-healing
    Failed containers are restarted to ensure that the desired state of the application is maintained. If a node in the cluster dies, then the containers are rescheduled on a different node. Containers that do not respond to application-defined health check are terminated, and thus rescheduled.
  • Horizontal scaling
    Number of containers can be easily scaled up and down automatically based upon CPU utilization, or manually using a command.
  • Service discovery and load balancing
    Multiple containers can be grouped together discoverable using a DNS name. The service can be load balanced with integration to the native LB provided by the cloud provider.
  • Application upgrades and rollbacks
    Applications can be upgraded to a newer version without an impact to the existing one. If something goes wrong, Kubernetes rolls back the change.

Kops, short for Kubernetes Operations, is a set of tools for installing, operating, and deleting Kubernetes clusters in the cloud. A rolling upgrade of an older version of Kubernetes to a new version can also be performed. It also manages the cluster add-ons. After the cluster is created, the usual kubectl CLI can be used to manage resources in the cluster.

Download Kops and Kubectl

There is no need to download the Kubernetes binary distribution for creating a cluster using kops. However, you do need to download the kops CLI. It then takes care of downloading the right Kubernetes binary in the cloud, and provisions the cluster.

The different download options for kops are explained at github.com/kubernetes/kops#installing. On MacOS, the easiest way to install kops is using the brew package manager.

brew update && brew install kops

The version of kops can be verified using the kops version command, which shows:

Version 1.6.1

In addition, download kubectl. This is required to manage the Kubernetes cluster. The latest version of kubectl can be downloaded using the following command:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/darwin/amd64/kubectl

Make sure to include the directory where kubectl is downloaded in your PATH.

IAM user permission

The IAM user to create the Kubernetes cluster must have the following permissions:

  • AmazonEC2FullAccess
  • AmazonRoute53FullAccess
  • AmazonS3FullAccess
  • IAMFullAccess
  • AmazonVPCFullAccess

Alternatively, a new IAM user may be created and the policies attached as explained at github.com/kubernetes/kops/blob/master/docs/aws.md#setup-iam-user.

Create an Amazon S3 bucket for the Kubernetes state store

Kops needs a “state store” to store configuration information of the cluster.  For example, how many nodes, instance type of each node, and Kubernetes version. The state is stored during the initial cluster creation. Any subsequent changes to the cluster are also persisted to this store as well. As of publication, Amazon S3 is the only supported storage mechanism. Create a S3 bucket and pass that to the kops CLI during cluster creation.

This post uses the bucket name kubernetes-aws-io. Bucket names must be unique; you have to use a different name. Create an S3 bucket:

aws s3api create-bucket --bucket kubernetes-aws-io

I strongly recommend versioning this bucket in case you ever need to revert or recover a previous version of the cluster. This can be enabled using the AWS CLI as well:

aws s3api put-bucket-versioning --bucket kubernetes-aws-io --versioning-configuration Status=Enabled

For convenience, you can also define KOPS_STATE_STORE environment variable pointing to the S3 bucket. For example:

export KOPS_STATE_STORE=s3://kubernetes-aws-io

This environment variable is then used by the kops CLI.

DNS configuration

As of Kops 1.6.1, a top-level domain or a subdomain is required to create the cluster. This domain allows the worker nodes to discover the master and the master to discover all the etcd servers. This is also needed for kubectl to be able to talk directly with the master.

This domain may be registered with AWS, in which case a Route 53 hosted zone is created for you. Alternatively, this domain may be at a different registrar. In this case, create a Route 53 hosted zone. Specify the name server (NS) records from the created zone as NS records with the domain registrar.

This post uses a kubernetes-aws.io domain registered at a third-party registrar.

Generate a Route 53 hosted zone using the AWS CLI. Download jq to run this command:

ID=$(uuidgen) && \
aws route53 create-hosted-zone \
--name cluster.kubernetes-aws.io \
--caller-reference $ID \
| jq .DelegationSet.NameServers

This shows an output such as the following:

[
"ns-94.awsdns-11.com",
"ns-1962.awsdns-53.co.uk",
"ns-838.awsdns-40.net",
"ns-1107.awsdns-10.org"
]

Create NS records for the domain with your registrar. Different options on how to configure DNS for the cluster are explained at github.com/kubernetes/kops/blob/master/docs/aws.md#configure-dns.

Experimental support to create a gossip-based cluster was added in Kops 1.6.2. This post uses a DNS-based approach, as that is more mature and well tested.

Create the Kubernetes cluster

The Kops CLI can be used to create a highly available cluster, with multiple master nodes spread across multiple Availability Zones. Workers can be spread across multiple zones as well. Some of the tasks that happen behind the scene during cluster creation are:

  • Provisioning EC2 instances
  • Setting up AWS resources such as networks, Auto Scaling groups, IAM users, and security groups
  • Installing Kubernetes.

Start the Kubernetes cluster using the following command:

kops create cluster \
--name cluster.kubernetes-aws.io \
--zones us-west-2a \
--state s3://kubernetes-aws-io \
--yes

In this command:

  • --zones
    Defines the zones in which the cluster is going to be created. Multiple comma-separated zones can be specified to span the cluster across multiple zones.
  • --name
    Defines the cluster’s name.
  • --state
    Points to the S3 bucket that is the state store.
  • --yes
    Immediately creates the cluster. Otherwise, only the cloud resources are created and the cluster needs to be started explicitly using the command kops update --yes. If the cluster needs to be edited, then the kops edit cluster command can be used.

This starts a single master and two worker node Kubernetes cluster. The master is in an Auto Scaling group and the worker nodes are in a separate group. By default, the master node is m3.medium and the worker node is t2.medium. Master and worker nodes are assigned separate IAM roles as well.

Wait for a few minutes for the cluster to be created. The cluster can be verified using the command kops validate cluster --state=s3://kubernetes-aws-io. It shows the following output:

Using cluster from kubectl context: cluster.kubernetes-aws.io

Validating cluster cluster.kubernetes-aws.io

INSTANCE GROUPS
NAME                 ROLE      MACHINETYPE    MIN    MAX    SUBNETS
master-us-west-2a    Master    m3.medium      1      1      us-west-2a
nodes                Node      t2.medium      2      2      us-west-2a

NODE STATUS
NAME                                           ROLE      READY
ip-172-20-38-133.us-west-2.compute.internal    node      True
ip-172-20-38-177.us-west-2.compute.internal    master    True
ip-172-20-46-33.us-west-2.compute.internal     node      True

Your cluster cluster.kubernetes-aws.io is ready

It shows the different instances started for the cluster, and their roles. If multiple cluster states are stored in the same bucket, then --name <NAME> can be used to specify the exact cluster name.

Check all nodes in the cluster using the command kubectl get nodes:

NAME                                          STATUS         AGE       VERSION
ip-172-20-38-133.us-west-2.compute.internal   Ready,node     14m       v1.6.2
ip-172-20-38-177.us-west-2.compute.internal   Ready,master   15m       v1.6.2
ip-172-20-46-33.us-west-2.compute.internal    Ready,node     14m       v1.6.2

Again, the internal IP address of each node, their current status (master or node), and uptime are shown. The key information here is the Kubernetes version for each node in the cluster, 1.6.2 in this case.

The kubectl value included in the PATH earlier is configured to manage this cluster. Resources such as pods, replica sets, and services can now be created in the usual way.

Some of the common options that can be used to override the default cluster creation are:

  • --kubernetes-version
    The version of Kubernetes cluster. The exact versions supported are defined at github.com/kubernetes/kops/blob/master/channels/stable.
  • --master-size and --node-size
    Define the instance of master and worker nodes.
  • --master-count and --node-count
    Define the number of master and worker nodes. By default, a master is created in each zone specified by --master-zones. Multiple master nodes can be created by a higher number using --master-count or specifying multiple Availability Zones in --master-zones.

A three-master and five-worker node cluster, with master nodes spread across different Availability Zones, can be created using the following command:

kops create cluster \
--name cluster2.kubernetes-aws.io \
--zones us-west-2a,us-west-2b,us-west-2c \
--node-count 5 \
--state s3://kubernetes-aws-io \
--yes

Both the clusters are sharing the same state store but have different names. This also requires you to create an additional Amazon Route 53 hosted zone for the name.

By default, the resources required for the cluster are directly created in the cloud. The --target option can be used to generate the AWS CloudFormation scripts instead. These scripts can then be used by the AWS CLI to create resources at your convenience.

Get a complete list of options for cluster creation with kops create cluster --help.

More details about the cluster can be seen using the command kubectl cluster-info:

Kubernetes master is running at https://api.cluster.kubernetes-aws.io
KubeDNS is running at https://api.cluster.kubernetes-aws.io/api/v1/proxy/namespaces/kube-system/services/kube-dns

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Check the client and server version using the command kubectl version:

Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.4", GitCommit:"d6f433224538d4f9ca2f7ae19b252e6fcb66a3ae", GitTreeState:"clean", BuildDate:"2017-05-19T18:44:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}

Both client and server version are 1.6 as shown by the Major and Minor attribute values.

Upgrade the Kubernetes cluster

Kops can be used to create a Kubernetes 1.4.x, 1.5.x, or an older version of the 1.6.x cluster using the --kubernetes-version option. The exact versions supported are defined at github.com/kubernetes/kops/blob/master/channels/stable.

Or, you may have used kops to create a cluster a while ago, and now want to upgrade to the latest recommended version of Kubernetes. Kops supports rolling cluster upgrades where the master and worker nodes are upgraded one by one.

As of kops 1.6.1, upgrading a cluster is a three-step process.

First, check and apply the latest recommended Kubernetes update.

kops upgrade cluster \
--name cluster2.kubernetes-aws.io \
--state s3://kubernetes-aws-io \
--yes

The --yes option immediately applies the changes. Not specifying the --yes option shows only the changes that are applied.

Second, update the state store to match the cluster state. This can be done using the following command:

kops update cluster \
--name cluster2.kubernetes-aws.io \
--state s3://kubernetes-aws-io \
--yes

Lastly, perform a rolling update for all cluster nodes using the kops rolling-update command:

kops rolling-update cluster \
--name cluster2.kubernetes-aws.io \
--state s3://kubernetes-aws-io \
--yes

Previewing the changes before updating the cluster can be done using the same command but without specifying the --yes option. This shows the following output:

NAME                 STATUS        NEEDUPDATE    READY    MIN    MAX    NODES
master-us-west-2a    NeedsUpdate   1             0        1      1      1
nodes                NeedsUpdate   2             0        2      2      2

Using --yes updates all nodes in the cluster, first master and then worker. There is a 5-minute delay between restarting master nodes, and a 2-minute delay between restarting nodes. These values can be altered using --master-interval and --node-interval options, respectively.

Only the worker nodes may be updated by using the --instance-group node option.

Delete the Kubernetes cluster

Typically, the Kubernetes cluster is a long-running cluster to serve your applications. After its purpose is served, you may delete it. It is important to delete the cluster using the kops command. This ensures that all resources created by the cluster are appropriately cleaned up.

The command to delete the Kubernetes cluster is:

kops delete cluster --state=s3://kubernetes-aws-io --yes

If multiple clusters have been created, then specify the cluster name as in the following command:

kops delete cluster cluster2.kubernetes-aws.io --state=s3://kubernetes-aws-io --yes

Conclusion

This post explained how to manage a Kubernetes cluster on AWS using kops. Kubernetes on AWS users provides a self-published list of companies using Kubernetes on AWS.

Try starting a cluster, create a few Kubernetes resources, and then tear it down. Kops on AWS provides a more comprehensive tutorial for setting up Kubernetes clusters. Kops docs are also helpful for understanding the details.

In addition, the Kops team hosts office hours to help you get started, from guiding you with your first pull request. You can always join the #kops channel on Kubernetes slack to ask questions. If nothing works, then file an issue at github.com/kubernetes/kops/issues.

Future posts in this series will explain other ways of creating and running a Kubernetes cluster on AWS.

— Arun

Synchronizing Amazon S3 Buckets Using AWS Step Functions

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/synchronizing-amazon-s3-buckets-using-aws-step-functions/

Constantin Gonzalez is a Principal Solutions Architect at AWS

In my free time, I run a small blog that uses Amazon S3 to host static content and Amazon CloudFront to distribute it world-wide. I use a home-grown, static website generator to create and upload my blog content onto S3.

My blog uses two S3 buckets: one for staging and testing, and one for production. As a website owner, I want to update the production bucket with all changes from the staging bucket in a reliable and efficient way, without having to create and populate a new bucket from scratch. Therefore, to synchronize files between these two buckets, I use AWS Lambda and AWS Step Functions.

In this post, I show how you can use Step Functions to build a scalable synchronization engine for S3 buckets and learn some common patterns for designing Step Functions state machines while you do so.

Step Functions overview

Step Functions makes it easy to coordinate the components of distributed applications and microservices using visual workflows. Building applications from individual components that each perform a discrete function lets you scale and change applications quickly.

While this particular example focuses on synchronizing objects between two S3 buckets, it can be generalized to any other use case that involves coordinated processing of any number of objects in S3 buckets, or other, similar data processing patterns.

Bucket replication options

Before I dive into the details on how this particular example works, take a look at some alternatives for copying or replicating data between two Amazon S3 buckets:

  • The AWS CLI provides customers with a powerful aws s3 sync command that can synchronize the contents of one bucket with another.
  • S3DistCP is a powerful tool for users of Amazon EMR that can efficiently load, save, or copy large amounts of data between S3 buckets and HDFS.
  • The S3 cross-region replication functionality enables automatic, asynchronous copying of objects across buckets in different AWS regions.

In this use case, you are looking for a slightly different bucket synchronization solution that:

  • Works within the same region
  • Is more scalable than a CLI approach running on a single machine
  • Doesn’t require managing any servers
  • Uses a more finely grained cost model than the hourly based Amazon EMR approach

You need a scalable, serverless, and customizable bucket synchronization utility.

Solution architecture

Your solution needs to do three things:

  1. Copy all objects from a source bucket into a destination bucket, but leave out objects that are already present, for efficiency.
  2. Delete all "orphaned" objects from the destination bucket that aren’t present on the source bucket, because you don’t want obsolete objects lying around.
  3. Keep track of all objects for #1 and #2, regardless of how many objects there are.

In the beginning, you read in the source and destination buckets as parameters and perform basic parameter validation. Then, you operate two separate, independent loops, one for copying missing objects and one for deleting obsolete objects. Each loop is a sequence of Step Functions states that read in chunks of S3 object lists and use the continuation token to decide in a choice state whether to continue the loop or not.

This solution is based on the following architecture that uses Step Functions, Lambda, and two S3 buckets:

As you can see, this setup involves no servers, just two main building blocks:

  • Step Functions manages the overall flow of synchronizing the objects from the source bucket with the destination bucket.
  • A set of Lambda functions carry out the individual steps necessary to perform the work, such as validating input, getting lists of objects from source and destination buckets, copying or deleting objects in batches, and so on.

To understand the synchronization flow in more detail, look at the Step Functions state machine diagram for this example.

Walkthrough

Here’s a detailed discussion of how this works.

To follow along, use the code in the sync-buckets-state-machine GitHub repo. The code comes with a ready-to-run deployment script in Python that takes care of all the IAM roles, policies, Lambda functions, and of course the Step Functions state machine deployment using AWS CloudFormation, as well as instructions on how to use it.

Fine print: Use at your own risk

Before I start, here are some disclaimers:

  • Educational purposes only.

    The following example and code are intended for educational purposes only. Make sure that you customize, test, and review it on your own before using any of this in production.

  • S3 object deletion.

    In particular, using the code included below may delete objects on S3 in order to perform synchronization. Make sure that you have backups of your data. In particular, consider using the Amazon S3 Versioning feature to protect yourself against unintended data modification or deletion.

Step Functions execution starts with an initial set of parameters that contain the source and destination bucket names in JSON:

{
    "source":       "my-source-bucket-name",
    "destination":  "my-destination-bucket-name"
}

Armed with this data, Step Functions execution proceeds as follows.

Step 1: Detect the bucket region

First, you need to know the regions where your buckets reside. In this case, take advantage of the Step Functions Parallel state. This allows you to use a Lambda function get_bucket_location.py inside two different, parallel branches of task states:

  • FindRegionForSourceBucket
  • FindRegionForDestinationBucket

Each task state receives one bucket name as an input parameter, then detects the region corresponding to "their" bucket. The output of these functions is collected in a result array containing one element per parallel function.

Step 2: Combine the parallel states

The output of a parallel state is a list with all the individual branches’ outputs. To combine them into a single structure, use a Lambda function called combine_dicts.py in its own CombineRegionOutputs task state. The function combines the two outputs from step 1 into a single JSON dict that provides you with the necessary region information for each bucket.

Step 3: Validate the input

In this walkthrough, you only support buckets that reside in the same region, so you need to decide if the input is valid or if the user has given you two buckets in different regions. To find out, use a Lambda function called validate_input.py in the ValidateInput task state that tests if the two regions from the previous step are equal. The output is a Boolean.

Step 4: Branch the workflow

Use another type of Step Functions state, a Choice state, which branches into a Failure state if the comparison in step 3 yields false, or proceeds with the remaining steps if the comparison was successful.

Step 5: Execute in parallel

The actual work is happening in another Parallel state. Both branches of this state are very similar to each other and they re-use some of the Lambda function code.

Each parallel branch implements a looping pattern across the following steps:

  1. Use a Pass state to inject either the string value "source" (InjectSourceBucket) or "destination" (InjectDestinationBucket) into the listBucket attribute of the state document.

    The next step uses either the source or the destination bucket, depending on the branch, while executing the same, generic Lambda function. You don’t need two Lambda functions that differ only slightly. This step illustrates how to use Pass states as a way of injecting constant parameters into your state machine and as a way of controlling step behavior while re-using common step execution code.

  2. The next step UpdateSourceKeyList/UpdateDestinationKeyList lists objects in the given bucket.

    Remember that the previous step injected either "source" or "destination" into the state document’s listBucket attribute. This step uses the same list_bucket.py Lambda function to list objects in an S3 bucket. The listBucket attribute of its input decides which bucket to list. In the left branch of the main parallel state, use the list of source objects to work through copying missing objects. The right branch uses the list of destination objects, to check if they have a corresponding object in the source bucket and eliminate any orphaned objects. Orphans don’t have a source object of the same S3 key.

  3. This step performs the actual work. In the left branch, the CopySourceKeys step uses the copy_keys.py Lambda function to go through the list of source objects provided by the previous step, then copies any missing object into the destination bucket. Its sister step in the other branch, DeleteOrphanedKeys, uses its destination bucket key list to test whether each object from the destination bucket has a corresponding source object, then deletes any orphaned objects.

  4. The S3 ListObjects API action is designed to be scalable across many objects in a bucket. Therefore, it returns object lists in chunks of configurable size, along with a continuation token. If the API result has a continuation token, it means that there are more objects in this list. You can work from token to token to continue getting object list chunks, until you get no more continuation tokens.

By breaking down large amounts of work into chunks, you can make sure each chunk is completed within the timeframe allocated for the Lambda function, and within the maximum input/output data size for a Step Functions state.

This approach comes with a slight tradeoff: the more objects you process at one time in a given chunk, the faster you are done. There’s less overhead for managing individual chunks. On the other hand, if you process too many objects within the same chunk, you risk going over time and space limits of the processing Lambda function or the Step Functions state so the work cannot be completed.

In this particular case, use a Lambda function that maximizes the number of objects listed from the S3 bucket that can be stored in the input/output state data. This is currently up to 32,768 bytes, assuming (based on some experimentation) that the execution of the COPY/DELETE requests in the processing states can always complete in time.

A more sophisticated approach would use the Step Functions retry/catch state attributes to account for any time limits encountered and adjust the list size accordingly through some list site adjusting.

Step 6: Test for completion

Because the presence of a continuation token in the S3 ListObjects output signals that you are not done processing all objects yet, use a Choice state to test for its presence. If a continuation token exists, it branches into the UpdateSourceKeyList step, which uses the token to get to the next chunk of objects. If there is no token, you’re done. The state machine then branches into the FinishCopyBranch/FinishDeleteBranch state.

By using Choice states like this, you can create loops exactly like the old times, when you didn’t have for statements and used branches in assembly code instead!

Step 7: Success!

Finally, you’re done, and can step into your final Success state.

Lessons learned

When implementing this use case with Step Functions and Lambda, I learned the following things:

  • Sometimes, it is necessary to manipulate the JSON state of a Step Functions state machine with just a few lines of code that hardly seem to warrant their own Lambda function. This is ok, and the cost is actually pretty low given Lambda’s 100 millisecond billing granularity. The upside is that functions like these can be helpful to make the data more palatable for the following steps or for facilitating Choice states. An example here would be the combine_dicts.py function.
  • Pass states can be useful beyond debugging and tracing, they can be used to inject arbitrary values into your state JSON and guide generic Lambda functions into doing specific things.
  • Choice states are your friend because you can build while-loops with them. This allows you to reliably grind through large amounts of data with the patience of an engine that currently supports execution times of up to 1 year.

    Currently, there is an execution history limit of 25,000 events. Each Lambda task state execution takes up 5 events, while each choice state takes 2 events for a total of 7 events per loop. This means you can loop about 3500 times with this state machine. For even more scalability, you can split up work across multiple Step Functions executions through object key sharding or similar approaches.

  • It’s not necessary to spend a lot of time coding exception handling within your Lambda functions. You can delegate all exception handling to Step Functions and instead simplify your functions as much as possible.

  • Step Functions are great replacements for shell scripts. This could have been a shell script, but then I would have had to worry about where to execute it reliably, how to scale it if it went beyond a few thousand objects, etc. Think of Step Functions and Lambda as tools for scripting at a cloud level, beyond the boundaries of servers or containers. "Serverless" here also means "boundary-less".

Summary

This approach gives you scalability by breaking down any number of S3 objects into chunks, then using Step Functions to control logic to work through these objects in a scalable, serverless, and fully managed way.

To take a look at the code or tweak it for your own needs, use the code in the sync-buckets-state-machine GitHub repo.

To see more examples, please visit the Step Functions Getting Started page.

Enjoy!

Use AWS CloudFormation to Automate the Creation of an S3 Bucket with Cross-Region Replication Enabled

Post Syndicated from Rajakumar Sampathkumar original https://aws.amazon.com/blogs/devops/use-aws-cloudformation-to-automate-the-creation-of-an-s3-bucket-with-cross-region-replication-enabled/

At the request of many of our customers, in this blog post, we will discuss how to use AWS CloudFormation to create an S3 bucket with cross-region replication enabled. We’ve included a CloudFormation template with this post that uses an AWS Lambda-backed custom resource to create source and destination buckets.

What is S3 cross-region replication?

Cross-region replication is a bucket-level feature that enables automatic, asynchronous copying of objects across buckets in different AWS regions. You can create two buckets in two different regions and use the ReplicationConfiguration property to replicate the objects from one bucket to the other. For example, you can have a bucket in us-east-1 and replicate the bucket objects to a bucket in us-west-2.

For more information, see What Is and Is Not Replicated in Cross-Region Replication.

Challenge

When you enable cross-region replication, the replicated objects will be stored in only one destination (an S3 bucket). The destination bucket must already exist and it must be in an AWS region different from your source bucket.

Using CloudFormation, you cannot create the destination bucket in a region different from the region in which you are creating your stack. To create the destination bucket, you can:

Solution overview

The CloudFormation template provided with this post uses an AWS Lambda-backed custom resource to create an S3 destination bucket in one region and a source S3 bucket in the same region as the CloudFormation endpoint.

Note: In this scenario, CloudFormation is not aware of the destination bucket created by AWS Lambda. For this reason, CloudFormation will not delete this resource when the stack is deleted.

How does it work?

Launch the stack and provide the following custom values to the CloudFormation template. These (user input) values will be passed as parameters to the template.

  • OriginalBucketName
  • ReplicationBucketName
  • ReplicationRegion (different from the source region from which you are launching the stack)

After the parameters are received by the template, the CloudFormation stack creates these IAM roles:

  • A Lambda execution role with access to Amazon CloudWatch Logs, Amazon EC2, and Amazon S3
  • An S3 role with AmazonS3FullAccess

The AWS Lambda function is created after the roles are created. Lambda triggers the creation of the S3 destination bucket in the region specified in the CloudFormation template. Versioning is enabled on the bucket.

When the destination bucket is available, CloudFormation initiates the creation of the source bucket with cross-region replication enabled. The destination bucket is the target for cross-region replication.

Note: The creation of the IAM role and Lambda function is automated in the template. You do not need not create them manually.

Automated deployment

The step-by-step instructions in this section show you how you can automate the creation of an S3 bucket with cross-region replication enabled. After you click the button, the bucket will be created in approximately two minutes.

Note: Running this solution may result in charges to your AWS account. These include possible charges for Amazon S3 and AWS Lambda.

1. Sign in to the AWS Management Console and open the AWS CloudFormation console. Choose the Launch Stack button to create the AWS CloudFormation stack (S3CrossRegionReplication).

The template will be loaded from an S3 bucket automatically.

2. On the Specify details page, change the stack name, if required. Provide the following custom values to the CloudFormation template. These (user input) values will be passed as parameters to the template.

  • OriginalBucketName
  • ReplicationBucketName
  • ReplicationRegion

Choose Next.

3. On the Options page, you can specify tags for your AWS CloudFormation template, if you like, and then choose Next.

Permissions are built in the template. You don’t have to choose an IAM role.

Choose Next.

4. On the Review page, review your template details. Select the acknowledgement check box, and then choose Create to create the stack.

You can also download the template and use it as a starting point for your own implementation. The template is launched in the US East (N. Virginia) region by default. To launch the CloudFormation stack in a different AWS region, use the region selector in the console navigation bar after you click Launch stack.

Note: Because this solution uses AWS Lambda, which is currently available in selected regions only, be sure you launch this solution in an AWS region where Lambda is available. For more information, see AWS service offerings by region.

Conclusion

In this blog post, we showed you how to use a single AWS CloudFormation template and AWS Lambda-backed custom resources to create an S3 bucket with cross-region replication enabled.
 
I would like to thank my colleague Arun Tunuri for his contributions in designing the CloudFormation template.

 


About the author

Rajakumar Sampathkumar is a Senior Technical Account Manager for Amazon Web Services. In his spare time, he is a passionate author and likes to spend quality time with his kids and nature.

 
 

[$] The great leap backward

Post Syndicated from corbet original https://lwn.net/Articles/720924/rss

Sayre’s law
states: “In any dispute the intensity of feeling is inversely
proportional to the value of the issues at stake
“. In that context,
it is perhaps easy to understand why the discussion around the version
number for the next major openSUSE Leap release has gone on for hundreds of
sometimes vitriolic messages. While this change is controversial, the
openSUSE board hopes that it
will lead to more rational versioning in the long term — but the world has a
way of interfering with such plans.

The Economics of Hybrid Cloud Storage

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hybrid-cloud-storage-economics/

“Hybrid Cloud” has jumped into the IT vernacular over the last few years. Hybrid Cloud solutions intelligently divide processing and data storage between on-premise and off-premise resources to maximize efficiency. Businesses seamlessly extend their data processing and data storage capabilities to the cloud enabling them to manage unusual or fluctuating demands for services. More recently businesses are utilizing cloud computing and storage resources for their day-to-day operations instead of building out their own infrastructure.

Companies in the media and entertainment industry are candidates for considering the hybrid cloud as on any given day these organizations ingest and process large amounts of data in the form of video and audio files. Effectively processing and storing this data is paramount in managing the cost of a project and keeping it on schedule. Below we’ll examine the data storage aspects of the hybrid cloud when considering such a solution for a media and entertainment organization.

The Classic Storage Environment

In the media and entertainment industry, much of the video and audio collected is either never used or reviewed once and then archived. A rough estimate is that 10% of all the audio and video collected is used in the various drafts produced. That means that 90% of the data is archived: stored on the local storage systems or perhaps saved off to tape. This archived data can not be deleted until the project owner agrees, an event that can take months and sometimes years.

Using local storage to keep this archived data means you have to “overbuy” your on-premise storage to accommodate the maximum amount of data you might ever need to hold. While this allows the data to be easily accessed and restored, you have to purchase or lease substantially more storage than you really need.

As a consequence, many organizations decided to use tape storage for their archived data to reduce the need for on-premise data storage. They soon discovered the hidden costs of tape systems: ongoing tape maintenance, supply costs, and continuing personnel expenses. In addition, to recover an archived video or audio file from tape was often slow, cumbersome, and fraught with error.

Hybrid Cloud Storage

Cantemo’s Media Asset Management Portal can identify and automatically route video and audio data to a storage destination – on-premise, cloud, tape, etc. – as needed. Let’s consider a model where 20% of the data ingested is needed for the duration of a given project. The remaining 80% is evaluated and then determined that it can be archived, although we might need to access a video or audio clip at a later time. What is the best destination for the Cantemo Portal to route video and audio that optimizes both cost and access? Let’s review each of our choices: on-premise disk, tape, and cloud storage.

Data Destinations

To compare the three solutions, we’ve considered the cost of each system over a five year period for: initial purchase cost, ongoing costs and supplies, maintenance costs, personnel cost for operations, and subscription costs.

  • On-Premise Disk Storage – On-premise storage can range from a 1 petabyte NAS (Network Attached Storage) system to a multi-petabyte SAN (Storage Area Network). The cost ranges from $12/terabyte/month to $20/terabyte/month (or more). These figures assume new equipment at “street” prices where available. These systems are used for instant access to the data over a high-speed network connection. The data, or a proxy, can be altered multiple times and versioning is required.
  • Tape Storage – Typically these are LTO (Linear Tape-Open) systems with a minimum of two local tape systems, operational costs, etc. The data is stored, typically in batch mode, and accessed infrequently. The tapes can be stored on-site or off-site. Off-site storage costs more. The cost for LTO tape ranges from $7/terabyte/month to $10/terabyte/month, with much of that being the ongoing operational costs. The design includes one incremental tape per day, 2-week retention, first week on-site, second week off-site, with weekly pickup/drop-off. Also included are weekly, monthly, and yearly full backups, rotated on/off site as needed for tape testing, data recovery, etc.
  • Cloud Storage – The cost of cloud storage has come down over the last few years and currently ranges from $5/terabyte/month to $25/terabyte/month for storage depending on the vendor. Video and audio stored in cloud storage is typically easy to locate and readily available for recovery if needed. In most cases, there are minimal operational costs as, for example, the Cantemo Portal software is designed to locate and recover files that are required, but not present on the on-premise storage system.

Of course, a given organization will have their own costs, but in general they should fall within the ranges noted above.

Comparing Storage Costs

In comparing costs of the different methods noted above, we’ll present three scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table is the same format, all we’ve done is change how the data is distributed: on-premise, tape, or cloud. The math can be adapted for any set of numbers you wish to use.

SCENARIO 1 – 100% of data is in on-premise storage

Scenario 1 Data Stored Data Stored Data Stored
Data stored On-Premise: 100% 100 TB 1,000 TB 2,000 TB
On-premise cost range Monthly Cost Monthly Cost Monthly Cost
Low – $12/TB/Month $1,200 $12,000 $24,000
High – $20/TB/Month $2,000 $20,000 $40,000

SCENARIO 2 – 20% of data is in on-premise storage and 80% of data is on LTO Tape

Scenario 2 Data Stored Data Stored Data Stored
Data stored On-Premise: 20% 20 TB 200 TB 400 TB
Data stored Tape: 80% 80 TB 800 TB 1,600 TB
On-premise cost range Monthly Cost Monthly Cost Monthly Cost
Low – $12/TB/Month $240 $2,400 $4,800
High – $20/TB/Month $400 $4,000 $8,000
LTO Tape cost range Monthly Cost Monthly Cost Monthly Cost
Low – $7/TB/Month $560 $5,600 $11,200
High – $10/TB/Month $800 $8,000 $16,000
TOTAL Cost of Scenario 2 Monthly Cost Monthly Cost Monthly Cost
Low $800 $8,000 $16,000
High $1,200 $12,000 $24,000

Using tape to store 80% of the data can reduce the cost 33% over just using on-premise data storage.

SCENARIO 3 – 20% of data is in on-premise storage and 80% of data is in cloud storage

Scenario 3 Data Stored Data Stored Data Stored
Data stored On-Premise: 20% 20 TB 200 TB 400 TB
Data stored in Cloud: 80% 80 TB 800 TB 1,600 TB
On-premise cost range Monthly Cost Monthly Cost Monthly Cost
Low – $12/TB/Month $240 $2,400 $4,800
High – $20/TB/Month $400 $4,000 $8,000
LTO Tape cost range Monthly Cost Monthly Cost Monthly Cost
Low – $5/TB/Month $400 $4,000 $8,000
High – $25/TB/Month $2,000 $20,000 $40,000
TOTAL Cost of Scenario 3 Monthly Cost Monthly Cost Monthly Cost
Low $640 $6,400 $12,800
High $2,400 $24,000 $48,000

Storing 80% of the data in the cloud can lead a 46% savings on the low end, but could actually be more expensive depending on the vendor selected.

Separate the Costs

Often, cloud storage costs are combined with cloud computing costs in the Hybrid Cloud model, thus hiding the true cost of the cloud storage, perhaps, until it’s too late. The savings gained by using cloud computing services a few times a day may be completely offset by the high cost of cloud storage, which you would be using the entire time. Here are some recommendations.

  1. Ask to have your Hybrid Cloud costs broken out into computing and storage costs, it should be clear what you are paying for each service.
  2. Consider moving the cloud data storage cost to a low cost provider such as Backblaze B2 Cloud Storage, which charges only $5/terabyte/month for cloud storage. This is particularly useful for archived data that still needs to be accessible as Backblaze cloud storage is readily available.
  3. If compute, data distribution, and data archiving services are required, the Cantemo Portal allows you to designate different cloud storage vendors depending on the usage. For example, data requiring computing services can be stored with Amazon S3 and data designated for archiving can be stored in Backblaze. This allows you optimize access, while minimizing costs.

Considering Hybrid Data Storage

Today, most companies in the Media and Entertainment industry have large amounts of data. The hybrid cloud has the potential to change how the industry does business by moving to cloud-based platforms that allow for global collaboration around the clock. In these scenarios, the amount of data created and stored will be staggering, even by today’s standards. As a consequence, it will be paramount for you to know the most cost efficient way to store and access your data.

The latest version of Cantemo Portal includes native integration to Backblaze B2 Cloud Storage, making it easy to create custom rules for archiving to the cloud and access archived files when needed.

(Author’s note: I used on-premise throughout this document as it is the common vernacular used in the tech industry. Apologies to those grammatically offended.)

The post The Economics of Hybrid Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Scripting Languages for AWS Lambda: Running PHP, Ruby, and Go

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/scripting-languages-for-aws-lambda-running-php-ruby-and-go/

Dimitrij Zub
Dimitrij Zub, Solutions Architect
Raphael Sack
Raphael Sack, Technical Trainer

In our daily work with partners and customers, we see a lot of different amazing skills, expertise and experience in many fields, and programming languages. From languages that have been around for a while to languages on the cutting edge, many teams have developed a deep understanding of concepts of each language; they want to apply these languages with and within the innovations coming from AWS, such as AWS Lambda.

Lambda provides native support for a wide array of languages, such as Scala, Java, Node.js, Python, and C/C++ Linux native applications. In this post, we outline how you can use Lambda with different scripting languages.

For each language, you perform the following tasks:

  • Prepare: Launch an instance from an AMI and log in via SSH
  • Compile and package the language for Lambda
  • Install: Create the Lambda package and test the code

The preparation and installation steps are similar between languages, but we provide step-by-step guides and examples for compiling and packaging PHP, Go, and Ruby.

Common steps to prepare

You can use the capabilities of Lambda to run arbitrary executables to prepare the binaries to be executed within the Lambda environment.

The following steps are only an overview on how to get PHP, Go, or Ruby up and running on Lambda; however, using this approach, you can add more specific libraries, extend the compilation scope, and leverage JSON to interconnect your Lambda function to Amazon API Gateway and other services.

After your binaries have been compiled and your basic folder structure is set up, you won’t need to redo those steps for new projects or variations of your code. Simply write the code to accept inputs from STDIN and return to STDOUT and the written Node.js wrapper takes care of bridging the runtimes for you.

For the sake of simplicity, we demonstrate the preparation steps for PHP only, but these steps are also applicable for the other environments described later.

In the Amazon EC2 console, choose Launch instance. When you choose an AMI, use one of the AMIs in the Lambda Execution Environment and Available Libraries list, for the same region in which you will run the PHP code and launch an EC2 instance to have a compiler. For more information, see Step 1: Launch an Instance.

Pick t2.large as the EC2 instance type to have two cores and 8 GB of memory for faster PHP compilation times.

languages_1png

Choose Review and Launch to use the defaults for storage and add the instance to a default, SSH only, security group generated by the wizard.

Choose Launch to continue; in the launch dialog, you can select an existing key-pair value for your login or create a new one. In this case, create a new key pair called “php” and download it.

languages_2png

After downloading the keys, navigate to the download folder and run the following command:

chmod 400 php.pem

This is required because of SSH security standards. You can now connect to the instance using the EC2 public DNS. Get the value by selecting the instance in the console and looking it up under Public DNS in the lower right part of the screen.

ssh -i php.pem [email protected][PUBLIC DNS]

You’re done! With this instance up and running, you have the right AMI in the right region to be able to continue with all the other steps.

Getting ready for PHP

After you have logged in to your running AMI, you can start compiling and packaging your environment for Lambda. With PHP, you compile the PHP 7 environment from the source and make it ready to be packaged for the Lambda environment.

Setting up PHP on the instance

The next step is to prepare the instance to compile PHP 7, configure the PHP 7 compiler to output in a defined directory, and finally compile PHP 7 to the Lambda AMI.

Update the package manager by running the following command:

sudo yum update –y

Install the minimum necessary libraries to be able to compile PHP 7:

sudo yum install gcc gcc-c++ libxml2-devel -y 

With the dependencies installed, you need to download the PHP 7 sources available from

PHP Downloads.

For this post, we were running the EC2 instance in Ireland, so we selected http://ie1.php.net/get/php-7.0.7.tar.bz2/from/this/mirror as our mirror. Run the following command to download the sources to the instance and choose your own mirror for the appropriate region.

cd ~

wget http://ie1.php.net/distributions/php-7.0.7.tar.bz2 .

Extract the files using the following command:

tar -jxvf php-7.0.7.tar.bz2

This creates the php-7.0.7 folder in your home directory. Next, create a dedicated folder for the php-7 binaries by running the following commands.

mkdir /home/ec2-user/php-7-bin

./configure --prefix=/home/ec2-user/php-7-bin/

This makes sure the PHP compilation is nicely packaged into the php binaries folder you created in your home directory. Keep in mind, that you only compile the baseline PHP here to reduce the amount of dependencies required for your Lambda function.

You can add more dependencies and more compiler options to your PHP binaries using the options available in ./configure. Run ./configure –h for more information about what can be packaged into your PHP distribution to be used with Lambda, but also keep in mind that this will increase the overall binaries package.

Finally, run the following command to start the compilation:

make install

languages_3png

https://xkcd.com/303/

After the compilation is complete, you can quickly confirm that PHP is functional by running the following command:

cd ~/php-7-bin/bin/

./php –v

PHP 7.0.7 (cli) (built: Jun 16 2016 09:14:04) ( NTS )

Copyright (c) 1997-2016 The PHP Group

Zend Engine v3.0.0, Copyright (c) 1998-2016 Zend Technologies

Time to code

Using your favorite editor, you can create an entry point PHP file, which in this case reads input from a Linux pipe and provide its output to stdout. Take a simple JSON document and count the amounts of top-level attributes for this matter. Name the file HelloLambda.php.

<?php

$data = stream_get_contents(STDIN);

$json = json_decode($data, true);

$result = json_encode(array('result' => count($json)));

echo $result."n";

?>

Creating the Lambda package

With PHP compiled and ready to go, all you need to do now is to create your Lambda package with the Node.js wrapper as an entry point.

First, tar the php-7-bin folder where the binaries reside using the following command:

cd ~

tar -zcvf php-7-bin.tar.gz php-7-bin/

Download it to your local project folder where you can continue development, by logging out and running the following command from your local machine (Linux or OSX), or using tools like WinSCP on Windows:

scp -i php.pem [email protected][EC2_HOST]:~/php-7-bin.tar.gz .

With the package download, you create your Lambda project in a new folder, which you can call php-lambda for this specific example. Unpack all files into this folder, which should result in the following structure:

php-lambda 

+-- php-7-bin

The next step is to create a Node.js wrapper file. The file takes the inputs of the Lambda invocations, invoke the PHP binary with helloLambda.php as a parameter, and provide the inputs via Linux pipe to PHP for processing. Call the file php.js and copy the following content:

process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'];

const spawn = require('child_process').spawn;

exports.handler = function(event, context) {

    //var php = spawn('php',['helloLambda.php']); //local debug only
    var php = spawn('php-7-bin/bin/php',['helloLambda.php']);
    var output = "";

    //send the input event json as string via STDIN to php process
    php.stdin.write(JSON.stringify(event));

    //close the php stream to unblock php process
    php.stdin.end();

    //dynamically collect php output
    php.stdout.on('data', function(data) {
          output+=data;
    });

    //react to potential errors
    php.stderr.on('data', function(data) {
            console.log("STDERR: "+data);
    });

    //finalize when php process is done.
    php.on('close', function(code) {
            context.succeed(JSON.parse(output));
    });
}

//local debug only
//exports.handler(JSON.parse("{"hello":"world"}"));

With all the files finalized, the folder structure should look like the following:

php-lambda

+– php-7-bin

— helloLambda.php

— php.js

The final step before the deployment is to zip the package into an archive which can be uploaded to Lambda. Call the package LambdaPHP.zip. Feel free to remove unnecessary files, such as phpdebug, from the php-7-bin/bin folder to reduce the size of the archive.

Go, Lambda, go!

The following steps are an overview of how to compile and execute Go applications on Lambda. As with the PHP section, you are free to enhance and build upon the Lambda function with other AWS services and your application infrastructure. Though this example allows you to use your own Linux machine with a fitting distribution to work locally, it might still be useful to understand the Lambda AMIs for test and automation.

To further enhance your environment, you may want to create an automatic compilation pipeline and even deployment of the Go application to Lambda. Consider using versioning and aliases, as they help in managing new versions and dev/test/production code.

Setting up Go on the instance

The next step is to set up the Go binaries on the instance, so that you can compile the upcoming application.

First, make sure your packages are up to date (always):

sudo yum update -y

Next, visit the official Go site, check for the latest version, and download it to EC2 or to your local machine if using Linux:

cd ~

wget https://storage.googleapis.com/golang/go1.6.2.linux-amd64.tar.gz .

Extract the files using the following command:

tar -xvf go1.6.2.linux-amd64.tar.

This creates a folder named “go” in your home directory.

Time to code

For this example, you create a very simple application that counts the amount of objects in the provided JSON element. Using your favorite editor, create a file named “HelloLambda.go” with the following code directly on the machine to which you have downloaded the Go package, which may be the EC2 instance you started in the beginning or your local environment, in which case you are not stuck with vi.

package main

import (
    "fmt"
    "os"
    "encoding/json"
)

func main() {
    var dat map[string]interface{}
    
    fmt.Printf( "Welcome to Lambda Go, now Go Go Go!n" )
    if len( os.Args ) < 2 {
        fmt.Println( "Missing args" )
        return
    }

    err := json.Unmarshal([]byte(os.Args[1]), &dat)

    if err == nil {
        fmt.Println( len( dat ) )
    } else {
        fmt.Println(err)
    }
}

Before compiling, configure an environment variable to tell the Go compiler where all the files are located:

export GOROOT=~/go/

You are now set to compile a nifty new application!

~/go/bin/go build ./HelloLambda.go

Start your application for the very first time:

./HelloLambda '{ "we" : "love", "using" : "Lambda" }'

You should see output similar to:

Welcome to Lambda Go, now Go Go Go!

2

Creating the Lambda package

You have already set up your machine to compile Go applications, written the code, and compiled it successfully; all that is left is to package it up and deploy it to Lambda.

If you used an EC2 instance, copy the binary from the compilation instance and prepare it for packaging. To copy out the binary, use the following command from your local machine (Linux or OSX), or using tools such as WinSCP on Windows.

scp -i GoLambdaGo.pem [email protected]:~/goLambdaGo .

With the binary ready, create the Lambda project in a new folder, which you can call go-lambda.

The next step is to create a Node.js wrapper file to invoke the Go application; call it go.js. The file takes the inputs of the Lambda invocations and invokes the Go binary.

Here’s the content for another example of a Node.js wrapper:

const exec = require('child_process').exec;
exports.handler = function(event, context) {
    const child = exec('./goLambdaGo ' + ''' + JSON.stringify(event) + ''', (error) => {
        // Resolve with result of process
        context.done(error, 'Process complete!');
    });

    // Log process stdout and stderr
    child.stdout.on('data', console.log);
    child.stderr.on('data', console.error);

}

With all the files finalized and ready, your folder structure should look like the following:

go-lambda

— go.js

— goLambdaGo

The final step before deployment is to zip the package into an archive that can be uploaded to Lambda; call the package LambdaGo.zip.

On a Linux or OSX machine, run the following command:

zip -r go.zip ./goLambdaGo ./go.js

A gem in Lambda

For convenience, you can use the same previously used instance, but this time to compile Ruby for use with Lambda. You can also create a new instance using the same instructions.

Setting up Ruby on the instance

The next step is to set up the Ruby binaries and dependencies on the EC2 instance or local Linux environment, so that you can package the upcoming application.

First, make sure your packages are up to date (always):

sudo yum update -y

For this post, you use Traveling Ruby, a project that helps in creating “portable”, self-contained Ruby packages. You can download the latest version from Traveling Ruby linux-x86:

cd ~

wget http://d6r77u77i8pq3.cloudfront.net/releases/traveling-ruby-20150715-2.2.2-linux-x86_64.tar.gz .

Extract the files to a new folder using the following command:

mkdir LambdaRuby

tar -xvf traveling-ruby-20150715-2.2.2-linux-x86_64.tar.gz -C LambdaRuby

This creates the “LambdaRuby” folder in your home directory.

Time to code

For this demonstration, you create a very simple application that counts the amount of objects in a provided JSON element. Using your favorite editor, create a file named “lambdaRuby.rb” with the following code:

#!./bin/ruby

require 'json'

# You can use this to check your Ruby version from within puts(RUBY_VERSION)

if ARGV.length > 0
    puts JSON.parse( ARGV[0] ).length
else
    puts "0"
end

Now, start your application for the very first time, using the following command:

./lambdaRuby.rb '{ "we" : "love", "using" : "Lambda" }'

You should see the amount of fields in the JSON as output (2).

Creating the Lambda package

You have downloaded the Ruby gem, written the code, and tested it successfully… all that is left is to package it up and deploy it to Lambda. Because Ruby is an interpreter-based language, you create a Node.js wrapper and package it with the Ruby script and all the Ruby files.

The next step is to create a Node.js wrapper file to invoke your Ruby application; call it ruby.js. The file takes the inputs of the Lambda invocations and invoke your Ruby application. Here’s the content for a sample Node.js wrapper:

const exec = require('child_process').exec;

exports.handler = function(event, context) {
    const child = exec('./lambdaRuby.rb ' + ''' + JSON.stringify(event) + ''', (result) => {
        // Resolve with result of process
        context.done(result);
    });

    // Log process stdout and stderr
    child.stdout.on('data', console.log);
    child.stderr.on('data', console.error);
}

With all the files finalized and ready, your folder structure should look like this:

LambdaRuby

+– bin

+– bin.real

+– info

— lambdaRuby.rb

+– lib

— ruby.js

The final step before the deployment is to zip the package into an archive to be uploaded to Lambda. Call the package LambdaRuby.zip.

On a Linux or OSX machine, run the following command:

zip -r ruby.zip ./

Copy your zip file from the instance so you can upload it. To copy out the archive, use the following command from your local machine (Linux or OSX), or using tools such as WinSCP on Windows.

scp -i RubyLambda.pem [email protected]:~/LambdaRuby/LambdaRuby.zip .

Common steps to install

With the package done, you are ready to deploy the PHP, Go, or Ruby runtime into Lambda.

Log in to the AWS Management Console and navigate to Lambda; make sure that the region matches the one which you selected the AMI for in the preparation step.

For simplicity, I’ve used PHP as an example for the deployment; however, the steps below are the same for Go and Ruby.

Creating the Lambda function

Choose Create a Lambda function, Skip. Select the following fields and upload your previously created archive.

languages_4png

The most important areas are:

  • Name: The name to give your Lambda function
  • Runtime: Node.js
  • Lambda function code: Select the zip file created in the PHP, Go, or Ruby section, such as php.zip, go.zip, or ruby.zip
  • Handler: php.handler (as in the code, the entry function is called handler and the file is php.js. If you have used the file names from the Go and Ruby sections use the following format: [js file name without .js].handler, i.e., go.handler)
  • Role: Choose Basic Role if you have not yet created one, and create a role for your Lambda function execution

Choose Next, Create function to continue to testing.

Testing the Lambda function

To test the Lambda function, choose Test in the upper right corner, which displays a sample event with three top-level attributes.

languages_5png

Feel free to add more, or simply choose Save and test to see that your function has executed properly.

languages_6png

Conclusion

In this post, we outlined three different ways to create scripting language runtimes for Lambda, from compiling against the Lambda runtime for PHP and being able to run scripts, compiling the actuals as in Go, or using packaged binaries as much as possible with Ruby. We hope you enjoyed the ideas, found the hidden gems, and are now ready to go to create some pretty hefty projects in your favorite language, enjoying serverless, Amazon Kinesis, and API Gateway along the way.

If you have questions or suggestions, please comment below.

Deploy an App to an AWS OpsWorks Layer Using AWS CodePipeline

Post Syndicated from Daniel Huesch original https://aws.amazon.com/blogs/devops/deploy-an-app-to-an-aws-opsworks-layer-using-aws-codepipeline/

Deploy an App to an AWS OpsWorks Layer Using AWS CodePipeline

AWS CodePipeline lets you create continuous delivery pipelines that automatically track code changes from sources such as AWS CodeCommit, Amazon S3, or GitHub. Now, you can use AWS CodePipeline as a code change-management solution for apps, Chef cookbooks, and recipes that you want to deploy with AWS OpsWorks.

This blog post demonstrates how you can create an automated pipeline for a simple Node.js app by using AWS CodePipeline and AWS OpsWorks. After you configure your pipeline, every time you update your Node.js app, AWS CodePipeline passes the updated version to AWS OpsWorks. AWS OpsWorks then deploys the updated app to your fleet of instances, leaving you to focus on improving your application. AWS makes sure that the latest version of your app is deployed.

Step 1: Upload app code to an Amazon S3 bucket

The Amazon S3 bucket must be in the same region in which you later create your pipeline in AWS CodePipeline. For now, AWS CodePipeline supports the AWS OpsWorks provider in the us-east-1 region only; all resources in this blog post should be created in the US East (N. Virginia) region. The bucket must also be versioned, because AWS CodePipeline requires a versioned source. For more information, see Using Versioning.

Upload your app to an Amazon S3 bucket

  1. Download a ZIP file of the AWS OpsWorks sample, Node.js app, and save it to a convenient location on your local computer: https://s3.amazonaws.com/opsworks-codepipeline-demo/opsworks-nodejs-demo-app.zip.
  2. Open the Amazon S3 console at https://console.aws.amazon.com/s3/. Choose Create Bucket. Be sure to enable versioning.
  3. Choose the bucket that you created and upload the ZIP file that you saved in step 1.code-pipeline-01
  4. In the Properties pane for the uploaded ZIP file, make a note of the S3 link to the file. You will need the bucket name and the ZIP file name portion of this link to create your pipeline.

Step 2: Create an AWS OpsWorks to Amazon EC2 service role

1.     Go to the Identity and Access Management (IAM) service console, and choose Roles.
2.     Choose Create Role, and name it aws-opsworks-ec2-role-with-s3.
3.     In the AWS Service Roles section, choose Amazon EC2, and then choose the policy called AmazonS3ReadOnlyAccess.
4.     The new role should appear in the Roles dashboard.

code-pipeline-02

Step 3: Create an AWS OpsWorks Chef 12 Linux stack

To use AWS OpsWorks as a provider for a pipeline, you must first have an AWS OpsWorks stack, a layer, and at least one instance in the layer. As a reminder, the Amazon S3 bucket to which you uploaded your app must be in the same region in which you later create your AWS OpsWorks stack and pipeline, US East (N. Virginia).

1.     In the OpsWorks console, choose Add Stack, and then choose a Chef 12 stack.
2.     Set the stack’s name to CodePipeline Demo and make sure the Default operating system is set to Linux.
3.     Enable Use custom Chef cookbooks.
4.     For Repository type, choose HTTP Archive, and then use the following cookbook repository on S3: https://s3.amazonaws.com/opsworks-codepipeline-demo/opsworks-nodejs-demo-cookbook.zip. This repository contains a set of Chef cookbooks that include Chef recipes you’ll use to install the Node.js package and its dependencies on your instance. You will use these Chef recipes to deploy the Node.js app that you prepared in step 1.1.

Step 4: Create and configure an AWS OpsWorks layer

Now that you’ve created an AWS OpsWorks stack called CodePipeline Demo, you can create an OpsWorks layer.

1.     Choose Layers, and then choose Add Layer in the AWS OpsWorks stack view.
2.     Name the layer Node.js App Server. For Short Name, type app1, and then choose Add Layer.
3.     After you create the layer, open the layer’s Recipes tab. In the Deploy lifecycle event, type nodejs_demo. Later, you will link this to a Chef recipe that is part of the Chef cookbook you referenced when you created the stack in step 3.4. This Chef recipe runs every time a new version of your application is deployed.

code-pipeline-03

4.     Now, open the Security tab, choose Edit, and choose AWS-OpsWorks-WebApp from the Security groups drop-down list. You will also need to set the EC2 Instance Profile to use the service role you created in step 2.2 (aws-opsworks-ec2-role-with-s3).

code-pipeline-04

Step 5: Add your App to AWS OpsWorks

Now that your layer is configured, add the Node.js demo app to your AWS OpsWorks stack. When you create the pipeline, you’ll be required to reference this demo Node.js app.

  1. Have the Amazon S3 bucket link from the step 1.4 ready. You will need the link to the bucket in which you stored your test app.
  2. In AWS OpsWorks, open the stack you created (CodePipeline Demo), and in the navigation pane, choose Apps.
  3. Choose Add App.
  4. Provide a name for your demo app (for example, Node.js Demo App), and set the Repository type to an S3 Archive. Paste your S3 bucket link (s3://bucket-name/file name) from step 1.4.
  5. Now that your app appears in the list on the Apps page, add an instance to your OpsWorks layer.

Step 6: Add an instance to your AWS OpsWorks layer

Before you create a pipeline in AWS CodePipeline, set up at least one instance within the layer you defined in step 4.

  1. Open the stack that you created (CodePipeline Demo), and in the navigation pane, choose Instances.
  2. Choose +Instance, and accept the default settings, including the hostname, size, and subnet. Choose Add Instance.

code-pipeline-05

  1. By default, the instance is in a stopped state. Choose start to start the instance.

Step 7: Create a pipeline in AWS CodePipeline

Now that you have a stack and an app configured in AWS OpsWorks, create a pipeline with AWS OpsWorks as the provider to deploy your app to your specified layer. If you update your app or your Chef deployment recipes, the pipeline runs again automatically, triggering the deployment recipe to run and deploy your updated app.

This procedure creates a simple pipeline that includes only one Source and one Deploy stage. However, you can create more complex pipelines that use AWS OpsWorks as a provider.

To create a pipeline

  1. Open the AWS CodePipeline console in the U.S. East (N. Virginia) region.
  2. Choose Create pipeline.
  3. On the Getting started with AWS CodePipeline page, type MyOpsWorksPipeline, or a pipeline name of your choice, and then choose Next step.
  4. On the Source Location page, choose Amazon S3 from the Source provider drop-down list.
  5. In the Amazon S3 details area, type the Amazon S3 bucket path to your application, in the format s3://bucket-name/file name. Refer to the link you noted in step 1.4. Choose Next step.
    code-pipeline-06
  6. On the Build page, choose No Build from the drop-down list, and then choose Next step.
  7. On the Deploy page, choose AWS OpsWorks as the deployment provider.code-pipeline-07-2
  8. Specify the names of the stack, layer, and app that you created earlier, then choose Next step.
  9. On the AWS Service Role page, choose Create Role. On the IAM console page that opens, you will see the role that will be created for you (AWS-CodePipeline-Service). From the Policy Name drop-down list, choose Create new policy. Be sure the policy document has the following content, and then choose Allow.
    For more information about the service role and its policy statement, see Attach or Edit a Policy for an IAM Service Role.code-pipeline-08-2
  10. On the Review your pipeline page, confirm the choices shown on the page, and then choose Create pipeline.

The pipeline should now start deploying your app to your OpsWorks layer on its own.  Wait for deployment to finish; you’ll know it’s finished when Succeeded is displayed in both the Source and Deploy stages.

code-pipeline-09-2

Step 8: Verifying the app deployment

To verify that AWS CodePipeline deployed the Node.js app to your layer, sign in to the instance you created in step 4. You should be able to see and use the Node.js web app.

  1. On the AWS OpsWorks dashboard, choose the stack and the layer to which you just deployed your app.
  2. In the navigation pane, choose Instances, and then choose the public IP address of your instance to view the web app. The running app will be displayed in a new browser tab.code-pipeline-10-2
  3. To test the app, on the app’s web page, in the Leave a comment text box, type a comment, and then choose Send. The app adds your comment to the web page. You can add more comments to the page, if you like.

code-pipeline-11-2

Wrap-up

You now have a working and fully automated pipeline. As soon as you make changes to your application’s code and update the S3 bucket with the new version of your app, AWS CodePipeline automatically collects the artifact and uses AWS OpsWorks to deploy it to your instance, by running the OpsWorks deployment Chef recipe that you defined on your layer. The deployment recipe starts all of the operations on your instance that are required to support a new version of your artifact.

To learn more about Chef cookbooks and recipes: https://docs.chef.io/cookbooks.html

To learn more about the AWS OpsWorks and AWS CodePipeline integration: https://docs.aws.amazon.com/opsworks/latest/userguide/other-services-cp.html

Integrating Git with AWS CodePipeline

Post Syndicated from Jay McConnell original https://aws.amazon.com/blogs/devops/integrating-git-with-aws-codepipeline/


AWS CodePipeline
is a continuous delivery service you can use to model, visualize, and automate the steps required to release your software. The service currently supports GitHub, AWS CodeCommit, and Amazon S3 as source providers. This blog post will cover how to integrate AWS CodePipeline with GitHub Enterprise, Bitbucket, GitLab, or any other Git server that supports the webhooks functionality available in most Git software.

Architecture overview

Webhooks notify a remote service by issuing an HTTP POST when a commit is pushed to the repository. AWS Lambda receives the HTTP POST through Amazon API Gateway, and then downloads a copy of the repository. It places a zipped copy of the repository into a versioned S3 bucket. AWS CodePipeline can then use the zip file in S3 as a source; the pipeline will be triggered whenever the Git repository is updated.

Architectural diagram

Architectural overview

There are two methods you can use to get the contents of a repository. Each method exposes Lambda functions that have different security and scalability properties.

  • Zip download uses the Git provider’s HTTP API to download an already-zipped copy of the current state of the repository.
    • No need for external libraries.
    • Smaller Lambda function code.
    • Large repo size limit (500 MB).
  • Git pull uses SSH to pull from the repository. The repository contents are then zipped and uploaded to S3.
    • Efficient for repositories with a high volume of commits, because each time the API is triggered, it downloads only the changed files.
    • Suitable for any Git server that supports hooks and SSH; does not depend on personal access tokens or OAuth2.
    • More extensible because it uses a standard Git library.

Build the required AWS resources

For your convenience, there is an AWS CloudFormation template that includes the AWS infrastructure and configuration required to build out this integration. To launch the CloudFormation stack setup wizard, click the link for your desired region. (The following AWS regions support all of the services required for this integration.)

For a list of services available in AWS regions, see the AWS Region Table.

The stack setup wizard will prompt you to enter several parameters. Many of these values must be obtained from your Git service.

OutputBucketName: The name of the bucket where your zipped code will be uploaded. CloudFormation will create a bucket with this name. For this reason, you cannot use the name of an existing S3 bucket.

Note: By default, there is no lifecycle policy on this bucket, so previous versions of your code will be retained indefinitely.  If you want to control the retention period of previous versions, see Lifecycle Configuration for a Bucket with Versioning in the Amazon S3 User Guide.

AllowedIps: Used only with the git pull method described earlier. A comma-separated list of IP CIDR blocks used for Git provider source IP authentication. The Bitbucket Cloud IP ranges are provided as defaults.

ApiSecret: Used only with the git pull method described earlier. This parameter is used for webhook secrets in GitHub Enterprise and GitLab. If a secret is matched, IP range authentication is bypassed. The secret cannot contain commas (,), slashes (\), or quotation marks ().

GitToken: Used only with the zip download method described earlier. This is a personal access token generated by GitHub Enterprise or GitLab.

OauthKey/OuathSecret: Used only with the zip download method described earlier. This is an OAuth2 key and secret provided by Bitbucket.

At least one parameter for your chosen method and provider must be set.

The process for setting up webhook secrets and API tokens differs between vendors and product versions. Consult your Git provider’s documentation for details.

After you have entered values for these parameters, you can complete the steps in the wizard and start the stack creation. If your desired values change over time, you can use CloudFormation’s update stack functionality to modify your parameters.

After the CloudFormation stack creation is complete, make a note of the GitPullWebHookApi, ZipDownloadWebHookApi, OutputBucketName and PublicSSHKey. You will need these in the following steps.

Configure the source repository

Depending on the method (git pull or zip download) you would like to use, in your Git provider’s interface, set the destination URL of your webhook to either the GitPullWebHookApi or ZipDownloadWebHookApi. If you create a secret at this point, be sure to update the ApiSecret parameter in your CloudFormation stack.

If you are using the git pull method, the Git repo is downloaded over SSH. For this reason, the PublicSSHKey output must be imported into Git as a deployment key.

Test a commit

After you have set up webhooks on your repository, run the git push command to create a folder structure and zip file in the S3 bucket listed in your CloudFormation output as OutputBucketName. If the zip file is not created, you can check the following sources for troubleshooting help:

Set up AWS CodePipeline

The final step is to create a pipeline in AWS CodePipeline using the zip file as an S3 source. For information about creating a pipeline, see the Simple Pipeline Walkthrough in the AWS CodePipeline User Guide. After your pipeline is set up, commits to your repository will trigger an update to the zip file in S3, which, in turn, triggers a pipeline execution.

We hope this blog post will help you integrate your Git server. Feel free to leave suggestions or approaches on integration in the comments.

CodePipeline Update – Build Continuous Delivery Workflows for CloudFormation Stacks

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/codepipeline-update-build-continuous-delivery-workflows-for-cloudformation-stacks/

When I begin to write about a new way for you to become more productive by using two AWS services together, I think about a 1980’s TV commercial for Reese’s Peanut Butter Cups! The intersection of two useful services or two delicious flavors creates a new one that is even better.

Today’s chocolate / peanut butter intersection takes place where AWS CodePipeline meets AWS CloudFormation. You can now use CodePipeline to build a continuous delivery pipeline for CloudFormation stacks. When you practice continuous delivery, each code change is automatically built, tested, and prepared for release to production. In most cases, the continuous delivery release process includes a combination of manual and automatic approval steps. For example, code that successfully passes through a series of automated tests can be routed to a development or product manager for final review and approval before it is pushed to production.

This important combination of features allows you to use the infrastructure as code model while gaining all of the advantages of continuous delivery. Each time you change a CloudFormation template, CodePipeline can initiate a workflow that will build a test stack, test it, await manual approval, and then push the changes to production.The workflow can create and manipulate stacks in many different ways:

As you will soon see, the workflow can take advantage of advanced CloudFormation features such as the ability to generate and then apply change sets (read New – Change Sets for AWS CloudFormation to learn more) to an operational stack.

The Setup
In order to learn more about this feature, I used a CloudFormation template to set up my continuous delivery pipeline (this is yet another example of infrastructure as code). This template (available here and described in detail here) sets up a full-featured pipeline. When I use the template to create my pipeline, I specify the name of an S3 bucket and the name of a source file:

The SourceS3Key points to a ZIP file that is enabled for S3 versioning. This file contains the CloudFormation template (I am using the WordPress Single Instance example) that will be deployed via the pipeline that I am about to create. It can also contain other deployment artifacts such as configuration or parameter files; here’s what mine looks like:

The entire continuous delivery pipeline is ready just a few seconds after I click on Create Stack. Here’s what it looks like:

The Action
At this point I have used CloudFormation to set up my pipeline. With the stage set (so to speak), now I can show you how this pipeline makes use of the new CloudFormation actions.

Let’s focus on the second stage, TestStage. Triggered by the first stage, this stage uses CloudFormation to create a test stack:

The stack is created using parameter values from the test-stack-configuration.json file in my ZIP. Since you can use different configuration files for each CloudFormation action, you can use the same template for testing and production.

After the stack is up and running, the ApproveTestStack step is used to await manual approval (it says “Waiting for approval above.”). Playing the role of the dev manager, I verify that the test stack behaves and performs as expected, and then approve it:

After approval, the DeleteTestStack step deletes the test stack.

Now we are just about ready to deploy to production. ProdStage creates a CloudFormation change set and then submits it for manual approval. This stage uses the parameter values from the prod-stack-configuration.json file in my ZIP. I can use the parameters to launch a modestly sized test environment on a small EC2 instance and a large production environment from the same template.

Now I’m playing the role of the big boss, responsible for keeping the production site up and running. I review the change set in order to make sure that I understand what will happen when I deploy to production. This is the first time that I am running the pipeline, so the change set indicates that an EC2 instance and a security group will be created:

And then I approve it:

With the change set approved, it is applied to the existing production stack in the ExecuteChangeSet step. Applying the change to an  existing stack keeps existing resources in play where possible and avoids a wholesale restart of the application. This is generally more efficient and less disruptive than replacing the entire stack. It keeps in-memory caches warmed up and avoids possible bursts of cold-start activity.

Implementing a Change
Let’s say that I decide to support HTTPS. In order to do this, I need to add port 443 to my application’s security group. I simply edit the CloudFormation template, put it into a fresh ZIP, and upload it to S3. Here’s what I added to my template:

      - CidrIp: 0.0.0.0/0
        FromPort: '443'
        IpProtocol: tcp
        ToPort: '443'

Then I return to the Console and see that CodePipeline has already detected the change and set the pipeline in to motion:

The pipeline runs again, I approve the test stack, and then inspect the change set, confirming that it will simply modify an existing security group:

One quick note before I go. The CloudFormation template for the pipeline creates an IAM role and uses it to create the test and deployment stacks (this is a new feature; read about the AWS CloudFormation Service Role to learn more). For best results, you should delete the stacks before you delete the pipeline. Otherwise, you’ll need to re-create the role in order to delete the stacks.

There’s More
I’m just about out of space and time, but I’ll briefly summarize a couple of other aspects of this new capability.

Parameter Overrides – When you define a CloudFormation action, you may need to exercise additional control over the parameter values that are defined for the template. You can do this by opening up the Advanced pane and entering any desired parameter overrides:

Artifact References – In some situations you may find that you need to reference an attribute of an artifact that was produced by an earlier stage of the pipeline. For example, suppose that an early stage of your pipeline copies a Lambda function to an S3 bucket and calls the resulting artifact LambdaFunctionSource. Here’s how you would retrieve the bucket name and the object key from the attribute using a parameter override:

{
  "BucketName" : { "Fn::GetArtifactAtt" : ["LambdaFunctionSource", "BucketName"]},
  "ObjectKey" : { "Fn::GetArtifactAtt" : ["LambdaFunctionSource", "ObjectKey"]}
}

Access to JSON Parameter – You can use the new Fn::GetParam function to retrieve a value from a JSON-formatted file that is included in an artifact.

Note that Fn:GetArtifactAtt and Fn::GetParam are designed to be used within the parameter overrides.

S3 Bucket Versioning – The first step of my pipeline (the Source action) refers to an object in an S3 bucket. By enabling S3 versioning for the object, I simply upload a new version of my template after each change:

If I am using S3 as my source, I must use versioning (uploading a new object over the existing one is not supported). I can also use AWS CodeCommit or a GitHub repo as my source.

Create Pipeline Wizard
I started out this blog post by using a CloudFormation template to create my pipeline. I can also click on Create pipeline in the console and build my initial pipeline (with source, build, and beta deployment stages) using a wizard. The wizard now allows me to select CloudFormation as my deployment provider. I can create or update a stack or a change set in the beta deployment stage:

Available Now
This new feature is available now and you can start using it today. To learn more, check out the CodePipeline Documentation.


Jeff;

 

Building a Cross-Region/Cross-Account Code Deployment Solution on AWS

Post Syndicated from BK Chaurasiya original https://aws.amazon.com/blogs/devops/building-a-cross-regioncross-account-code-deployment-solution-on-aws/

Many of our customers have expressed a desire to build an end-to-end release automation workflow solution that can deploy changes across multiple regions or different AWS accounts.

In this post, I will show you how you can easily build an automated cross-region code deployment solution using AWS CodePipeline (a continuous delivery service), AWS CodeDeploy (an automated application deployment service), and AWS Lambda (a serverless compute service). In the Taking This Further section, I will also show you how to extend what you’ve learned so that you can create a cross-account deployment solution.

We will use AWS CodeDeploy and AWS CodePipeline to create a multi-pipeline solution running in two regions (Region A and Region B). Any update to the source code in Region A will trigger validation and deployment of source code changes in the pipeline in Region A. A successful processing of source code in all of its AWS CodePipeline stages will invoke a Lambda function as a custom action, which will copy the source code into an S3 bucket in Region B. After the source code is copied into this bucket, it will trigger a similar chain of processes into the different AWS CodePipeline stages in Region B. See the following diagram.

                                                   Diagram1

This architecture follows best practices for multi-region deployments, sequentially deploying code into one region at a time upon successful testing and validation. This architecture lets you place controls to stop the deployment if a problem is identified with release. This prevents a bad version from being propagated to your next environments.

This post is based on the Simple Pipeline Walkthrough in the AWS CodePipeline User Guide. I have provided an AWS CloudFormation template that automates the steps for you.

 

Prerequisites

You will need an AWS account with administrator permissions. If you don’t have an account, you can sign up for one here. You will also need sample application source code that you can download here.

We will use the CloudFormation template provided in this post to create the following resources:

  • Amazon S3 buckets to host the source code for the sample application. You can use a GitHub repository if you prefer, but you will need to change the CloudFormation template.
  • AWS CodeDeploy to deploy the sample application.
  • AWS CodePipeline with predefined stages for this setup.
  • AWS Lambda as a custom action in AWS CodePipeline. It invokes a function to copy the source code into another region or account. If you are deploying to multiple accounts, cross-account S3 bucket permissions are required.

Note: The resources created by the CloudFormation template may result in charges to your account. The cost will depend on how long you keep the CloudFormation stack and its resources.

Let’s Get Started

Choose your source and destination regions for a continuous delivery of your source code. In this post, we are deploying the source code to two regions: first to Region A (Oregon) and then to Region B (N. Virginia/US Standard). You can choose to extend the setup to three or more regions if your business needs require it.

Step 1: Create Amazon S3 buckets for hosting your application source code in your source and destination regions. Make sure versioning is enabled on these buckets. For more information, see these steps in the AWS CodePipeline User Guide.

For example:

xrdeployment-sourcecode-us-west-2-<AccountID>           (Source code bucket in Region A – Oregon)

xrdeployment-sourcecode-us-east-1-<AccountID>           (Source code bucket in Region B – N. Virginia/US Standard)

Note: The source code bucket in Region B is also the destination bucket in Region A. Versioning on the bucket ensures that AWS CodePipeline is executed automatically when source code is changed.

Configuration Setup in Source Region A

Be sure you are in the US West (Oregon) region. You can use the drop-down menu to switch regions.

 

Step 2: In the AWS CloudFormation console, choose launch-stack to launch the CloudFormation template. All of the steps in the Simple Pipeline Walkthrough are automated when you use this template.

This template creates a custom Lambda function and AWS CodePipeline and AWS CodeDeploy resources to deploy a sample application. You can customize any of these components according to your requirements later.

On the Specify Details page, do the following:

  1. In Stack name, type a name for the stack (for example, XRDepDemoStackA).
  2. In AppName, you can leave the default, or you can type a name of up to 40 characters. Use only lowercase letters, numbers, periods, and hyphens.
  3. In InstanceCount and InstanceType, leave the default values. You might want to change them when you extend this setup for your use case.
  4. In S3SourceCodeBucket, specify the name of the S3 bucket where source code is placed (xrdeployment-sourcecode-us-west-2-<AccountID>). See step 1.
  5. In S3SourceCodeObject, specify the name of the source code zip file. The sample source code, xrcodedeploy_linux.zip, is provided for you.
  6. Choose a destination region from the drop-down list. For the steps in this blog post, choose us-east-1.
  7. In DestinationBucket, type the name of the bucket in the destination region where the source code will be copied  (xrdeployment-sourcecode-us-east-1-<AccountID>). See step 1.
  8. In KeyPairName, choose the name of Amazon EC2 key pair. This enables remote login to your instances. You cannot sign in to your instance without generating a key pair and downloading a private key file. For information about generating a key pair, see these steps.
  9. In SSHLocation, type the IP address from which you will access the resources created in this stack. This is a recommended security best practice.
  10. In TagValue, type a value that identifies the deployment stage in the target deployment (for example, Alpha).
  11. Choose Next.

 

Diagram3

(Optional) On the Options page, in Key, type Name. In Value, type a name that will help you easily identify the resources created in this stack. This name will be used to tag all of the resources created by the template. These tags are helpful if you want to use or modify these resources later on. Choose Next.

On the Review page, select the I acknowledge that this template might cause AWS CloudFormation to create IAM resources check box. (It will.) Review the other settings, and then choose Create.

                                                Diagram4

It will take several minutes for CloudFormation to create the resources on your behalf. You can watch the progress messages on the Events tab in the console.

When the stack has been created, you will see a CREATE_COMPLETE message in the Status column on the Overview tab.

                  Diagram5

Configuration Setup in Destination Region B

Step 3: We now need to create AWS resources in Region B. Use the drop-down menu to switch to US East (N. Virginia).

In the AWS CloudFormation console, choose launch-stack to launch the CloudFormation template.

On the Specify Details page, do the following:

  1. In Stack name, type a name for the stack (for example, XRDepDemoStackB).
  2. In AppName, you can leave the default, or you can type a name of up to 40 characters. Use only lowercase letters, numbers, periods, and hyphens.
  3. In InstanceCount and InstanceType, leave the default values. You might want to change them when you extend this setup for your use case.
  4. In S3SourceCodeBucket, specify the name of the S3 bucket where the source code is placed (xrdeployment-sourcecode-us-east-1-<AccountID>).  This is same as the DestinationBucket in step 2.
  5. In S3SourceCodeObject, specify the name of the source code zip file. The sample source code (xrcodedeploy_linux.zip) is provided for you.
  6. From the DestinationRegion drop-down list, choose none.
  7. In DestinationBucket, type none. This is our final destination region for this setup.
  8. In the KeyPairName, choose the name of the EC2 key pair.
  9. In SSHLocation, type the IP address from which you will access the resources created in this stack.
  10. In TagValue, type a value that identifies the deployment stage in the target deployment (for example, Beta).

Repeat the steps in the Configuration Setup in Source Region A until the CloudFormation stack has been created. You will see a CREATE_COMPLETE message in the Status column of the console.

So What Just Happened?

We have created an EC2 instance in both regions. These instances are running a sample web application. We have also configured AWS CodeDeploy deployment groups and created a pipeline where source changes propagate to AWS CodeDeploy groups in both regions. AWS CodeDeploy deploys a web page to each of the Amazon EC2 instances in the deployment groups. See the diagram at the beginning of this post.

The pipelines in both regions will start automatically as they are created. You can view your pipelines in the AWS CodePipeline console. You’ll find a link to AWS CodePipeline on the Outputs section of your CloudFormation stack.

                                  Diagram7

Note: Your pipeline will fail during its first automatic execution because we haven’t placed source code into the S3SourceCodeBucket in the source region (Region A).

Step 4: Download the sample source code file, xrcodedeploy_linux.zip, from this link and place it in the source code S3 bucket for Region A. This will kick off AWS CodePipeline.

Step 5: Watch the progress of your pipeline in the source region (Region A) as it completes the actions configured for each of its stages and invokes a custom Lambda action that copies the source code into Region B. Then watch the progress of your pipeline in Region B (final destination region) after the pipeline succeeds in the source region (Region A). The pipeline in the destination region (Region B) should kick off automatically as soon as AWS CodePipeline in the source region (Region A) completes execution.

When each stage is complete, it turns from blue (in progress) to green (success).

                                                              Diagram8

Congratulations! You just created a cross-region deployment solution using AWS CodePipeline, AWS CodeDeploy, and AWS Lambda. You can place a new version of source code in your S3 bucket and watch it progress through AWS CodePipeline in all the regions.

Step 6: Verify your deployment. When Succeeded is displayed for the pipeline status in the final destination region, view the deployed application:

  1. In the status area for Betastage in the final destination region, choose Details. The details of the deployment will appear in the AWS CodeDeploy console. You can also pick any other stage in other regions.
  2. In the Deployment Details section, in Instance ID, choose the instance ID of any of the successfully deployed instance.
  3. In the Amazon EC2 console, on the Description tab, in Public DNS, copy the address, and then paste it into the address bar of your web browser. The web page opens the sample web application that was built for you

                             Diagram9

Taking This Further

  1. Using the CloudFormation template provided to you in this post, you can extend the setup to three regions.
  2. So far we have deployed code in two regions within one AWS account. There may be a case where your environments exist in different AWS accounts. For example, assume a scenario in which:
  • You have your development environment running in Region A in AWS Account A.
  • You have your QA environment running in Region B in AWS Account B.
  • You have a staging or production environment running in Region C in AWS Account C.

 

Diagram10

You will need to configure cross-account permissions on your destination S3 bucket and delegate these permissions to a role that Lambda assumed in the source account. Without these permissions, the Lambda function in AWS CodePipeline will not be able to copy the source code into the destination S3 bucket. (See the lambdaS3CopyRole in the CloudFormation template provided with this post.)

Create the following bucket policy on the destination bucket in Account B:

{

                  “Version”: “2012-10-17”,

                  “Statement”: [

                                    {

                                                      “Sid”: “DelegateS3Access”,

                                                      “Effect”: “Allow”,

                                                      “Principal”: {

                                                                        “AWS”: “arn:aws:iam::<Account A ID>:root”

                                                      },

                                                      “Action”: “s3:*”,

                                                      “Resource”: [

                                                                        “arn:aws:s3::: <destination bucket > /*”,

                                                                        “arn:aws:s3::: <destination bucket > “

                                                      ]

                                    }

                  ]

}

 

Repeat this step as you extend the setup to additional accounts.

Create your CloudFormation stacks in Account B and Account C (follow steps 2 and 3 in these accounts, respectively) and your pipeline will execute sequentially.

You can use another code repository solution like AWS CodeCommit or Github as your source and target repositories.

Wrapping Up

After you’ve finished exploring your pipeline and its associated resources, you can do the following:

  • Extend the setup. Add more stages to your pipeline in AWS CodePipeline.
  • Delete the stack in AWS CloudFormation, which deletes the pipeline, its resources, and the stack itself.

This is the option to choose if you no longer want to use the pipeline or any of its resources. Cleaning up resources you’re no longer using is important because you don’t want to continue to be charged.

To delete the CloudFormation stack:

  1. Delete the Amazon S3 buckets used as the artifact store in AWS CodePipeline in the source and destination regions. Although this bucket was created as part of the CloudFormation stack, Amazon S3 does not allow CloudFormation to delete buckets that contain objects.To delete this bucket, open the Amazon S3 console, select the buckets you created in this setup, and then delete them. For more information, see Delete or Empty a Bucket.
  2. Follow the steps to delete a stack in the AWS CloudFormation User Guide.

 

I would like to thank my colleagues Raul Frias, Asif Khan and Frank Li for their contributions to this post.