Tag Archives: New stuff

New – How to better monitor your custom application metrics using Amazon CloudWatch Agent

Post Syndicated from Helen Lin original https://aws.amazon.com/blogs/devops/new-how-to-better-monitor-your-custom-application-metrics-using-amazon-cloudwatch-agent/

This blog was contributed by Zhou Fang, Sr. Software Development Engineer for Amazon CloudWatch and Helen Lin, Sr. Product Manager for Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data from both your AWS resources and on-premises servers, providing you with a unified view of your infrastructure and application health. By default, CloudWatch automatically collects and stores many of your AWS services’ metrics and enables you to monitor and alert on metrics such as high CPU utilization of your Amazon EC2 instances. With the CloudWatch Agent that launched last year, you can also deploy the agent to collect system metrics and application logs from both your Windows and Linux environments. Using this data collected by CloudWatch, you can build operational dashboards to monitor your service and application health, set high-resolution alarms to alert and take automated actions, and troubleshoot issues using Amazon CloudWatch Logs.

We recently introduced CloudWatch Agent support for collecting custom metrics using StatsD and collectd. It’s important to collect system metrics like available memory, and you might also want to monitor custom application metrics. You can use these custom application metrics, such as request count to understand the traffic going through your application or understand latency so you can be alerted when requests take too long to process. StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. By combining the system metrics the agent already collects, with the StatsD protocol for instrumenting your own metrics and collectd’s numerous plugins, you can better monitor, analyze, alert, and troubleshoot the performance of your systems and applications.

Let’s dive into an example that demonstrates how to monitor your applications using the CloudWatch Agent.  I am operating a RESTful service that performs simple text encoding. I want to use CloudWatch to help monitor a few key metrics:

  • How many requests are coming into my service?
  • How many of these requests are unique?
  • What is the typical size of a request?
  • How long does it take to process a job?

These metrics help me understand my application performance and throughput, in addition to setting alarms on critical metrics that could indicate service degradation, such as request latency.

Step 1. Collecting StatsD metrics

My service is running on an EC2 instance, using Amazon Linux AMI 2018.03.0. Make sure to attach the CloudWatchAgentServerPolicy AWS managed policy so that the CloudWatch agent can collect and publish metrics from this instance:

Here is the service structure:

 

The “/encode” handler simply returns the base64 encoded string of an input text.  To monitor key metrics, such as total and unique request count as well as request size and method response time, I used StatsD to define these custom metrics.

@RestController

public class EncodeController {

    @RequestMapping("/encode")
    public String encode(@RequestParam(value = "text") String text) {
        long startTime = System.currentTimeMillis();
        statsd.incrementCounter("totalRequest.count", new String[]{"path:/encode"});
        statsd.recordSetValue("uniqueRequest.count", text, new String[]{"path:/encode"});
        statsd.recordHistogramValue("request.size", text.length(), new String[]{"path:/encode"});
        String encodedString = Base64.getEncoder().encodeToString(text.getBytes());
        statsd.recordExecutionTime("latency", System.currentTimeMillis() - startTime, new String[]{"path:/encode"});
        return encodedString;
    }
}

Note that I need to first choose a StatsD client from here.

The “/status” handler responds with a health check ping.  Here I am monitoring my available JVM memory:

@RestController
public class StatusController {

    @RequestMapping("/status")
    public int status() {
        statsd.recordGaugeValue("memory.free", Runtime.getRuntime().freeMemory(), new String[]{"path:/status"});
        return 0;
    }
}

 

Step 2. Emit custom metrics using collectd (optional)

collectd is another popular, open-source daemon for collecting application metrics. If I want to use the hundreds of available collectd plugins to gather application metrics, I can also use the CloudWatch Agent to publish collectd metrics to CloudWatch for 15-months retention. In practice, I might choose to use either StatsD or collectd to collect custom metrics, or I have the option to use both. All of these use cases  are supported by the CloudWatch agent.

Using the same demo RESTful service, I’ll show you how to monitor my service health using the collectd cURL plugin, which passes the collectd metrics to CloudWatch Agent via the network plugin.

For my RESTful service, the “/status” handler returns HTTP code 200 to signify that it’s up and running. This is important to monitor the health of my service and trigger an alert when the application does not respond with a HTTP 200 success code. Additionally, I want to monitor the lapsed time for each health check request.

To collect these metrics using collectd, I have a collectd daemon installed on the EC2 instance, running version 5.8.0. Here is my collectd config:

LoadPlugin logfile
LoadPlugin curl
LoadPlugin network

<Plugin logfile>
  LogLevel "debug"
  File "/var/log/collectd.log"
  Timestamp true
</Plugin>

<Plugin curl>
    <Page "status">
        URL "http://localhost:8080/status";
        MeasureResponseTime true
        MeasureResponseCode true
    </Page>
</Plugin>

<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel Encrypt
        Username "user"
        Password "secret"
    </Server>
</Plugin>

 

For the cURL plugin, I configured it to measure response time (latency) and response code (HTTP status code) from the RESTful service.

Note that for the network plugin, I used Encrypt mode which requires an authentication file for the CloudWatch Agent to authenticate incoming collectd requests.  Click here for full details on the collectd installation script.

 

Step 3. Configure the CloudWatch agent

So far, I have shown you how to:

A.  Use StatsD to emit custom metrics to monitor my service health
B.  Optionally use collectd to collect metrics using plugins

Next, I will install and configure the CloudWatch agent to accept metrics from both the StatsD client and collectd plugins.

I installed the CloudWatch Agent following the instructions in the user guide, but here are the detailed steps:

Install CloudWatch Agent:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip -O AmazonCloudWatchAgent.zip && unzip -o AmazonCloudWatchAgent.zip && sudo ./install.sh

Configure CloudWatch Agent to receive metrics from StatsD and collectd:

{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": {},
      "statsd": {}
    }
  }
}

Pass the above config (config.json) to the CloudWatch Agent:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:config.json -s

In case you want to skip these steps and just execute my sample agent install script, you can find it here.

 

Step 4. Generate and monitor application traffic in CloudWatch

Now that I have the CloudWatch agent installed and configured to receive StatsD and collect metrics, I’m going to generate traffic through the service:

echo "send 100 requests"
for i in {1..100}
do
   curl "localhost:8080/encode?text=TextToEncode_${i}[email protected]#%"
   echo ""
   sleep 1
done

 

Next, I log in to the CloudWatch console and check that the service is up and running. Here’s a graph of the StatsD metrics:

 

Here is a graph of the collectd metrics:

 

Conclusion

With StatsD and collectd support, you can now use the CloudWatch Agent to collect and monitor your custom applications in addition to the system metrics and application logs it already collects. Furthermore, you can create operational dashboards with these metrics, set alarms to take automated actions when free memory is low, and troubleshoot issues by diving into the application logs.  Note that StatsD supports both Windows and Linux operating systems while collectd is Linux only.  For Windows, you can also continue to use Windows Performance Counters to collect custom metrics instead.

The CloudWatch Agent with custom metrics support (version 1.203420.0 or later) is available in all public AWS Regions, AWS GovCloud (US), with AWS China (Beijing) and AWS China (Ningxia) coming soon.

The agent is free to use; you pay the usual CloudWatch prices for logs and custom metrics.

For more details, head over to the CloudWatch user guide for StatsD and collectd.

The Benefits of Side Projects

Post Syndicated from Bozho original https://techblog.bozho.net/the-benefits-of-side-projects/

Side projects are the things you do at home, after work, for your own “entertainment”, or to satisfy your desire to learn new stuff, in case your workplace doesn’t give you that opportunity (or at least not enough of it). Side projects are also a way to build stuff that you think is valuable but not necessarily “commercialisable”. Many side projects are open-sourced sooner or later and some of them contribute to the pool of tools at other people’s disposal.

I’ve outlined one recommendation about side projects before – do them with technologies that are new to you, so that you learn important things that will keep you better positioned in the software world.

But there are more benefits than that – serendipitous benefits, for example. And I’d like to tell some personal stories about that. I’ll focus on a few examples from my list of side projects to show how, through a sort-of butterfly effect, they helped shape my career.

The computoser project, no matter how cool algorithmic music composition, didn’t manage to have much of a long term impact. But it did teach me something apart from niche musical theory – how to read a bulk of scientific papers (mostly computer science) and understand them without being formally trained in the particular field. We’ll see how that was useful later.

Then there was the “State alerts” project – a website that scraped content from public institutions in my country (legislation, legislation proposals, decisions by regulators, new tenders, etc.), made them searchable, and “subscribable” – so that you get notified when a keyword of interest is mentioned in newly proposed legislation, for example. (I obviously subscribed for “information technologies” and “electronic”).

And that project turned out to have a significant impact on the following years. First, I chose a new technology to write it with – Scala. Which turned out to be of great use when I started working at TomTom, and on the 3rd day I was transferred to a Scala project, which was way cooler and much more complex than the original one I was hired for. It was a bit ironic, as my colleagues had just read that “I don’t like Scala” a few weeks earlier, but nevertheless, that was one of the most interesting projects I’ve worked on, and it went on for two years. Had I not known Scala, I’d probably be gone from TomTom much earlier (as the other project was restructured a few times), and I would not have learned many of the scalability, architecture and AWS lessons that I did learn there.

But the very same project had an even more important follow-up. Because if its “civic hacking” flavour, I was invited to join an informal group of developers (later officiated as an NGO) who create tools that are useful for society (something like MySociety.org). That group gathered regularly, discussed both tools and policies, and at some point we put up a list of policy priorities that we wanted to lobby policy makers. One of them was open source for the government, the other one was open data. As a result of our interaction with an interim government, we donated the official open data portal of my country, functioning to this day.

As a result of that, a few months later we got a proposal from the deputy prime minister’s office to “elect” one of the group for an advisor to the cabinet. And we decided that could be me. So I went for it and became advisor to the deputy prime minister. The job has nothing to do with anything one could imagine, and it was challenging and fascinating. We managed to pass legislation, including one that requires open source for custom projects, eID and open data. And all of that would not have been possible without my little side project.

As for my latest side project, LogSentinel – it became my current startup company. And not without help from the previous two mentioned above – the computer science paper reading was of great use when I was navigating the crypto papers landscape, and from the government job I not only gained invaluable legal knowledge, but I also “got” a co-founder.

Some other side projects died without much fanfare, and that’s fine. But the ones above shaped my “story” in a way that would not have been possible otherwise.

And I agree that such serendipitous chain of events could have happened without side projects – I could’ve gotten these opportunities by meeting someone at a bar (unlikely, but who knows). But we, as software engineers, are capable of tilting chance towards us by utilizing our skills. Side projects are our “extracurricular activities”, and they often lead to unpredictable, but rather positive chains of events. They would rarely be the only factor, but they are certainly great at unlocking potential.

The post The Benefits of Side Projects appeared first on Bozho's tech blog.

Kernel prepatch 4.16-rc7

Post Syndicated from corbet original https://lwn.net/Articles/750090/rss

The 4.16-rc7 prepatch is out; it’s
probably the last one. “I’m still not *planning*
on an rc8 this release, because while rc7 is bigger than usual,
nothing in here makes me go ‘Hmm, maybe we should delay the release’.
But let’s see what happens this upcoming week – if next Sunday comes
around, and there’s lots of new stuff, I’ll reconsider then.

Coding is for girls

Post Syndicated from magda original https://www.raspberrypi.org/blog/coding-is-for-girls/

Less than four years ago, Magda Jadach was convinced that programming wasn’t for girls. On International Women’s Day, she tells us how she discovered that it definitely is, and how she embarked on the new career that has brought her to Raspberry Pi as a software developer.

“Coding is for boys”, “in order to be a developer you have to be some kind of super-human”, and “it’s too late to learn how to code” – none of these three things is true, and I am going to prove that to you in this post. By doing this I hope to help some people to get involved in the tech industry and digital making. Programming is for anyone who loves to create and loves to improve themselves.

In the summer of 2014, I started the journey towards learning how to code. I attended my first coding workshop at the recommendation of my boyfriend, who had constantly told me about the skill and how great it was to learn. I was convinced that, at 28 years old, I was already too old to learn. I didn’t have a technical background, I was under the impression that “coding is for boys”, and I lacked the superpowers I was sure I needed. I decided to go to the workshop only to prove him wrong.

Later on, I realised that coding is a skill like any other. You can compare it to learning any language: there’s grammar, vocabulary, and other rules to acquire.

Log In or Sign Up to View

See posts, photos and more on Facebook.

Alien message in console

To my surprise, the workshop was completely inspiring. Within six hours I was able to create my first web page. It was a really simple page with a few cats, some colours, and ‘Hello world’ text. This was a few years ago, but I still remember when I first clicked “view source” to inspect the page. It looked like some strange alien message, as if I’d somehow broken the computer.

I wanted to learn more, but with so many options, I found myself a little overwhelmed. I’d never taught myself any technical skill before, and there was a lot of confusing jargon and new terms to get used to. What was HTML? CSS and JavaScript? What were databases, and how could I connect together all the dots and choose what I wanted to learn? Luckily I had support and was able to keep going.

At times, I felt very isolated. Was I the only girl learning to code? I wasn’t aware of many female role models until I started going to more workshops. I met a lot of great female developers, and thanks to their support and help, I kept coding.

Another struggle I faced was the language barrier. I am not a native speaker of English, and diving into English technical documentation wasn’t easy. The learning curve is daunting in the beginning, but it’s completely normal to feel uncomfortable and to think that you’re really bad at coding. Don’t let this bring you down. Everyone thinks this from time to time.

Play with Raspberry Pi and quit your job

I kept on improving my skills, and my interest in developing grew. However, I had no idea that I could do this for a living; I simply enjoyed coding. Since I had a day job as a journalist, I was learning in the evenings and during the weekends.

I spent long hours playing with a Raspberry Pi and setting up so many different projects to help me understand how the internet and computers work, and get to grips with the basics of electronics. I built my first ever robot buggy, retro game console, and light switch. For the first time in my life, I had a soldering iron in my hand. Day after day I become more obsessed with digital making.

Magdalena Jadach on Twitter

solderingiron Where have you been all my life? Weekend with #raspberrypi + @pimoroni + @Pololu + #solder = best time! #electricity

One day I realised that I couldn’t wait to finish my job and go home to finish some project that I was working on at the time. It was then that I decided to hand over my resignation letter and dive deep into coding.

For the next few months I completely devoted my time to learning new skills and preparing myself for my new career path.

I went for an interview and got my first ever coding internship. Two years, hundreds of lines of code, and thousands of hours spent in front of my computer later, I have landed my dream job at the Raspberry Pi Foundation as a software developer, which proves that dreams come true.

Animated GIF – Find & Share on GIPHY

Discover & share this Animated GIF with everyone you know. GIPHY is how you search, share, discover, and create GIFs.

Where to start?

I recommend starting with HTML & CSS – the same path that I chose. It is a relatively straightforward introduction to web development. You can follow my advice or choose a different approach. There is no “right” or “best” way to learn.

Below is a collection of free coding resources, both from Raspberry Pi and from elsewhere, that I think are useful for beginners to know about. There are other tools that you are going to want in your developer toolbox aside from HTML.

  • HTML and CSS are languages for describing, structuring, and styling web pages
  • You can learn JavaScript here and here
  • Raspberry Pi (obviously!) and our online learning projects
  • Scratch is a graphical programming language that lets you drag and combine code blocks to make a range of programs. It’s a good starting point
  • Git is version control software that helps you to work on your own projects and collaborate with other developers
  • Once you’ve got started, you will need a code editor. Sublime Text or Atom are great options for starting out

Coding gives you so much new inspiration, you learn new stuff constantly, and you meet so many amazing people who are willing to help you develop your skills. You can volunteer to help at a Code Club or  Coder Dojo to increase your exposure to code, or attend a Raspberry Jam to meet other like-minded makers and start your own journey towards becoming a developer.

The post Coding is for girls appeared first on Raspberry Pi.

Adding Visible Electronic Signatures To PDFs

Post Syndicated from Bozho original https://techblog.bozho.net/adding-visible-electronic-signatures-pdf/

I’m aware this is going to be a very niche topic. Electronically signing PDFs is far from a mainstream usecase. However, I’ll write it for two reasons – first, I think it will be very useful for those few who actually need it, and second, I think it will become more and more common as the eIDAS regulation gain popularity – it basically says that electronic signatures are recognized everywhere in Europe (now, it’s not exactly true, because of some boring legal details, but anyway).

So, what is the usecase – first, you have to electronically sign the PDF with an a digital signature (the legal term is “electronic signature”, so I’ll use them interchangeably, although they don’t fully match – e.g. any electronic data applied to other data can be seen as an electronic signature, where a digital signature is the PKI-based signature).

Second, you may want to actually display the signature on the pages, rather than have the PDF reader recognize it and show it in some side-panel. Why is that? Because people are used to seeing signatures on pages and some may insist on having the signature visible (true story – I’ve got a comment that a detached signature “is not a REAL electronic signature, because it’s not visible on the page”).

Now, notice that I wrote “pages”, on “page”. Yes, an electronic document doesn’t have pages – it’s a stream of bytes. So having the signature just on the last page is okay. But, again, people are used to signing all pages, so they’d prefer the electronic signature to be visible on all pages.

And that makes the task tricky – PDF is good with having a digital signature box on the last page, but having multiple such boxes doesn’t work well. Therefore one has to add other types of annotations that look like a signature box and when clicked open the signature panel (just like an actual signature box).

I have to introduce here DSS – a wonderful set of components by the European Commission that can be used to sign and validate all sorts of electronic signatures. It’s open source, you can use at any way you like. Deploy the demo application, use only the libraries, whatever. It includes the signing functionality out of the box – just check the PAdESService or the PDFBoxSignatureService. It even includes the option to visualize the signature once (on a particular page).

However, it doesn’t have the option to show “stamps” (images) on multiple pages. Which is why I forked it and implemented the functionality. Most of my changes are in the PDFBoxSignatureService in the loadAndStampDocument(..) method. If you want to use that functionality you can just build a jar from my fork and use it (by passing the appropriate SignatureImageParameters to PAdESSErvice.sign(..) to define how the signature will look like).

Why is this needed in the first place? Because when a document is signed, you cannot modify it anymore, as you will change the hash. However, PDFs have incremental updates which allow appending to the document and thus having a newer version without modifying anything in the original version. That way the signature is still valid (the originally signed content is not modified), but new stuff is added. In our case, this new stuff is some “annotations”, which represent an image and a clickable area that opens the signature panel (in Adobe Reader at least). And while they are added before the signature box is added, if there are more than one signer, then the 2nd signer’s annotations are added after the first signature.

Sadly, PDFBox doesn’t support that out of the box. Well, it almost does – the piece of code below looks hacky, and it took a while to figure what exactly should be called and when, but it works with just a single reflection call:

    for (PDPage page : pdDocument.getPages()) {
        // reset existing annotations (needed in order to have the stamps added)
        page.setAnnotations(null);
    }
    // reset document outline (needed in order to have the stamps added)
    pdDocument.getDocumentCatalog().setDocumentOutline(null);
    List<PDAnnotation> annotations = addStamps(pdDocument, parameters);
			
    setDocumentId(parameters, pdDocument);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (COSWriter writer = new COSWriter(baos, new RandomAccessBuffer(pdfBytes))) {
        // force-add the annotations (wouldn't be saved in incremental updates otherwise)
        annotations.forEach(ann -> addObjectToWrite(writer, ann.getCOSObject()));
				
        // technically the same as saveIncremental but with more control
        writer.write(pdDocument);
    }
    pdDocument.close();
    pdDocument = PDDocument.load(baos.toByteArray());
    ...
}

private void addObjectToWrite(COSWriter writer, COSDictionary cosObject) {
    // the COSWriter does not expose the addObjectToWrite method, so we need reflection to add the annotations
    try {
        Method method = writer.getClass().getDeclaredMethod("addObjectToWrite", COSBase.class);
        method.setAccessible(true);
        method.invoke(writer, cosObject);
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

What it does is – loads the original PDF, clears some internal catalogs, adds the annotations (images) to all pages, and then “force-add the annotations” because they “wouldn’t be saved in incremental updates otherwise”. I hope PDFBox make this a little more straightforward, but for the time being this works, and it doesn’t invalidate the existing signatures.

I hope that this posts introduces you to:

  • the existence of legally binding electronic signatures
  • the existence of the DSS utilities
  • the PAdES standard for PDF signing
  • how to place more than just one signature box in a PDF document

And I hope this article becomes more and more popular over time, as more and more businesses realize they could make use of electronic signatures.

The post Adding Visible Electronic Signatures To PDFs appeared first on Bozho's tech blog.

How to Enable Caching for AWS CodeBuild

Post Syndicated from Karthik Thirugnanasambandam original https://aws.amazon.com/blogs/devops/how-to-enable-caching-for-aws-codebuild/

AWS CodeBuild is a fully managed build service. There are no servers to provision and scale, or software to install, configure, and operate. You just specify the location of your source code, choose your build settings, and CodeBuild runs build scripts for compiling, testing, and packaging your code.

A typical application build process includes phases like preparing the environment, updating the configuration, downloading dependencies, running unit tests, and finally, packaging the built artifact.

Downloading dependencies is a critical phase in the build process. These dependent files can range in size from a few KBs to multiple MBs. Because most of the dependent files do not change frequently between builds, you can noticeably reduce your build time by caching dependencies.

In this post, I will show you how to enable caching for AWS CodeBuild.

Requirements

  • Create an Amazon S3 bucket for storing cache archives (You can use existing s3 bucket as well).
  • Create a GitHub account (if you don’t have one).

Create a sample build project:

1. Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/.

2. If a welcome page is displayed, choose Get started.

If a welcome page is not displayed, on the navigation pane, choose Build projects, and then choose Create project.

3. On the Configure your project page, for Project name, type a name for this build project. Build project names must be unique across each AWS account.

4. In Source: What to build, for Source provider, choose GitHub.

5. In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild.

  • For Operating system, choose Ubuntu.
  • For Runtime, choose Java.
  • For Version,  choose aws/codebuild/java:openjdk-8.
  • For Build specification, select Insert build commands.

Note: The build specification file (buildspec.yml) can be configured in two ways. You can package it along with your source root directory, or you can override it by using a project environment configuration. In this example, I will use the override option and will use the console editor to specify the build specification.

6. Under Build commands, click Switch to editor to enter the build specification.

Copy the following text.

version: 0.2

phases:
  build:
    commands:
      - mvn install
      
cache:
  paths:
    - '/root/.m2/**/*'

Note: The cache section in the build specification instructs AWS CodeBuild about the paths to be cached. Like the artifacts section, the cache paths are relative to $CODEBUILD_SRC_DIR and specify the directories to be cached. In this example, Maven stores the downloaded dependencies to the /root/.m2/ folder, but other tools use different folders. For example, pip uses the /root/.cache/pip folder, and Gradle uses the /root/.gradle/caches folder. You might need to configure the cache paths based on your language platform.

7. In Artifacts: Where to put the artifacts from this build project:

  • For Type, choose No artifacts.

8. In Cache:

  • For Type, choose Amazon S3.
  • For Bucket, choose your S3 bucket.
  • For Path prefix, type cache/archives/

9. In Service role, the Create a service role in your account option will display a default role name.  You can accept the default name or type your own.

If you already have an AWS CodeBuild service role, choose Choose an existing service role from your account.

10. Choose Continue.

11. On the Review page, to run a build, choose Save and build.

Review build and cache behavior:

Let us review our first build for the project.

In the first run, where no cache exists, overall build time would look something like below (notice the time for DOWNLOAD_SOURCE, BUILD and POST_BUILD):

If you check the build logs, you will see log entries for dependency downloads. The dependencies are downloaded directly from configured external repositories. At the end of the log, you will see an entry for the cache uploaded to your S3 bucket.

Let’s review the S3 bucket for the cached archive. You’ll see the cache from our first successful build is uploaded to the configured S3 path.

Let’s try another build with the same CodeBuild project. This time the build should pick up the dependencies from the cache.

In the second run, there was a cache hit (cache was generated from the first run):

You’ll notice a few things:

  1. DOWNLOAD_SOURCE took slightly longer. Because, in addition to the source code, this time the build also downloaded the cache from user’s s3 bucket.
  2. BUILD time was faster. As the dependencies didn’t need to get downloaded, but were reused from cache.
  3. POST_BUILD took slightly longer, but was relatively the same.

Overall, build duration was improved with cache.

Best practices for cache

  • By default, the cache archive is encrypted on the server side with the customer’s artifact KMS key.
  • You can expire the cache by manually removing the cache archive from S3. Alternatively, you can expire the cache by using an S3 lifecycle policy.
  • You can override cache behavior by updating the project. You can use the AWS CodeBuild the AWS CodeBuild console, AWS CLI, or AWS SDKs to update the project. You can also invalidate cache setting by using the new InvalidateProjectCache API. This API forces a new InvalidationKey to be generated, ensuring that future builds receive an empty cache. This API does not remove the existing cache, because this could cause inconsistencies with builds currently in flight.
  • The cache can be enabled for any folders in the build environment, but we recommend you only cache dependencies/files that will not change frequently between builds. Also, to avoid unexpected application behavior, don’t cache configuration and sensitive information.

Conclusion

In this blog post, I showed you how to enable and configure cache setting for AWS CodeBuild. As you see, this can save considerable build time. It also improves resiliency by avoiding external network connections to an artifact repository.

I hope you found this post useful. Feel free to leave your feedback or suggestions in the comments.

AWS Developer Tools Expands Integration to Include GitHub

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/devops/aws-developer-tools-expands-integration-to-include-github/

AWS Developer Tools is a set of services that include AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy. Together, these services help you securely store and maintain version control of your application’s source code and automatically build, test, and deploy your application to AWS or your on-premises environment. These services are designed to enable developers and IT professionals to rapidly and safely deliver software.

As part of our continued commitment to extend the AWS Developer Tools ecosystem to third-party tools and services, we’re pleased to announce AWS CodeStar and AWS CodeBuild now integrate with GitHub. This will make it easier for GitHub users to set up a continuous integration and continuous delivery toolchain as part of their release process using AWS Developer Tools.

In this post, I will walk through the following:

Prerequisites:

You’ll need an AWS account, a GitHub account, an Amazon EC2 key pair, and administrator-level permissions for AWS Identity and Access Management (IAM), AWS CodeStar, AWS CodeBuild, AWS CodePipeline, Amazon EC2, Amazon S3.

 

Integrating GitHub with AWS CodeStar

AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. Its unified user interface helps you easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, so you can start releasing code faster.

When AWS CodeStar launched in April of this year, it used AWS CodeCommit as the hosted source repository. You can now choose between AWS CodeCommit or GitHub as the source control service for your CodeStar projects. In addition, your CodeStar project dashboard lets you centrally track GitHub activities, including commits, issues, and pull requests. This makes it easy to manage project activity across the components of your CI/CD toolchain. Adding the GitHub dashboard view will simplify development of your AWS applications.

In this section, I will show you how to use GitHub as the source provider for your CodeStar projects. I’ll also show you how to work with recent commits, issues, and pull requests in the CodeStar dashboard.

Sign in to the AWS Management Console and from the Services menu, choose CodeStar. In the CodeStar console, choose Create a new project. You should see the Choose a project template page.

CodeStar Project

Choose an option by programming language, application category, or AWS service. I am going to choose the Ruby on Rails web application that will be running on Amazon EC2.

On the Project details page, you’ll now see the GitHub option. Type a name for your project, and then choose Connect to GitHub.

Project details

You’ll see a message requesting authorization to connect to your GitHub repository. When prompted, choose Authorize, and then type your GitHub account password.

Authorize

This connects your GitHub identity to AWS CodeStar through OAuth. You can always review your settings by navigating to your GitHub application settings.

Installed GitHub Apps

You’ll see AWS CodeStar is now connected to GitHub:

Create project

You can choose a public or private repository. GitHub offers free accounts for users and organizations working on public and open source projects and paid accounts that offer unlimited private repositories and optional user management and security features.

In this example, I am going to choose the public repository option. Edit the repository description, if you like, and then choose Next.

Review your CodeStar project details, and then choose Create Project. On Choose an Amazon EC2 Key Pair, choose Create Project.

Key Pair

On the Review project details page, you’ll see Edit Amazon EC2 configuration. Choose this link to configure instance type, VPC, and subnet options. AWS CodeStar requires a service role to create and manage AWS resources and IAM permissions. This role will be created for you when you select the AWS CodeStar would like permission to administer AWS resources on your behalf check box.

Choose Create Project. It might take a few minutes to create your project and resources.

Review project details

When you create a CodeStar project, you’re added to the project team as an owner. If this is the first time you’ve used AWS CodeStar, you’ll be asked to provide the following information, which will be shown to others:

  • Your display name.
  • Your email address.

This information is used in your AWS CodeStar user profile. User profiles are not project-specific, but they are limited to a single AWS region. If you are a team member in projects in more than one region, you’ll have to create a user profile in each region.

User settings

User settings

Choose Next. AWS CodeStar will create a GitHub repository with your configuration settings (for example, https://github.com/biyer/ruby-on-rails-service).

When you integrate your integrated development environment (IDE) with AWS CodeStar, you can continue to write and develop code in your preferred environment. The changes you make will be included in the AWS CodeStar project each time you commit and push your code.

IDE

After setting up your IDE, choose Next to go to the CodeStar dashboard. Take a few minutes to familiarize yourself with the dashboard. You can easily track progress across your entire software development process, from your backlog of work items to recent code deployments.

Dashboard

After the application deployment is complete, choose the endpoint that will display the application.

Pipeline

This is what you’ll see when you open the application endpoint:

The Commit history section of the dashboard lists the commits made to the Git repository. If you choose the commit ID or the Open in GitHub option, you can use a hotlink to your GitHub repository.

Commit history

Your AWS CodeStar project dashboard is where you and your team view the status of your project resources, including the latest commits to your project, the state of your continuous delivery pipeline, and the performance of your instances. This information is displayed on tiles that are dedicated to a particular resource. To see more information about any of these resources, choose the details link on the tile. The console for that AWS service will open on the details page for that resource.

Issues

You can also filter issues based on their status and the assigned user.

Filter

AWS CodeBuild Now Supports Building GitHub Pull Requests

CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can use prepackaged build environments to get started quickly or you can create custom build environments that use your own build tools.

We recently announced support for GitHub pull requests in AWS CodeBuild. This functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild. You can use the AWS CodeBuild or AWS CodePipeline consoles to run AWS CodeBuild. You can also automate the running of AWS CodeBuild by using the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the AWS CodeBuild Plugin for Jenkins.

AWS CodeBuild

In this section, I will show you how to trigger a build in AWS CodeBuild with a pull request from GitHub through webhooks.

Open the AWS CodeBuild console at https://console.aws.amazon.com/codebuild/. Choose Create project. If you already have a CodeBuild project, you can choose Edit project, and then follow along. CodeBuild can connect to AWS CodeCommit, S3, BitBucket, and GitHub to pull source code for builds. For Source provider, choose GitHub, and then choose Connect to GitHub.

Configure

After you’ve successfully linked GitHub and your CodeBuild project, you can choose a repository in your GitHub account. CodeBuild also supports connections to any public repository. You can review your settings by navigating to your GitHub application settings.

GitHub Apps

On Source: What to Build, for Webhook, select the Rebuild every time a code change is pushed to this repository check box.

Note: You can select this option only if, under Repository, you chose Use a repository in my account.

Source

In Environment: How to build, for Environment image, select Use an image managed by AWS CodeBuild. For Operating system, choose Ubuntu. For Runtime, choose Base. For Version, choose the latest available version. For Build specification, you can provide a collection of build commands and related settings, in YAML format (buildspec.yml) or you can override the build spec by inserting build commands directly in the console. AWS CodeBuild uses these commands to run a build. In this example, the output is the string “hello.”

Environment

On Artifacts: Where to put the artifacts from this build project, for Type, choose No artifacts. (This is also the type to choose if you are just running tests or pushing a Docker image to Amazon ECR.) You also need an AWS CodeBuild service role so that AWS CodeBuild can interact with dependent AWS services on your behalf. Unless you already have a role, choose Create a role, and for Role name, type a name for your role.

Artifacts

In this example, leave the advanced settings at their defaults.

If you expand Show advanced settings, you’ll see options for customizing your build, including:

  • A build timeout.
  • A KMS key to encrypt all the artifacts that the builds for this project will use.
  • Options for building a Docker image.
  • Elevated permissions during your build action (for example, accessing Docker inside your build container to build a Dockerfile).
  • Resource options for the build compute type.
  • Environment variables (built-in or custom). For more information, see Create a Build Project in the AWS CodeBuild User Guide.

Advanced settings

You can use the AWS CodeBuild console to create a parameter in Amazon EC2 Systems Manager. Choose Create a parameter, and then follow the instructions in the dialog box. (In that dialog box, for KMS key, you can optionally specify the ARN of an AWS KMS key in your account. Amazon EC2 Systems Manager uses this key to encrypt the parameter’s value during storage and decrypt during retrieval.)

Create parameter

Choose Continue. On the Review page, either choose Save and build or choose Save to run the build later.

Choose Start build. When the build is complete, the Build logs section should display detailed information about the build.

Logs

To demonstrate a pull request, I will fork the repository as a different GitHub user, make commits to the forked repo, check in the changes to a newly created branch, and then open a pull request.

Pull request

As soon as the pull request is submitted, you’ll see CodeBuild start executing the build.

Build

GitHub sends an HTTP POST payload to the webhook’s configured URL (highlighted here), which CodeBuild uses to download the latest source code and execute the build phases.

Build project

If you expand the Show all checks option for the GitHub pull request, you’ll see that CodeBuild has completed the build, all checks have passed, and a deep link is provided in Details, which opens the build history in the CodeBuild console.

Pull request

Summary:

In this post, I showed you how to use GitHub as the source provider for your CodeStar projects and how to work with recent commits, issues, and pull requests in the CodeStar dashboard. I also showed you how you can use GitHub pull requests to automatically trigger a build in AWS CodeBuild — specifically, how this functionality makes it easier to collaborate across your team while editing and building your application code with CodeBuild.


About the author:

Balaji Iyer is an Enterprise Consultant for the Professional Services Team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly scalable distributed systems, serverless architectures, large scale migrations, operational security, and leading strategic AWS initiatives. Before he joined Amazon, Balaji spent more than a decade building operating systems, big data analytics solutions, mobile services, and web applications. In his spare time, he enjoys experiencing the great outdoors and spending time with his family.

 

OpenSUSE Leap 42.3 released

Post Syndicated from corbet original https://lwn.net/Articles/728881/rss

OpenSUSE
Leap 42.3
is now available. “After basing openSUSE Leap on SLE
(SUSE Linux Enterprise) and adding more source code to Leap 42.2 from SLE
12, Leap 42.3 adds even more packages from SLE 12 SP 3 and synchronizes
several common packages. The shared codebase allows for openSUSE Leap 42.3
to receive enhanced maintenance and bug fixes from both the openSUSE
community and SUSE engineers.
” There is quite a bit of new stuff in
this release; see this
page
for some details.

Introducing Amazon CloudWatch Logs Integration for AWS OpsWorks Stacks

Post Syndicated from Kai Rubarth original https://aws.amazon.com/blogs/devops/introducing-amazon-cloudwatch-logs-integration-for-aws-opsworks-stacks/

AWS OpsWorks Stacks now supports Amazon CloudWatch Logs. This benefits all users who want to stream their log files from OpsWorks instances to CloudWatch. This enables you to take advantage of CloudWatch Logs features such as centralized log archival, real-time monitoring of log data, or generating CloudWatch alarms. Until now, OpsWorks customers had to manually install and configure the CloudWatch Logs agent on every instance they wanted to ship log data from. Now, with the built-in CloudWatch Logs integration these features are made available with just a few clicks across all instances in a layer.

The OpsWorks CloudWatch logs integration is compatible with all Chef 11 and Chef 12 Linux stacks. To get started, simply click the new CloudWatch Logs tab in your layer settings screen. OpsWorks agent command logs, which include Chef recipe output, are enabled by default; to enable logging for custom log files, simply input an absolute path or file pattern for every log file you want shipped.

It’s that simple. OpsWorks creates log groups and log streams for you, and starts streaming your system and application logs within a few minutes.

For more information, see Using Amazon CloudWatch Logs with AWS OpsWorks Stacks in the AWS OpsWorks User Guide.

Introducing Git Credentials: A Simple Way to Connect to AWS CodeCommit Repositories Using a Static User Name and Password

Post Syndicated from Ankur Agarwal original https://aws.amazon.com/blogs/devops/introducing-git-credentials-a-simple-way-to-connect-to-aws-codecommit-repositories-using-a-static-user-name-and-password/

Today, AWS is introducing a simplified way to authenticate to your AWS CodeCommit repositories over HTTPS.

With Git credentials, you can generate a static user name and password in the Identity and Access Management (IAM) console that you can use to access AWS CodeCommit repositories from the command line, Git CLI, or any Git tool that supports HTTPS authentication.

Because these are static credentials, they can be cached using the password management tools included in your local operating system or stored in a credential management utility. This allows you to get started with AWS CodeCommit within minutes. You don’t need to download the AWS CLI or configure your Git client to connect to your AWS CodeCommit repository on HTTPS. You can also use the user name and password to connect to the AWS CodeCommit repository from third-party tools that support user name and password authentication, including popular Git GUI clients (such as TowerUI) and IDEs (such as Eclipse, IntelliJ, and Visual Studio).

So, why did we add this feature? Until today, users who wanted to use HTTPS connections were required to configure the AWS credential helper to authenticate their AWS CodeCommit operations. Customers told us our credential helper sometimes interfered with password management tools such as Keychain Access and Windows Vault, which caused authentication failures. Also, many Git GUI tools and IDEs require a static user name and password to connect with remote Git repositories and do not support the credential helper.

In this blog post, I’ll walk you through the steps for creating an AWS CodeCommit repository, generating Git credentials, and setting up CLI access to AWS CodeCommit repositories.


Git Credentials Walkthrough
Let’s say Dave wants to create a repository on AWS CodeCommit and set up local access from his computer.

Prerequisite: If Dave had previously configured his local computer to use the credential helper for AWS CodeCommit, he must edit his .gitconfig file to remove the credential helper information from the file. Additionally, if his local computer is running macOS, he might need to clear any cached credentials from Keychain Access.

With Git credentials, Dave can now create a repository and start using AWS CodeCommit in four simple steps.

Step 1: Make sure the IAM user has the required permissions
Dave must have the following managed policies attached to his IAM user (or their equivalent permissions) before he can set up access to AWS CodeCommit using Git credentials.

  • AWSCodeCommitPowerUser (or an appropriate CodeCommit managed policy)
  • IAMSelfManageServiceSpecificCredentials
  • IAMReadOnlyAccess

Step 2: Create an AWS CodeCommit repository
Next, Dave signs in to the AWS CodeCommit console and create a repository, if he doesn’t have one already. He can choose any repository in his AWS account to which he has access. The instructions to create Git credentials are shown in the help panel. (Choose the Connect button if the instructions are not displayed.) When Dave clicks the IAM user link, the IAM console will open and he can generate the credentials.

GitCred_Blog1

 

Step 3: Create HTTPS Git credentials in the IAM console
On the IAM user page, Dave selects the Security Credentials tab and clicks Generate under HTTPS Git credentials for AWS CodeCommit section. This creates and displays the user name and password. Dave can then download the credentials.

GitCred_Blog2

Note: This is the only time the password is available to view or download.

 

Step 4: Clone the repository on the local machine
On the AWS CodeCommit console page for the repository, Dave chooses Clone URL, and then copy the HTTPS link for cloning the repository. At the command line or terminal, Dave will use the link he just copied to clone the repository. For example, Dave copies:

GitCred_Blog3

 

And then at the command line or terminal, Dave types:

$ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/TestRepo_Dave

When prompted for user name and password, Dave provides the Git credentials (user name and password) he generated in step 3.

Dave is now ready to start pushing his code to the new repository.

Git credentials can be made active or inactive based on your requirements. You can also reset the password if you would like to use the existing username with a new password.

Next Steps

  1. You can optionally cache your credentials using the Git credentials caching command here.
  2. Want to invite a collaborator to work on your AWS CodeCommit repository? Simply create a new IAM user in your AWS account, create Git credentials for that user, and securely share the repository URL and Git credentials with the person you want to collaborate on the repositories.
  3. Connect to any third-party client that supports connecting to remote Git repositories using Git credentials (a stored user name and password). Virtually all tools and IDEs allow you to connect with static credentials. We’ve tested these:
    • Visual Studio (using the default Git plugin)
    • Eclipse IDE (using the default Git plugin)
    • Git Tower UI

For more information, see the AWS CodeCommit documentation.

We are excited to provide this new way of connecting to AWS CodeCommit. We look forward to hearing from you about the many different tools and IDEs you will be able to use with your AWS CodeCommit repositories.

DevOps and Continuous Delivery at re:Invent 2016 – Wrap-up

Post Syndicated from Frank Li original https://aws.amazon.com/blogs/devops/devops-and-continuous-delivery-at-reinvent-2016-wrap-up/

The AWS re:Invent 2016 conference was packed with some exciting announcements and sessions around DevOps and Continuous Delivery. We launched AWS CodeBuild, a fully managed build service that eliminates the need to provision, manage, and scale your own build servers. You now have the ability to run your continuous integration and continuous delivery process entirely on AWS by plugging AWS CodeBuild into AWS CodePipeline, which automates building, testing, and deploying code each time you push a change to your source repository. If you are interested in learning more about AWS CodeBuild, you can sign up for the webinar on January 20th here.

The DevOps track had over 30 different breakout sessions ranging from customer stories to deep dive talks to best practices. If you weren’t able to attend the conference or missed a specific session, here is a link to the entire playlist.

 

There were a number of talks that can help you get started with your own DevOps practices for rapid software delivery. Here are some introductory sessions to give you the proper background:
DEV201: Accelerating Software Delivery with AWS Developer Tools
DEV211: Automated DevOps and Continuous Delivery

After you understand the big picture, you can dive into automating your software delivery. Here are some sessions on how to deploy your applications:
DEV310: Choosing the Right Software Deployment Technique
DEV403: Advanced Continuous Delivery Techniques
DEV404: Develop, Build, Deploy, and Manage Services and Applications

Finally, to maximize your DevOps efficiency, you’ll want to automate the provisioning of your infrastructure. Here are a couple sessions on how to manage your infrastructure:
DEV313: Infrastructure Continuous Delivery Using AWS CloudFormation
DEV319: Automating Cloud Management & Deployment

If you’re a Lambda developer, be sure to watch this session and read this documentation on how to practice continuous delivery for your serverless applications:
SVR307: Application Lifecycle Management in a Serverless World

For all 30+ DevOps sessions, click here.

Python FAQ: How do I port to Python 3?

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/

Part of my Python FAQ, which is doomed to never be finished.

Maybe you have a Python 2 codebase. Maybe you’d like to make it work with Python 3. Maybe you really wish someone would write a comically long article on how to make that happen.

I have good news! You’re already reading one.

(And if you’re not sure why you’d want to use Python 3 in the first place, perhaps you’d be interested in the companion article which delves into exactly that question?)

Don’t be intimidated

This article is quite long, but don’t take that as a sign that this is necessarily a Herculean task. I’m trying to cover every issue I can ever recall running across, which means a lot of small gotchas.

I’ve ported several codebases from Python 2 to Python 2+3, and most of them have gone pretty smoothly. If you have modern Python 2 code that handles Unicode responsibly, you’re already halfway there.

However… if you still haven’t ported by now, almost eight years after Python 3.0 was first released, chances are you have either a lumbering giant of an app or ancient and weird 2.2-era code. Or, perish the thought, a lumbering giant consisting largely of weird 2.2-era code. In that case, you’ll want to clean up the more obvious issues one at a time, then go back and start worrying about actually running parts of your code on Python 3.

On the other hand, if your Python 2 code is pretty small and you’ve just never gotten around to porting, good news! It’s not that bad, and much of the work can be done automatically. Python 3 is ultimately the same language as Python 2, just with some sharp bits filed off.

Making some tough decisions

We say “porting from 2 to 3”, but what we usually mean is “porting code from 2 to both 2 and 3”. That ends up being more difficult (and ugly), since rather than writing either 2 or 3, you have to write the common subset of 2 and 3. As nifty as some of the features in 3 are, you can’t actually use any of them if you have to remain compatible with Python 2.

The first thing you need to do, then, is decide exactly which versions of Python you’re targeting. For 2, your options are:

  • Python 2.5+ is possible, but very difficult, and this post doesn’t really discuss it. Even something as simple as exception handling becomes painful, because the only syntax that works in Python 3 was first introduced in Python 2.6. I wouldn’t recommend doing this.

  • Python 2.6+ used to be fairly common, and is well-tread ground. However, Python 2.6 reached end-of-life in 2013, and some common libraries have been dropping support for it. If you want to preserve Python 2.6 compatibility for the sake of making a library more widely-available, well, I’d urge you to reconsider. If you want to preserve Python 2.6 compatibility because you’re running a proprietary app on it, you should stop reading this right now and go upgrade to 2.7 already.

  • Python 2.7 is the last release of the Python 2 series, but is guaranteed to be supported until at least 2020. The major focus of the release was backporting a lot of minor Python 3 features, making it the best possible target for code that’s meant to run on both 2 and 3.

  • There is, of course, also the choice of dropping Python 2 support, in which case this process will be much easier. Python 2 is still very widely-used, though, so library authors probably won’t want to do this. App authors do have the option, but unless your app is trivial, it’s much easier to maintain Python 2 support during the port — that way you can port iteratively, and the app will still function on Python 2 in the interim, rather than being a 2/3 hybrid that can’t run on either.

Most of this post assumes you’re targeting Python 2.7, though there are mentions of 2.6 as well.

You also have to decide which version of Python 3 to target.

  • Python 3.0 and 3.1 are forgettable. Python 3 was still stabilizing for its first couple minor versions, and from what I hear, compatibility with both 2.7 and 3.0 is a huge pain. Both versions are also past end-of-life.

  • Python 3.2 and 3.3 are a common minimum version to target. Python 3.3 reinstated support for u'...' literals (redundant in Python 3, where normal strings are already Unicode), which makes supporting both 2 and 3 much easier. I bundle it with Python 3.2 because the latest version that stable PyPy supports is 3.2, but it also supports u'...' literals. You’ll support the biggest surface area by targeting that, a sort of 3.2½. (There’s an alpha PyPy supporting 3.3, but as of this writing it’s not released as stable yet.)

  • Python 3.4 and 3.5 add shiny new features, but you can only really use them if you’re dropping support for Python 2. Again, I’d suggest targeting Python 2.7 + Python 3.2½ first, then dropping the Python 2 support and adding whatever later Python 3 trinkets you want.

Another consideration is what attitude you want your final code to take. Do you want Python 2 code with enough band-aids that it also works on Python 3, or Python 3 code that’s carefully written so it still works on Python 2? The differences are subtle! Consider code like x = map(a, b). map returns a list in Python 2, but a lazy iterable in Python 3. Which way do you want to port this code?

1
2
3
4
5
6
7
8
9
# Python 2 style: force eager evaluation, even on Python 3
x = list(map(a, b))

# Python 3 style: use lazy evaluation, even on Python 2
try:
    from future_builtins import map
except ImportError:
    pass
x = map(a, b)

The answer may depend on which Python you primarily use for development, your target audience, or even case-by-case based on how x is used.

Personally, I’d err on the side of preserving Python 3 semantics and porting them to Python 2 when possible. I’m pretty used to Python 3, though, and you or your team might be thrown for a loop by changing Python 2’s behavior.

At the very least, prefer if PY2 to if not PY3. The former stresses that Python 2 is the special case, which is increasingly true going forward. Eventually there’ll be a Python 4, and perhaps even a Python 5, and those future versions will want the “Python 3” behavior.

Some helpful tools

The good news is that you don’t have to do all of this manually.

2to3 is a standard library module (since 2.6) that automatically modifies Python 2 source code to change some common Python 2 constructs to the Python 3 equivalent. (It also doubles as a framework for making arbitrary changes to Python code.)

Unfortunately, it ports 2 to 3, not 2 to 2+3. For libraries, it’s possible to rig 2to3 to run automatically on your code just before it’s installed on Python 3, so you can keep writing Python 2 code — but 2to3 isn’t perfect, and this makes it impossible to develop with your library on Python 3, so Python 3 ends up as a second-class citizen. I wouldn’t recommend it.

The more common approach is to use something like six, a library that wraps many of the runtime differences between 2 and 3, so you can run the same codebase on both 2 and 3.

Of course, that still leaves you making the changes yourself. A more recent innovation is the python-future project, which combines both of the above. It has a future library of renames and backports of Python 3 functionality that goes further than six and is designed to let you write Python 3-esque code that still runs on Python 2. It also includes a futurize script, based on the 2to3 plumbing, that rewrites your code to target 2+3 (using python-future’s library) rather than just 3.

The nice thing about python-future is that it explicitly takes the stance of writing code against Python 3 semantics and backporting them to Python 2. It’s very dedicated to this: it has a future.builtins module that includes not only easy cases like map, but also entire pure-Python reimplementations of types like bytes. (Naturally, this adds some significant overhead as well.) I do like the overall attitude, but I’m not totally sold on all the changes, and you might want to leaf through them to see which ones you like.

futurize isn’t perfect, but it’s probably the best starting point. The 2to3 design splits the various edits into a variety of “fixers” that each make a single style of change, and futurize works the same way, inheriting many of the fixers from 2to3. The nice thing about futurize is that it groups the fixers into “stages”, where stage 1 (futurize --stage1) only makes fairly straightforward changes, like fixing the except syntax. More importantly, it doesn’t add any dependencies on the future library, so it’s useful for making the easy changes even if you’d prefer to use six. You’re also free to choose individual fixes to apply, if you discover that some particular change breaks your code.

Another advantage of this approach is that you can tackle the porting piecemeal, which is great for very large projects. Run one fixer at a time, starting with the very simple ones like updating to except ... as ... syntax, and convince yourself that everything is fine before you do the next one. You can make some serious strides towards 3 compatibility just by eliminating behavior that already has cromulent alternatives in Python 2.

If you expect your Python 3 port to take a very long time — say, if you have a large project with numerous developers and a frantic release schedule — then you might want to prevent older syntax from creeping in with a tool like autopep8, which can automatically fix some deprecated features with a much lighter touch. If you’d like to automatically enforce that, say, from __future__ import absolute_import is at the top of every Python file, that’s a bit beyond the scope of this article, but I’ve had pre-commit + reorder_python_imports thrust upon me in the past to fairly good effect.

Anyway! For each of the issues below, I’ll mention whether futurize can fix it, the name of the responsible fixer, and whether six has anything relevant. If the name of the fixer begins with lib2to3, that means it’s part of the standard library, and you can use it with 2to3 without installing python-future.

Here we go!

Things you shouldn’t even be doing

These are ancient, ancient practices, and even Python 2 programmers may be surprised by them. Some of them are arguably outright bugs in the language; others are just old and forgotten. They generally have equivalents that work even in older versions of Python 2.

Old-style classes

1
2
class Foo:
    ...

In Python 3, this code creates a class that inherits from object. In Python 2, it creates a completely different kind of thing entirely: an “old-style” class, which worked a little differently from built-in types. The differences are generally subtle:

  • Old-style classes don’t support __getattribute__, __slots__

  • Old-style classes don’t correctly support data descriptors, i.e. the assignment behavior of @property.

  • Old-style classes had a __coerce__ method, which would attempt to turn a value into a built-in numeric type before performing a math operation.

  • Old-style classes didn’t use the C3 MRO, so in the case of diamond inheritance, a class could be skipped entirely by super().

  • Old-style instances check the instance for a special method name; new-style instances check the type. Additionally, if a special method isn’t found on an old-style instance, the lookup falls back to __getattr__; this is not the case for new-style classes (which makes proxying more complicated).

That last one is the only thing old-style classes can do that new-style classes cannot, and if you’re relying on it, you have a bit of refactoring to do. (The really curious thing is that there doesn’t seem to be a particularly good reason for the limitation on new-style classes, and it doesn’t even make things faster. Maybe that’ll be fixed in Python 4?)

If you have no idea what any of that means or why you should care, chances are you’re either not using old-style classes at all, or you’re only using them because you forgot to write (object) somewhere. In that case, futurize --stage2 will happily change class Foo: to class Foo(object): for you, using the libpasteurize.fixes.fix_newstyle fixer. (Strictly speaking, this is a Python 2 compatibility issue, since the old syntax still works fine in Python 3 — it just means something else now.)

cmp

Python 2 originally used the C approach for sorting. Given two things A and B, a comparison would produce a negative number if A < B, zero if A == B, and a positive number if A > B. This was the only way to customize sorting; there’s a cmp() built-in function, a __cmp__ special method, and cmp arguments to list.sort() and sorted().

This is a little cumbersome, as you may have noticed if you’ve ever tried to do custom sorting in Perl or JavaScript. Even a case-insensitive sort involves repeating yourself. Most custom sorts will have the same basic structure of cmp(op(a), op(b)), when the only thing you really care about is op.

1
names.sort(cmp=lambda a, b: cmp(a.lower(), b.lower()))

But more importantly, the C approach is flat-out wrong for some types. Consider sets, which use comparison to indicate subsets versus supersets:

1
2
3
4
{1, 2} < {1, 2, 3}  # True
{1, 2, 3} > {1, 2}  # True
{1, 2} < {1, 2}  # False
{1, 2} <= {1, 2}  # True

So what to do with {1, 2} < {3, 4}, where none of the three possible answers is correct?

Early versions of Python 2 added “rich comparisons”, which introduced methods for all six possible comparisons: __eq__, __ne__, __lt__, __le__, __gt__, and __ge__. You’re free to return False for all six, or even True for all six, or return NotImplemented to allow deferring to the other operand. The cmp argument became key instead, which allows mapping the original values to a different item to use for comparison:

1
names.sort(key=lambda a: a.lower())

(This is faster, too, since there are fewer calls to the lambda, fewer calls to .lower(), and no calls to cmp.)


So, fixing all this. Luckily, Python 2 supports all of the new stuff, so you don’t need compatibility hacks.

To replace simple implementations of __cmp__, you need only write the appropriate rich comparison methods. You could even do this the obvious way:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Foo(object):
    def __cmp__(self, other):
        return cmp(self.prop, other.prop)

    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __ne__(self, other):
        return self.__cmp__(other) != 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0

    ...

You would also have to change the use of cmp to a manual if tree, since cmp is gone in Python 3. I don’t recommend this.

A lazier alternative would be to use functools.total_ordering (backported from 3.0 into 2.7), which generates four of the comparison methods, given a class that implements __eq__ and one other:

1
2
3
4
5
6
7
@functools.total_ordering
class Foo(object):
    def __eq__(self, other):
        return self.prop == other.prop

    def __lt__(self, other):
        return self.prop < other.prop

There are a couple problems with this code. For one, it’s still pretty repetitive, accessing .prop four times (and imagine if you wanted to compare several properties). For another, it’ll either cause an error or do entirely the wrong thing if you happen to compare with an object of a different type. You should return NotImplemented in this case, but total_ordering doesn’t handle that correctly until Python 3.4. If those bother you, you might enjoy my own classtools.keyed_ordering, which uses a __key__ method (much like the key argument) to generate all six methods:

1
2
3
4
@classtools.keyed_ordering
class Foo(object):
    def __key__(self):
        return self.prop

Replacing uses of key arguments should be straightforward: a cmp argument of cmp(op(a), op(b)) becomes a key argument of op. If you’re doing something more elaborate, there’s a functools.cmp_to_key function (also backported from 3.0 to 2.7), which converts a cmp function to one usable as a key. (The implementation is much like the first Foo example above: it involves a class that calls the wrapped function from its comparison methods, and returns True or False depending on the return value.)

Finally, if you’re using cmp directly, don’t do that. If you really, really need it for something other than Python’s own sorting, just use an if.

The only help futurize offers is in futurize --stage2, via libfuturize.fixes.fix_cmp, which adds an import of past.builtins.cmp if it detects you’re using the cmp function anywhere.

Comparing incompatible types

Python 2’s use of C-style ordering also means that any two objects, of any types, must be either equal or occur in some defined order. Python’s answer to this problem is to sort on the names of the types. So None < 3 < "1", because "NoneType" < "int" < "str".

Python 3 removes this fallback rule; if two values don’t know how to compare against each other (i.e. both return NotImplemented), you just get a TypeError.

This might affect you in subtle ways, such as if you’re sorting a list of objects that may contain Nones and expecting it to silently work. The fix depends entirely on the type of data you have, and no automated tool can handle that for you. Most likely, you didn’t mean to be sorting a heterogenous list in the first place.

Of course, you could always sort on type(x).__name__, but I don’t know why you would do that.

The sets module

Python 2.3 introduced its set types as Set and ImmutableSet in the sets module. Since Python 2.4, they’ve been built-in types, set and frozenset. The sets module is gone in Python 3, so just use the built-in names.

Creating exceptions

Python 2 allows you to do this:

1
raise RuntimeError, "an error happened at runtime!!"

There’s not really any good reason to do this, since you can just as well do:

1
raise RuntimeError("an error happened at runtime!!")

futurize --stage1 will rewrite the two-arg form to a regular object creation via the libfuturize.fixes.fix_raise fixer. It’ll also fix this alternative way of specifying an exception type, which is so bizarre and obscure that I did not know about it until I read the fixer’s source code:

1
raise (((A, B), C), ...)  # equivalent to `raise A` (?!)

Additionally, exceptions act like sequences in Python 2, but not in Python 3. You can just operate on the .args sequence directly, in either version. Alas, there’s no automated way to fix this.

Backticks

Did you know that `x` is equivalent to repr(x) in Python 2? Yeah, most people don’t. It’s super weird. futurize --stage1 will fix this with the lib2to3.fixes.fix_repr fixer.

has_key

Very old code may still be using somedict.has_key("foo"). "foo" in somedict has worked since Python 2.2. What are you doing. futurize --stage1 will fix this with the lib2to3.fixes.fix_has_key fixer.

<>

<> is equivalent to != in Python 2! This is an ancient, ancient holdover, and there’s no reason to still be using it. futurize --stage1 will fix this with the lib2to3.fixes.fix_ne fixer.

(You could also use from __future__ import barry_as_FLUFL, which restores <> in Python 3. It’s an easter egg. I’m joking. Please don’t actually do this.)

Things with easy Python 2 equivalents

These aren’t necessarily ancient, but they have an alternative you can just as well express in Python 2, so there’s no need to juggle 2 and 3.

Other ancient builtins

apply() is gone. Use the built-in syntax, f(*args, **kwargs).

callable() was briefly gone, but then came back in Python 3.2.

coerce() is gone; it was only used for old-style classes.

execfile() is gone. Read the file and pass its contents to exec() instead.

file() is gone; Python 3 has multiple file types, and a hierarchy of interfaces defined in the io module. Occasionally, code uses this as a synonym for open(), but you should really be using open() anyway.

intern() has been moved into the sys module, though I have no earthly idea why you’d be using it.

raw_input() has been renamed to input(), and the old ludicrous input() is gone. If you really need input(), please stop.

reduce() has been moved into the functools module, but it’s there in Python 2.6 as well.

reload() has been moved into the imp module. It’s unreliable garbage and you shouldn’t be using it anyway.

futurize --stage1 can fix several of these:

  • apply, via lib2to3.fixes.fix_apply
  • intern, via lib2to3.fixes.fix_intern
  • reduce, via lib2to3.fixes.fix_reduce

futurize --stage2 can also fix execfile via the libfuturize.fixes.fix_execfile fixer, which imports past.builtins.execfile. The 2to3 fixer uses an open() call, but the true correct fix is to use a with block.

futurize --stage2 has a couple of fixers for raw_input, but you can just as well import future.builtins.input or six.moves.input.

Nothing can fix coerce, which has no equivalent. Curiously, I don’t see a fixer for file, which is trivially fixed by replacing it with open. Nothing for reload, either.

Catching exceptions

Historically, the way to say “if there’s a ValueError, store it in e and run some code” was:

1
2
3
4
try:
    ...
except ValueError, e:
    ...

Unfortunately, that’s very easy to confuse with the syntax for catching two different types of exception:

1
2
except (ValueError, TypeError):
    ...

If you forget the parentheses, you’ll only catch ValueError, and the exception will be assigned to a variable called, er, TypeError. Whoops!

Python 3.0 introduced clearer syntax, which was also backported to Python 2.6:

1
2
except ValueError as e:
    ...

Python 3.0 finally removed the old syntax, so you must use the as form. futurize --stage1 will fix this with the lib2to3.fixes.fix_except fixer.

As an additional wrinkle, the extra variable e is deleted at the end of the block in Python 3, but not in Python 2. If you really need to refer to it after the block, just assign it to a different name.

(The reason for this is that captured exceptions contain a traceback in Python 3, and tracebacks contain the locals for the current frame, and those locals will contain the captured exception. The resulting cycle would keep all local variables alive until the cycle detector dealt with it, at least in CPython. Scrapping the exception as soon as it’s been dealt with was a simple way to keep this from accidentally happening all over the place. It usually doesn’t make sense to refer to a captured exception after the except block, anyway, since the variable may or may not even exist, and that’s generally weird and bad in Python.)

Octals

It’s not uncommon for a new programmer to try to zero-pad a set of numbers:

1
2
3
4
a = 07
b = 08
c = 09
d = 10

Of course, this will have the rather bizarre result that 08 is a SyntaxError, even though 07 works fine — because numbers starting with a 0 are parsed as octal.

This is a holdover from C, and it’s fairly surprising, since there’s virtually no reason to ever use octal. The only time I can ever remember using it is for passing file modes to chmod.

Python 3.0 requires octal literals to be prefixed with 0o, in line with 0x for hex and 0b for binary; literal integers starting with only a 0 are a syntax error. Python 2.6 supports both forms.

futurize --stage1 will fix this with the lib2to3.fixes.fix_numliterals fixer.

pickle

If you’re using the pickle module (which you shouldn’t be), and you intend to pass pickles back and forth between Python 2 and Python 3, there’s a small issue to be aware of. pickle has several different “protocol” versions, and the default version used in Python 3 is protocol 3, which Python 2 cannot read.

The fix is simple: just find where you’re calling pickle.dump() or pickle.dumps(), and pass a protocol argument of 2. Protocol 2 is the highest version supported by Python 2, and you probably want to be using it anyway, since it’s much more compact and faster to read/write than Python 2’s default, protocol 0.

You may be already using HIGHEST_PROTOCOL, but you’ll have the same problem: the highest protocol supported in any version of Python 3 is unreadable by Python 2.


A somewhat bigger problem is that if you pickle an instance of a user-defined class on Python 2, the pickle will record all its attributes as bytestrings, because that’s what they are in Python 2. Python 3 will then dutifully load the pickle and populate your object’s __dict__ with keys like b'foo'. obj.foo will then not actually exist, because obj.foo looks for the string 'foo', and 'foo' != b'foo' in Python 3.

Don’t use pickle, kids.

It’s possible to fix this, but also a huge pain in the ass. If you don’t know how, you definitely shouldn’t be using pickle.

Things that have a __future__ import

Occasionally, the syntax changed in an incompatible way, but the new syntax was still backported and hidden behind a __future__ import — Python’s mechanism for opting into syntax changes. You have to put such an import at the top of the file, optionally after a docstring, like this:

1
2
"""My super important module."""
from __future__ import with_statement

Ugh! Parentheses! Why, Guido, why?

The reason is that the print statement has incredibly goofy syntax, unlike anything else in the language:

1
print >>a, b, c,

You might not even recognize the >> bit, but it lets you print to a file other than sys.stdout. It’s baked specifically into the print syntax. Python 3 replaces this with a straightforward built-in function with a couple extra bells and whistles. The above would be written:

1
print(b, c, end='', file=a)

It’s slightly more verbose, but it’s also easier to tell what’s going on, and that teeny little comma at the end is now a more obvious keyword argument.

from __future__ import print_function will forget about the print statement for the rest of the file, and make the builtin print function available instead. futurize --stage1 will fix all uses of print and add the __future__ import, with the libfuturize.fixes.fix_print_with_import fixer. (There’s also a 2to3 fixer, but it doesn’t add the __future__ import, since it’s unnecessary in Python 3.)

A word of warning: do not just use print with parentheses without adding the __future__ import. This may appear to work in stock Python 2:

1
print("See, what's the problem?  This works fine!")

However, that’s parsed as the print statement followed by an expression in parentheses. It becomes more obvious if you try to print two values:

1
2
print("The answer is:", 3)
# ("The answer is:", 3)

Now you have a comma inside parentheses, which is a tuple, so the old print statement prints its repr.

Division always produces a float

Quick, what’s the answer here?

1
5 / 2

If you’re a normal human being, you’ll say 2.5 or 2½. Unfortunately, if you’re like Python and have been afflicted by C, you might say the answer is 2, because this is “integer division” — a bizarre and alien concept probably invented because CPUs didn’t have FPUs when C was first invented.

Python 3.0 decided that maybe contorting fundamental arithmetic to match the inadequacies of 1970s hardware is not the best idea, and so it changed division to always produce a float.

Since Python 2.6, from __future__ import division will alter the division operator to always do true division. If you want to do floor division, there’s a separate // operator, which has existed for ages; you can use it in Python 2 with or without the __future__ import.

Note that true division always produces a float, even if the result is integral: 6 / 3 is 2.0. On the other hand, floor division uses the same typing rules as C-style division: 5 // 2 is 2, but 5 // 2.0 is 2.0.

futurize --stage2 will “fix” this with the libfuturize.fixes.fix_division fixer, but unfortunately that just adds the __future__ import. With the --conservative option, it uses the libfuturize.fixes.fix_division_safe fixer instead, which imports past.utils.old_div, a forward-port of Python 2’s division operator.

The trouble here is that the new / always produces a float, and the new // always floors, but the old / sometimes did one and sometimes did the other. futurize can’t just replace all uses of / with //, because 5/2.0 is 2.5 but 5//2.0 is 2.0, and it can’t generally know what types the operands are.

You might be best off fixing this one manually — perhaps using fix_division_safe to find all the places you do division, then changing them to use the right operator.

Of course, the __div__ magic method is gone in Python 3, replaced explicitly by __floordiv__ (//) and __truediv__ (/). Both of those methods already exist in Python 2, and __truediv__ is even called when you use / in the presence of the future import, so being compatible is a simple matter of implementing all three and deferring to one of the others from __div__.

Relative imports

In Python 2, if you’re in the module foo.bar and say import quux, Python will look for a foo.quux before it looks for a top-level quux. The former behavior is called a relative import, though it might be more clearly called a sibling import. It’s troublesome for several reasons.

  • If you have a sibling called quux, and there’s also a top-level or standard library module called quux, you can’t import the latter. (There used to be a py.std module for providing indirect access to the standard library, for this very reason!)

  • If you import the top-level quux module, and then later add a foo.quux module, you’ll suddenly be importing a different module.

  • When reading the source code, it’s not clear which imports are siblings and which are top-level. In fact, the modules you get depend on the module you’re in, so moving or renaming a file may change its imports in non-obvious ways.

Python 3 eliminates this behavior: import quux always means the top-level module. It also adds syntax for “explicit relative” or “absolute relative” (yikes) imports: from . import quux or from .quux import somefunc explicitly means to look for a sibling named quux. (You can also use ..quux to look in the parent package, three dots to look in the grandparent, etc.)

The explicit syntax is supported since Python 2.5. The old sibling behavior can be disabled since Python 2.5 with from __future__ import absolute_import.

futurize --stage1 has a libfuturize.fixes.fix_absolute_import fixer, which attempts to detect sibling imports and convert them to explicit relative imports. If it finds any sibling imports, it’ll also add the __future__ line, though honestly you should make an effort to to put that line in all of your Python 2 code.

It’s possible for the futurize fixer to guess wrong about a sibling import, but in general it works pretty well.

(There is one case I’ve run across where simply replacing import sibling with from . import sibling didn’t work. Unfortunately, it was Yelp code that I no longer have access to, and I can’t remember the precise details. It involved having several sibling imports inside a __init__.py, where the siblings also imported from each other in complex ways. The sibling imports worked, but the explicit relative imports failed, for some really obscure timing reason. It’s even possible this was a 2.6 bug that’s been fixed in 2.7. If you see it, please let me know!)

Things that require some effort

These problems are a little more obscure, but many of them are also more difficult to fix automatically. If you have a massive codebase, these are where the problems start to appear.

The grand module shuffle

A whole bunch of modules were deleted, merged, or removed. A full list is in PEP 3108, but you’ll never have heard of most of them. Here are the ones that might affect you.

  • __builtin__ has been renamed to builtins. Note that this is a module, not the __builtins__ attribute of modules, which is exactly why it was renamed. Incidentally, you should be using the builtins module rather than __builtins__ anyway. Or, wait, no, just don’t use either, please don’t mess with the built-in scope.

  • ConfigParser has been renamed to configparser.

  • Queue has been renamed to queue.

  • SocketServer has been renamed to socketserver.

  • cStringIO and StringIO are gone; instead, use StringIO or BytesIO from the io module. Note that these also exist in Python 2, but are pure-Python rather than the C versions in current Python 3.

  • cPickle is gone. Importing pickle in Python 3 now gives you the C implementation automatically.

  • cProfile is gone. Importing profile in Python 3 gives you the C implementation automatically.

  • copy_reg has been renamed to copyreg.

  • anydbm, dbhash, dbm, dumbdm, gdbm, and whichdb have all been merged into a dbm package.

  • dummy_thread has become _dummy_thread. It’s an implementation of the _thread module that doesn’t actually do any threading. You should be using dummy_threading instead, I guess?

  • httplib has become http.client. BaseHTTPServer, CGIHTTPServer, and SimpleHTTPServer have been merged into a single http.server module. Cookie has become http.cookies. cookielib has become http.cookiejar.

  • repr has been renamed to reprlib. (The module, not the built-in function.)

  • thread has been renamed to _thread, and you should really be using the threading module instead.

  • A whole mess of top-level Tk modules have been combined into a tkinter package.

  • The contents of urllib, urllib2, and urlparse have been consolidated and then split into urllib.error, urllib.parse, and urllib.request.

  • xmlrpclib has become xmlrpc.client. DocXMLRPCServer and SimpleXMLRPCServer have been merged into xmlrpc.server.

futurize --stage2 will fix this with the somewhat invasive libfuturize.fixes.fix_future_standard_library fixer, which uses a mechanism from future that adds aliases to Python 2 to make all the Python 3 standard library names work. It’s an interesting idea, but it didn’t actually work for all cases when I tried it (though now I can’t recall what was broken), so YMMV.

Alternative, you could manually replace any affected imports with imports from six.moves, which provides aliases that work on either version.

Or as a last resort, you can just sprinkle try ... except ImportError around.

Built-in iterators are now lazy

filter, map, range, and zip are all lazy in Python 3. You can still iterate over their return values (once), but if you have code that expects to be able to index them or traverse them more than once, it’ll break in Python 3. (Well, not range, that’s fine.) The lazy equivalents — xrange and the functions in itertools — are of course gone in Python 3.

In either case, the easiest thing to do is force eager evaluation by wrapping the call in list() or tuple(), which you’ll occasionally need to do in Python 3 regardless.

For the sake of consistency, you may want to import the lazy versions from the standard library future_builtins module. It only exists in Python 2, so be sure to wrap the import in a try.

futurize --stage2 tries to address this with several of lib2to3s fixers, but the results aren’t particularly pleasing: calls to all four are unconditionally wrapped in list(), even in an obviously safe case like a for block. I’d just look through your uses of them manually.

A more subtle point: if you pass a string or tuple to Python 2’s filter, the return value will be the same type. Blindly wrapping the call in list() will of course change the behavior. Filtering a string is not a particularly common thing to do, but I’ve seen someone complain about it before, so take note.

Also, Python 3’s map stops at the shortest input sequence, whereas Python 2 extends shorter sequences with Nones. You can fix this with itertools.zip_longest (which in Python 2 is izip_longest!), but honestly, I’ve never even seen anyone pass multiple sequences to map.

Relatedly, dict.iteritems (plus its friends, iterkeys and itervalues) is gone in Python 3, as the plain items (plus keys and values) is already lazy. The dict.view* methods are also gone, as they were only backports of Python 3’s normal behavior.

Both six and future.utils contain functions called iteritems, etc., which provide a lazy iterator in both Python 2 and 3. They also offer view* functions, which are closer to the Python 3 behavior, though I can’t say I’ve ever seen anyone actually use dict.viewitems in real code.

Of course, if you explicitly want a list of dictionary keys (or items or values), list(d) and list(d.items()) do the same thing in both versions.

buffer is gone

The buffer type has been replaced by memoryview (also in Python 2.7), which is similar but not identical. If you’ve even heard of either of these types, you probably know more about the subtleties involved than I do. There’s a lib2to3.fixes.fix_buffer fixer that blindly replaces buffer with memoryview, but futurize doesn’t use it in either stage.

Several special methods were renamed

Where Python 2 has __str__ and __unicode__, Python 3 has __bytes__ and __str__. The trick is that __str__ should return the native str type for each version: a bytestring for Python 2, but a Unicode string for Python 3. Also, you almost certainly don’t want a __bytes__ method in Python 3, where bytes is no longer used for text.

Both six and python-future have a python_2_unicode_compatible class decorator that tries to do the right thing. You write only a single __str__ method that returns a Unicode string. In Python 3, that’s all you need, so the decorator does nothing; in Python 2, the decorator will rename your method to __unicode__ and add a __str__ that returns the same value encoded as UTF-8. If you need different behavior, you’ll have to roll it yourself with if PY2.


Python 2’s next method is more appropriately __next__ in Python 3. The easy way to address this is to call your method __next__, then alias it with next = __next__. Be sure you never call it directly as a method, only with the built-in next() function.

Alternatively, future.builtins contains an alternative next which always calls __next__, but on Python 2, it falls back to trying next if __next__ doesn’t exist.

futurize --stage1 changes all use of obj.next() to next(obj) via the libfuturize.fixes.fix_next_call fixer. futurize --stage2 renames next methods to __next__ via the lib2to3.fixes.fix_next fixer (which also fixes calls). Note that there’s a remote chance of false positives, if for some reason you happened to use next as a regular method name.


Python 2’s __nonzero__ is Python 3’s __bool__. Again, you can just alias it manually. Or futurize --stage2 will rename it with the lib2to3.fixes.fix_nonzero fixer.

Renaming it will of course break it in Python 2, but futurize --stage2 also has a libfuturize.fixes.fix_object fixer that imports python-future’s own builtins.object. The replacement object class has a few methods for making Python 3’s __str__, __next__, and __bool__ work on Python 2.

This is one of the mildly invasive things python-future does, and it may or may not sit well. Up to you.


__long__ is completely gone, as there is no long type in Python 3.

__getslice__, __setslice__, and __delslice__ are gone. Instead, slice objects are passed to __getitem__ and friends. On the off chance you use these, you’ll have to do something clever in the item methods to defer to your slice logic on Python 3.

__oct__ and __hex__ are gone; oct() and hex() now consult __index__. I seriously doubt this will impact anyone.

__div__ is gone, as mentioned previously.

Unbound methods are gone; function attributes renamed

Say you have this useless class.

1
2
3
class Foo(object):
    def bar(self):
        pass

In Python 2, Foo.bar is an “unbound method”, a type that’s generally unseen and unexposed other than as types.MethodType. In Python 3, Foo.bar is just a regular function.

Offhand, I can only think of one time this would matter: if you want to get at attributes on the function, perhaps for the sake of a method decorator. In Python 2, you have to go through the unbound method’s .im_func attribute to get the original function, but in Python 3, you already have the original function and can get the attributes directly.

If you’re doing this anywhere, an easy way to make it work in both versions is:

1
2
method = Foo.bar
method = getattr(method, 'im_func', method)

As for bound methods (the objects you get from accessing methods but not calling them, like [].append), the im_self and im_func attributes have been renamed to __self__ and __func__. Happily, these names also work in Python 2.6, so no compatibility hacks are necessary.

im_class is completely gone in Python 3. Methods have no interest in which class they’re attached to. They can’t, since the same function could easily be attached to more than one class. If you’re relying on im_class somehow, for some reason… well, don’t do that, maybe.

Relatedly, the func_* function attributes have been renamed to dunder names in Python 3, since assigning function attributes is a fairly common practice and Python doesn’t like to clog namespaces with its own builtin names. func_closure, func_code, func_defaults, func_dict, func_doc, func_globals, and func_name are now __closure__, __code__, etc. (Note that func_doc and func_name were already aliases for __doc__ and __name__, and func_defaults is much more easily inspected with the inspect module.) The new names are not available in Python 2, so you’ll need to do a getattr dance, or use the get_function_* functions from six.

Metaclass syntax has changed

In Python 2, a metaclass is declared by assigning to a special name in the class body:

1
2
3
class Foo(object):
    __metaclass__ = FooMeta
    ...

Admittedly, this doesn’t make a lot of sense. The metaclass affects how a class is created, and the class body is evaluated as part of that creation, so this is sort of a goofy hack.

Python 3 changed this, opening the door to a few new neat tricks in the process, which you can find out about in the companion article.

1
2
class Foo(object, metaclass=FooMeta):
    ...

The catch is finding a way to express this idea in both Python 2 and Python 3 — the old syntax is ignored in Python 3, and the new syntax is a syntax error in Python 2.

It’s a bit of a pain, but the class statement is really just a lot of sugar for calling the type() constructor; after all, Python classes are just instances of type. All you have to do is manually create an instance of your metaclass, rather than of type.

Fortunately, other people have already made this work for you. futurize --stage2 will fix this using the libfuturize.fixes.fix_metaclass fixer, which imports future.utils.with_metaclass and produces the following:

1
2
3
4
from future.utils import with_metaclass

class Foo(with_metaclass(object)):
    ...

This creates an intermediate dummy class with the right metaclass, which you then inherit from. Classes use the same metaclass as their parents, so this works fine in any Python.

If you don’t want to depend on python-future, the same function exists in the six module.

Re-raising exceptions has different syntax

raise with no arguments does the same thing in Python 2 and Python 3: it re-raises the exception currently being handled, preserving the original traceback.

The problem comes in with the three-argument form of raise, which is for preserving the traceback while raising a different exception. It might look like this:

1
2
3
4
try:
    some_fragile_function()
except Exception as e:
    raise MyLibraryError, MyLibraryError("Failed to do a thing: " + str(e)), sys.exc_info()[2]

sys.exc_info()[2] is, of course, the only way to get the current traceback in Python 2. You may have noticed that the three arguments to raise are the same three things that sys.exc_info() returns: the type, the value, and the traceback.

Python 3 introduces exception chaining. If something raises an exception from within an except block, Python will remember the original exception, attach it to the new one, and show both exceptions when printing a traceback — including both exceptions’ types, messages, and where they happened. So to wrap and rethrow an exception, you don’t need to do anything special at all.

1
2
3
4
try:
    some_fragile_function()
except Exception:
    raise MyLibraryError("Failed to do a thing")

For more complicated handling, you can also explicitly say raise new_exception from old_exception. Exceptions contain their associated tracebacks as a __traceback__ attribute in Python 3, so there’s no need to muck around getting the traceback manually. If you really want to give an explicit traceback, you can use the .with_traceback() method, which just assigns to __traceback__ and then returns self.

1
raise MyLibraryError("Failed to do a thing").with_traceback(some_traceback)

It’s hard to say what it even means to write code that works “equivalently” in both versions, because Python 3 handles this problem largely automatically, and Python 2 code tends to have a variety of ad-hoc solutions. Note that you cannot simply do this:

1
2
3
4
if PY3:
    raise MyLibraryError("Beep boop") from exc
else:
    raise MyLibraryError, MyLibraryError("Beep boop"), sys.exc_info()[2]

The first raise is a syntax error in Python 2, and the second is a syntax error in Python 3. if won’t protect you from parse errors. (On the other hand, you can hide .with_traceback() behind an if, since that’s just a regular method call and will parse with no issues.)

six has a reraise function that will smooth out the differences for you (probably by using exec). The drawback is that it’s of course Python 2-oriented syntax, and on Python 3 the final traceback will include more context than expected.

Alternatively, there’s a six.raise_from, which is designed around the raise X from Y syntax of Python 3. The drawback is that Python 2 has no obvious equivalent, so you just get raise X, losing the old exception and its traceback.

There’s no clear right approach here; it depends on how you’re handling re-raising. Code that just blindly raises new exceptions doesn’t need any changes, and will get exception chaining for free on Python 3. Code that does more elaborate things, like implementing its own form of chaining or storing exc_info tuples to be re-raised later, may need a little more care.

Bytestrings are sequences of integers

In Python 2, bytes is a synonym for str, the default string type. Iterating or indexing a bytes/str produces 1-character strs.

1
2
3
4
list(b'hello')  # ['h', 'e', 'l', 'l', 'o']
b'hello'[0:4]  # 'hell'
b'hello'[0]  # 'h'
b'hello'[0][0][0][0][0]  # 'h' -- it's turtles all the way down

In Python 3, bytes is a specialized type for handling binary data, not text. As such, iterating or indexing a bytes produces integers.

1
2
3
4
list(b'hello')  # [104, 101, 108, 108, 111]
b'hello'[0:4]  # b'hell'
b'hello'[0]  # 104
b'hello'[0][0][0][0]  # TypeError, since you can't index 104

If you have explicitly binary data that want to be bytes in Python 3, this may pose a bit of a problem. Aside from just checking the version explicitly and making heavy use of chr/ord, there are two approaches.

One is to use bytearray instead. This is like bytes, but mutable. More importantly, since it was introduced as a new type in Python 2.6 — after Python 3.0 came out — it has the same iterating and indexing behavior as Python 3’s bytes, even in Python 2.

1
bytearray(b'hello')[0]  # 104, on either Python 2 or 3

The other is to slice rather than index, since slicing always produces a new iterable of the same type. If you want to extract a single character from a bytes, just take a one-element slice.

1
2
b'hello'[0]  # 104
b'hello'[0:1]  # b'h'

Things that are just a royal pain in the ass

Unicode

Saving the best for last, almost!

Honestly, if your Python 2 code is already careful with Unicode — working with unicode internally, and encoding/decoding only at the “boundaries” of your code — then you shouldn’t have too many problems. If your code is not so careful, you should really try to make it a little more careful before you worry about Python 3, since Python 3’s whole jam is to force you to be careful.

See, in Python 2, you can combine bytestrings (str) and text strings (unicode) more or less freely. Python will automatically try to convert between the two using the “default encoding”, which is generally ascii. Python 3 makes text strings the default string type, demotes bytestrings, and forbids ever converting between them.

Most obviously, Python 2’s str and unicode have been renamed to bytes and str in Python 3. If you happen to be using the names anywhere, you’ll probably need to change them! six offers text_type and binary_type, though you can just use bytes to mean the same thing in either version. python-future also has backports for both Python 3’s bytes and str types, which seems like an extreme approach to me. Changing str to mean a text type even in Python 2 might be a good idea, though.

b'' and u'' work the same way in either Python 2 or 3, but unadorned strings like '' are always the str type, which has different behavior. There is a from __future__ import unicode_literals, which will cause unadorned strings to be unicode in Python 2, and this might work for you. However, this prevents you from writing literal “native” strings — strings of the same type Python uses for names, keyword arguments, etc. Usually this won’t matter, since Python 2 will silently convert between bytes and text, but it’s caused me the occasional problem.

The right thing to do is just explicitly mark every single string with either a b or u sigil as necessary. That just, you know, sucks. But you should be doing it even if you’re not porting to Python 3.

basestring is completely gone in Python 3. str and bytes have no common base type, and their semantics are different enough that it rarely makes sense to treat them the same way. If you’re using basestring in Python 2, it’s probably to allow code to work on either form of “text”, and you’ll only want to use str in Python 3 (where bytes are completely unsuitable for text). six.string_types provides exactly this. futurize --stage2 also runs the lib2to3.fixes.fix_basestring fixer, but this replaces basestring with str, which will almost certainly break your code in Python. If you intend to use stage 2, definitely audit your uses of basestring first.

As mentioned above, bytestrings are sequences of integers, which may affect code trying to work with explicitly binary data.

Python 2 has both .decode() and .encode() on both bytes and text; if you try to encode bytes or decode text, Python will try to implicitly convert to the right type first. In Python 3, only text has an .encode() and only bytes have a .decode().

Relatedly, Python 2 allows you to do some cute tricks with “encodings” that aren’t really encodings; for example, "hi".encode('hex') produces '6869'. In Python 3, encoding must produce bytes, and decoding must produce text, so these sorts of text-to-text or bytes-to-bytes translations aren’t allowed. You can still do them explicitly with the codecs module, e.g. codecs.encode(b'hi', 'hex'), which also works in Python 2, despite being undocumented. (Note that Python 3 specifically requires bytes for the hex codec, alas. If it’s any consolation, there’s a bytes.hex() method to do this directly, which you can’t use anyway if you’re targeting Python 2.)

Python 3’s open decodes as UTF-8 by default (a vast oversimplification, but usually), so if you’re manually decoding after reading, you’ll get an error in Python 3. You could explicitly open the file in binary mode (preserving the Python 2 behavior), or you could use codecs.open to decode transparently on read (preserving the Python 3 behavior). The same goes for writing.

sys.stdin, sys.stdout, and sys.stderr are all text streams in Python 3, so they have the same caveats as above, with the additional wrinkle that you didn’t actually open them yourself. Their .buffer attribute gives a handle opened in binary mode (Python 2 behavior), or you can adapt them to transcode transparently (Python 3 behavior):

1
2
3
4
if six.PY2:
    sys.stdin = codecs.getreader('utf-8')(sys.stdin)
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    sys.stderr = codecs.getwriter('utf-8')(sys.stderr)

A text-mode file’s .tell() in Python 3 still returns a number that can be passed back to .seek(), but the number is not necessarily meaningful, and in particular can’t be used to estimate progress through a file. (Python uses a few very high bits as flags to indicate the state of the decoder; if you mask them off, what’s left is probably the byte position in the file as you’d expect, but this is pretty definitively a hack.)

Python 3 likes to treat filenames as text, but most of the functions in os and os.path will accept either text or bytes as their arguments (and return a value of the same type), so you should be okay there.

os.environ has text keys and values in Python 3. If you direly need bytes, you can use os.environb (and os.getenvb()).

I think that covers most of the obvious basics. This is a whole sprawling topic that I can’t hope to cover off the top of my head. I’ve seen it be both fairly painful and completely pain-free, depending entirely on the state of the Python 2 codebase.

Oh, one final note: there’s a module for Python 2 called unicode-nazi (sorry, I didn’t name it) that will produce a warning anytime a bytestring is implicitly converted to a text string, or vice versa. It might help you root out places you’re accidentally slopping types back and forth, which will certainly break in Python 3. I’ve only tried it on a comically large project where it found thousands of violations, including plenty in surprising places in the standard library, so it may or may not be of any practical help.

Things that are not actually gone

String formatting with %

There’s a widespread belief that str % ... is deprecated, since there’s a newer and shinier str.format() method.

Well, it’s not. It’s not gone; it’s not deprecated; it still works just fine. I don’t like to use it, myself, since it’s easy to make accidentally ambiguous — "%s" % foo can crash if foo is a tuple! — but it’s not going anywhere. In fact, as of Python 3.5, bytes and bytearray support % but not .format.

optparse

argparse is certainly better, but the optparse module still exists in Python 3. It has been deprecated since Python 3.2, though.

Things that are preposterously obscure but that I have seen cause problems nonetheless

Tuple unpacking

A little-used feature of Python 2 is tuple unpacking in function arguments:

1
2
3
4
5
def foo(a, (b, c)):
    print a, b, c

x = (2, 3)
foo(1, x)

This syntax is gone in Python 3. I’ve rarely seen anyone use it, except in two cases. One was a parsing library that relied pretty critically on using it in every parsing function you wrote; whoops.

The other is when sorting a dict’s items:

1
sorted(d.items(), key=lambda (k, v): k + v)

In Python 3, you have to write that as lambda kv: kv[0] + kv[1]. Boo.

long is gone

Python 3 merged its long type with int, so now there’s only one integral type, called int.

Python 2 promotes int to long pretty much transparently, and longs aren’t very common in the first place, so it’s fairly unlikely that this will make a difference. On the off chance you’re type-checking for integers with isinstance(x, (int, long)) (and really, why are you doing that), you can just use six.integer_types instead.

Note that futurize --stage2 applies the lib2to3.fixes.fix_long fixer, which blindly renames long to int, leaving you with inappropriate code like isinstance(x, (int, int)).

However…

I have seen some very obscure cases where a hand-rolled binary protocol would encode ints and longs differently. My advice would be to not do that.

Oh, and a little-known feature of Python 2’s syntax is that you can have long literals by suffixing them with an L:

1
2
123  # int
123L  # long

You can write 1267650600228229401496703205376 directly in Python 2 code, and it’ll automatically create a long, so the only reason to do this is if you explicitly need a long with a small value like 1. If that’s the case, something has gone catastrophically wrong.

repr changes

These should really only affect you if you’re using reprs as expected test output (or, god forbid, as cache keys or something). Some notable changes:

  • Unicode strings have a u prefix in Python 2. In Python 3, of course, Unicode strings are just strings, so there’s no prefix.

  • Conversely, bytestrings have a b prefix in Python 3, but not in Python 2 (though the b prefix is allowed in source code).

  • Python 2 escapes all non-ASCII characters, even in the repr of a Unicode string. Python 3 only escapes control characters and codepoints considered non-printing.

  • Large integers and explicit longs have an L suffix in Python 2, but not in Python 3, where there is no separate long type.

  • A set becomes set([1, 2, 3]) in Python 2, but {1, 2, 3} in Python 3. The set literal syntax is allowed in source code in Python 2.7, but the repr wasn’t changed until 3.0.

  • floats stringify to the shortest possible representation that has the same underlying value — e.g., str(1.1) is '1.1' rather than '1.1000000000000001'. This change was backported to Python 2.7 as well, but I have seen it break tests.

Hash randomization

Python has traditionally had a predictable hashing mechanism: repr(dict(a=1, b=2, c=3)) will always produce the same string. (On the same platform with the same Python version, at least.) Unfortunately this opens the door to an obscure DoS exploit that was known to Perl long ago: if you know a web application is written in Python, you can construct a query string that will become a dict whose keys all go in the same hash bucket. If your query string is long enough and you send enough requests, you can tie up all the Python processes in dealing with hash collisions.

The fix is hash randomization, which seeds the hashing algorithm in such a way that items are bucketed differently every time Python runs. It’s available in Python 2.7 via an environment variable or the -R argument, but it wasn’t turned on by default until Python 3.3.

The fear was that it might break things. Naturally, it has broken things. Mostly, reprs in tests. But it also changes the iteration order of dicts between Python runs. I have seen code using dicts whose keys happened to always be sorted in alphabetical or insertion order before, but with hash randomization, the keys were of course in a different order every time the code ran. The author assumed that Python had somehow broken dict sorting (which it has never had).

nonlocal

Python 3 introduces the nonlocal keyword, which is like global except it looks through all outer scopes in the expected order. It fixes this mild annoyance:

1
2
3
4
5
6
7
def make_function():
    counter = 0
    def function():
        nonlocal counter
        counter += 1  # without 'nonlocal', this declares a new local!
        print("I've been called", counter, "times!")
    return function

The problem is that any use of assignment within a function automatically creates a new local, and locals are known statically for the entire body of the function. (They actually affect how functions are compiled, in CPython.) So without nonlocal, the above code would see counter += 1, but counter is a new local that has never been assigned a value, so Python cannot possibly add 1 to it, and you get an UnboundLocalError.

nonlocal tells Python that when it sees an assignment of a name that exists in some outer scope, it should reuse that outer variable rather than shadowing it. Great, right? Purely a new feature. No problem.

Unfortunately, I’ve worked on a codebase that needed this feature in Python 2, and decided to fake it with a class… named nonlocal.

1
2
3
4
5
6
7
def make_function():
    class nonlocal:
        counter = 0
    def function():
        nonlocal.counter += 1  # this alters an outer value in-place, so it's fine
        print("I've been called", counter, "times!")
    return function

The class here is used purely as a dummy container. Assigning to an attribute doesn’t create any locals, because it’s equivalent to a method call, so the operand must already exist. This is a slightly quirky approach, but it works fine.

Except that, of course, nonlocal is a keyword in Python 3, so this becomes complete gibberish. It’s such gibberish that (if I remember correctly) 2to3 actually cannot parse it, even though it’s perfectly valid Python 2 code.

I don’t have a magical fix for this one. Just, uh, don’t name things nonlocal.

List comprehensions no longer leak

Python 2 has the slightly inconsistent behavior that loop variables in a generator expression ((...)) are scoped to the generator expression, but loop variables in a list comprehension ([...]) belong to the enclosing scope.

The only reason is in implementation details: a list comprehension acts like a for loop, which has the same behavior, whereas a generator expression actually creates a generator internally.

Python 3 brings these cases into line: loop variables in list comprehensions (or dict or set comprehensions) are also scoped to the comprehension itself.

I cannot imagine any possible reason why this would affect you negatively, and yet, I can swear I’ve seen it happen. I wish I could remember where, because I’m sure it’s an exciting story.

cStringIO.h is gone

cStringIO.h is a private and undocumented C interface to Python 2’s cStringIO.StringIO type. It was removed in Python 3, or at least is somewhere I can’t find it.

This was one of the reasons Thrift’s Python 3 port took almost 3 years: Thrift has a “fast” C module that makes use of this private interface, and it’s not obvious how to replace it. I think they ended up just having the module not exist on Python 3, so Python 3 will just be mysteriously slower.

Some troublesome libraries

MySQLdb is some ancient, clunky, noncompliant, underdocumented trash, much like the database it connects to. It’s nigh abandoned, though it still promises Python 3 support in the MySQLdb 2.0 vaporware. I would suggest not using MySQL, but barring that, try mysqlclient, a fork of MySQLdb that continues development and adds Python 3 support. (The same people also maintain an earlier project, pymysql, which strives to be a pure-Python drop-in replacement for MySQLdb — it’s not quite perfect, but its existence is interesting and it’s sure easier to read than MySQLdb.)

At a glance, Thrift still hasn’t had a release since it merged Python 3 support, eight months ago. It’s some enterprise nightmare, anyway, and bizarrely does code generation for a bunch of dynamic languages. Might I suggest just using the pure-Python thriftpy, which parses Thrift definitions on the fly?

Twisted is, ah, large and complex. Parts of it now support Python 3; parts of it do not. If you need the parts that don’t, well, maybe you could give them a hand?

M2Crypto is working on it, though I’m pretty sure most Python crypto nerds would advise you to use cryptography instead.

And so on

You may find any number of other obscure compatibility problems, just as you might when upgrading from 2.6 to 2.7. The Python community has a lot of clever people willing to help you out, though, and they’ve probably even seen your super duper niche problem before.

Don’t let that, or this list of gotchas in general, dissaude you! Better to start now than later; even fixing an integer division gets you one step closer to having your code run on Python 3 as well.

AWS CodeDeploy Deployments with HashiCorp Consul

Post Syndicated from George Huang original http://blogs.aws.amazon.com/application-management/post/Tx1MURIM5X45IKX/AWS-CodeDeploy-Deployments-with-HashiCorp-Consul

Learn how to use AWS CodeDeploy and HashiCorp Consul together for your application deployments. 

AWS CodeDeploy automates code deployments to Amazon Elastic Compute Cloud (Amazon EC2) and on-premises servers. HashiCorp Consul is an open-source tool providing service discovery and orchestration for modern applications. 

Learn how to get started by visiting the guest post on the AWS Partner Network Blog. You can see a full list of CodeDeploy product integrations by visiting here

AWS CodeDeploy Deployments with HashiCorp Consul

Post Syndicated from George Huang original https://aws.amazon.com/blogs/devops/aws-codedeploy-deployments-with-hashicorp-consul/

Learn how to use AWS CodeDeploy and HashiCorp Consul together for your application deployments. 

AWS CodeDeploy automates code deployments to Amazon Elastic Compute Cloud (Amazon EC2) and on-premises servers. HashiCorp Consul is an open-source tool providing service discovery and orchestration for modern applications. 

Learn how to get started by visiting the guest post on the AWS Partner Network Blog. You can see a full list of CodeDeploy product integrations by visiting here

Using Custom JSON on AWS OpsWorks Layers

Post Syndicated from Daniel Huesch original http://blogs.aws.amazon.com/application-management/post/Tx2064HZ903DH8O/Using-Custom-JSON-on-AWS-OpsWorks-Layers

Custom JSON, which has always been available on AWS OpsWorks stacks and deployments, is now also available as a property on layers in stacks using Chef versions 11.10, 12, and 12.2.

In this post I show how you can use custom JSON to adapt a single Chef cookbook to support different use cases on individual layers. To demonstrate, I use the example of a MongoDB setup with multiple shards.

In OpsWorks, each instance belongs to one or more layers, which in turn make up a stack. You use layers to specify details about which Chef cookbooks are run when the instances are set up and configured, among other things. When your stacks have instances that serve different purposes, you use different cookbooks for each.

Sometimes, however, there are only small differences between the layers and they don’t justify using separate cookbooks. For example, when you have a large MongoDB installation with multiple shards, you would have a layer per shard, as shown in the following figure, but your cookbooks wouldn’t necessarily differ.

custom-json-per-layer-1.png

Let’s assume I’m using the community cookbook for MongoDB. I would configure this cookbook using attributes. The attribute for setting the shard name would be node[:mongodb][:shard_name]. But let’s say that I want to set a certain attribute for any deployment to any instance in a given layer. I would use custom JSON to set the that attribute.

When declared on a stack, custom JSON always applies to all instances, no matter which layer they’re in. Custom JSON declared on a deployment is helpful for one-off deployments with special settings; but, the provided custom JSON doesn’t stick to the instances you deploy to, so a subsequent deployment doesn’t know about custom JSON you might have specified in an earlier deployment.

Custom JSON declared on the layer applies to each instance that belongs to that layer. Like custom JSON declared on the stack, it’s permanently stored and applied to all subsequent deployments. So you just need to edit each layer and set the right shard, as shown in the following figure:

custom-json-per-layer-2.png

During a Chef run, OpsWorks makes custom JSON contents available as attributes. That way the settings are available in the MongoDB cookbook and configure the MongoDB server accordingly. For details about using custom JSON content as an attribute, see our documentation.

Custom JSON declared on the deployment overrides custom JSON declared on the stack. Custom JSON declared on the layer sits in between those two. So you can use it on the layer to override stack settings, and on the deployment to override stack or layer settings.

Using custom JSON gives you a way to tweak a setting for all instances in a given layer without having to affect the entire stack, and without having to provide custom JSON for every deployment.

Using Custom JSON on AWS OpsWorks Layers

Post Syndicated from Daniel Huesch original http://blogs.aws.amazon.com/application-management/post/Tx2064HZ903DH8O/Using-Custom-JSON-on-AWS-OpsWorks-Layers

Custom JSON, which has always been available on AWS OpsWorks stacks and deployments, is now also available as a property on layers in stacks using Chef versions 11.10, 12, and 12.2.

In this post I show how you can use custom JSON to adapt a single Chef cookbook to support different use cases on individual layers. To demonstrate, I use the example of a MongoDB setup with multiple shards.

In OpsWorks, each instance belongs to one or more layers, which in turn make up a stack. You use layers to specify details about which Chef cookbooks are run when the instances are set up and configured, among other things. When your stacks have instances that serve different purposes, you use different cookbooks for each.

Sometimes, however, there are only small differences between the layers and they don’t justify using separate cookbooks. For example, when you have a large MongoDB installation with multiple shards, you would have a layer per shard, as shown in the following figure, but your cookbooks wouldn’t necessarily differ.

custom-json-per-layer-1.png

Let’s assume I’m using the community cookbook for MongoDB. I would configure this cookbook using attributes. The attribute for setting the shard name would be node[:mongodb][:shard_name]. But let’s say that I want to set a certain attribute for any deployment to any instance in a given layer. I would use custom JSON to set the that attribute.

When declared on a stack, custom JSON always applies to all instances, no matter which layer they’re in. Custom JSON declared on a deployment is helpful for one-off deployments with special settings; but, the provided custom JSON doesn’t stick to the instances you deploy to, so a subsequent deployment doesn’t know about custom JSON you might have specified in an earlier deployment.

Custom JSON declared on the layer applies to each instance that belongs to that layer. Like custom JSON declared on the stack, it’s permanently stored and applied to all subsequent deployments. So you just need to edit each layer and set the right shard, as shown in the following figure:

custom-json-per-layer-2.png

During a Chef run, OpsWorks makes custom JSON contents available as attributes. That way the settings are available in the MongoDB cookbook and configure the MongoDB server accordingly. For details about using custom JSON content as an attribute, see our documentation.

Custom JSON declared on the deployment overrides custom JSON declared on the stack. Custom JSON declared on the layer sits in between those two. So you can use it on the layer to override stack settings, and on the deployment to override stack or layer settings.

Using custom JSON gives you a way to tweak a setting for all instances in a given layer without having to affect the entire stack, and without having to provide custom JSON for every deployment.

Continuous Delivery for a PHP Application Using AWS CodePipeline, AWS Elastic Beanstalk, and Solano Labs

Post Syndicated from David Nasi original http://blogs.aws.amazon.com/application-management/post/TxYSRRBH57NP2P/Continuous-Delivery-for-a-PHP-Application-Using-AWS-CodePipeline-AWS-Elastic-Bea

My colleague Joseph Fontes, an AWS Solutions Architect, wrote the guest post below to discuss continuous delivery for a PHP Application Using AWS CodePipeline, AWS Elastic Beanstalk, and Solano Labs.

Solano Labs recently integrated Solano CI with AWS CodePipeline, a continuous delivery service for fast and reliable application updates. Solano Labs provides CI/CD capabilities to a variety of organizations, such as Airbnb, Change.org, and Apptio.  Solano CI is an enterprise-grade, scalable continuous integration (CI) and continuous deployment (CD) SaaS solution.  CodePipeline builds, tests, and deploys your code every time there is a code change, based on release process models that you define. You can now take advantage of the CI/CD capabilities long enjoyed by Solano Labs customers from within CodePipeline and with all of the ease of using the AWS Management Console.

In this post, we demonstrate how to use Solano CI with CodePipeline to test a PHP application using PHPUnit, and then deploy the application to AWS Elastic Beanstalk (Elastic Beanstalk). 

You will learn how to:

Deploy a sample PHP application to Elastic Beanstalk

Create a CD tool chain to push your code to Elastic Beanstalk

Connect your GitHub source repository to CodePipeline

Set up a Solano CI build stage to build your application and perform integration tests

Deploy your tested application to Elastic Beanstalk 

After you have completed these steps, your code will be continuously tested and delivered safely in an automated fashion.

To follow along, you need to set up an account with Solano Labs and have a GitHub account and a repository for the demo. 

1. To create an account with Solano Labs, go to http://docs.solanolabs.com/introduction/.

2. If you don’t already have a GitHub account, create one. Also, create the repository for this demo. For instructions for both, go to

https://help.github.com/articles/signing-up-for-a-new-github-account/

https://help.github.com/articles/creating-a-new-repository/

As a part of the deployment process, you will use the PHPUnit testing framework on the demonstration application we’ve posted to our Elastic Beanstalk environment. For more information about PHPUnit and Solano Labs PHPUnit integration, go to https://phpunit.de/ and http://docs.solanolabs.com/ConfiguringLanguage/php/.

Now, let’s start the demo.

1. Clone the application code from the following location into your Git repository: https://github.com/awslabs/aws-demo-php-simple-app.git.

2. Create the destination application in Elastic Beanstalk by following the instructions at http://docs.aws.amazon.com/gettingstarted/latest/deploy/deploying-with-elastic-beanstalk.html.

You need to choose specific options for your configuration. Instead of Node.js for the Predefined configuration, choose PHP. 

When asked to provide an archive of an application to get started, download the following .zip file to your local machine and upload into the Elastic Beanstalk application.

3. Finish configuring the Elastic Beanstalk application and environment.

4. Ensure that the configuration is running properly by testing it in a web browser.  You should see a page similar to this:

5. Next, create a CodePipeline pipeline to establish the continuous delivery process.  From the AWS Management Console, choose Services, and then, choose AWS CodePipeline.  If this is the first time you’ve created a pipeline, CodePipeline displays the following page:

6. Choose Get started.

7. Enter a name for your pipeline. For this example, we use “solano-eb-build”.   Choose Next step.

8. Now define your source provider.  CodePipeline provides direct integration between GitHub repositories and versioned Amazon S3 locations.  After you define your source, CodePipeline tracks changes committed to the source and performs actions that you will define.

For Source provider, choose GitHub. You may be requested to login to your Github account to proceed.  Under Connect to GitHub, you’ll see a variety of repositories and branches.  Choose the repository and branch that you just created, and then choose Next step.

9. For Build provider, choose Solano CI, and then choose Connect.  You may be requested to login to your Solano CI account.  When you are redirected to the Solano site, confirm the connection between CodePipeline and Solano CI by choosing Connect.  Then, choose Next step on the Create pipeline page.

 

 

10. For Deployment provider, choose AWS Elastic Beanstalk, and then choose the Application name and the Environment name of the Elastic Beanstalk environment you created earlier.  Choose Next step.

An Amazon Identity and Access Management (IAM) role will provide the permissions necessary for CodePipeline to perform the necessary build actions and service calls.  If you already have a role that you want to use with the pipeline push, choose it for Role name. Otherwise, choose Create role to create a role with the sufficient permissions to perform the build and push tasks.  Please review these predefined permissions and then accept.  For information about IAM, see  http://docs.aws.amazon.com/codepipeline/latest/userguide/access-permissions.html. Then choose Next step.

 

11. Review the information and make any necessary changes, and then choose Create pipeline.

You’ll get confirmation that the pipeline’s been created:

12. Now that you have a pipeline and the initial version of the application running, let’s make some changes.  In your Git directory, open the www/index.php file with a text editor.  Change the value of:

$AppName = “Demo Web App”;

to

$AppName = “PHP Web App”;

Save the file and check your changes into Git, as follows.

Index.php prior to alteration:

Index.php after alteration:

Outcome of running the git commit command:

You can see that the name of the application rendered in the web browser has been changed:

13. Check on the status of the build process by viewing the CodePipeline console:

 

 

You can also see the build status on the Solano Labs dashboard and build details page:

 

 

You have leveraged Solano’s built-in capabilities to test the functionality of the updated code.

In the Git repository, there are six tests identified across two files.  If you look at the phpunit directory in the Git working directory, you will see two files, indexTest.php and loadGenTest.php.  loadGenTest.php tests the load generation feature of the demo application.  To test this functionality, you ordinarily need to generate load that would take time to test.  We can take advantage of the Solano CI parallel PHPUnit testing to ensure that the load is spread in a parallel fashion.

PHPUnit test definitions:

These are top three tests performed:

In the next screenshots, you can see the file responsible for PHPUnit test configuration and the solano.yml configuration file that invokes the PHPUnit testing:

This walkthrough demonstrates just a few of the many unique capabilities available when you integrate Solano CI into the CodePipeline build process.  Using Solano CI expands the portfolio of technologies available for use within your AWS CI/CD implementations.  You can expand the capabilities of this one example by pushing unique GitHub branches to different development, testing, and production environments.  You can also leverage other CodePipeline integrations to take advantage of AWS CodeDeploy and a collection of specialized testing services.

Now that you have the build process running, make some changes and observe the process flow from within the CodePipeline console, AWS Elastic Beanstalk console, and the Solano Labs’ Solano CI console.  After you check changes into the GitHub repository that you used for this demonstration, updates are automatically dispatched to your new Elastic Beanstalk application.

 

Continuous Delivery for a PHP Application Using AWS CodePipeline, AWS Elastic Beanstalk, and Solano Labs

Post Syndicated from David Nasi original http://blogs.aws.amazon.com/application-management/post/TxYSRRBH57NP2P/Continuous-Delivery-for-a-PHP-Application-Using-AWS-CodePipeline-AWS-Elastic-Bea

My colleague Joseph Fontes, an AWS Solutions Architect, wrote the guest post below to discuss continuous delivery for a PHP Application Using AWS CodePipeline, AWS Elastic Beanstalk, and Solano Labs.

Solano Labs recently integrated Solano CI with AWS CodePipeline, a continuous delivery service for fast and reliable application updates. Solano Labs provides CI/CD capabilities to a variety of organizations, such as Airbnb, Change.org, and Apptio.  Solano CI is an enterprise-grade, scalable continuous integration (CI) and continuous deployment (CD) SaaS solution.  CodePipeline builds, tests, and deploys your code every time there is a code change, based on release process models that you define. You can now take advantage of the CI/CD capabilities long enjoyed by Solano Labs customers from within CodePipeline and with all of the ease of using the AWS Management Console.

In this post, we demonstrate how to use Solano CI with CodePipeline to test a PHP application using PHPUnit, and then deploy the application to AWS Elastic Beanstalk (Elastic Beanstalk). 

You will learn how to:

Deploy a sample PHP application to Elastic Beanstalk

Create a CD tool chain to push your code to Elastic Beanstalk

Connect your GitHub source repository to CodePipeline

Set up a Solano CI build stage to build your application and perform integration tests

Deploy your tested application to Elastic Beanstalk 

After you have completed these steps, your code will be continuously tested and delivered safely in an automated fashion.

To follow along, you need to set up an account with Solano Labs and have a GitHub account and a repository for the demo. 

1. To create an account with Solano Labs, go to http://docs.solanolabs.com/introduction/.

2. If you don’t already have a GitHub account, create one. Also, create the repository for this demo. For instructions for both, go to

https://help.github.com/articles/signing-up-for-a-new-github-account/

https://help.github.com/articles/creating-a-new-repository/

As a part of the deployment process, you will use the PHPUnit testing framework on the demonstration application we’ve posted to our Elastic Beanstalk environment. For more information about PHPUnit and Solano Labs PHPUnit integration, go to https://phpunit.de/ and http://docs.solanolabs.com/ConfiguringLanguage/php/.

Now, let’s start the demo.

1. Clone the application code from the following location into your Git repository: https://github.com/awslabs/aws-demo-php-simple-app.git.

2. Create the destination application in Elastic Beanstalk by following the instructions at http://docs.aws.amazon.com/gettingstarted/latest/deploy/deploying-with-elastic-beanstalk.html.

You need to choose specific options for your configuration. Instead of Node.js for the Predefined configuration, choose PHP. 

When asked to provide an archive of an application to get started, download the following .zip file to your local machine and upload into the Elastic Beanstalk application.

3. Finish configuring the Elastic Beanstalk application and environment.

4. Ensure that the configuration is running properly by testing it in a web browser.  You should see a page similar to this:

5. Next, create a CodePipeline pipeline to establish the continuous delivery process.  From the AWS Management Console, choose Services, and then, choose AWS CodePipeline.  If this is the first time you’ve created a pipeline, CodePipeline displays the following page:

6. Choose Get started.

7. Enter a name for your pipeline. For this example, we use “solano-eb-build”.   Choose Next step.

8. Now define your source provider.  CodePipeline provides direct integration between GitHub repositories and versioned Amazon S3 locations.  After you define your source, CodePipeline tracks changes committed to the source and performs actions that you will define.

For Source provider, choose GitHub. You may be requested to login to your Github account to proceed.  Under Connect to GitHub, you’ll see a variety of repositories and branches.  Choose the repository and branch that you just created, and then choose Next step.

9. For Build provider, choose Solano CI, and then choose Connect.  You may be requested to login to your Solano CI account.  When you are redirected to the Solano site, confirm the connection between CodePipeline and Solano CI by choosing Connect.  Then, choose Next step on the Create pipeline page.

 

 

10. For Deployment provider, choose AWS Elastic Beanstalk, and then choose the Application name and the Environment name of the Elastic Beanstalk environment you created earlier.  Choose Next step.

An Amazon Identity and Access Management (IAM) role will provide the permissions necessary for CodePipeline to perform the necessary build actions and service calls.  If you already have a role that you want to use with the pipeline push, choose it for Role name. Otherwise, choose Create role to create a role with the sufficient permissions to perform the build and push tasks.  Please review these predefined permissions and then accept.  For information about IAM, see  http://docs.aws.amazon.com/codepipeline/latest/userguide/access-permissions.html. Then choose Next step.

 

11. Review the information and make any necessary changes, and then choose Create pipeline.

You’ll get confirmation that the pipeline’s been created:

12. Now that you have a pipeline and the initial version of the application running, let’s make some changes.  In your Git directory, open the www/index.php file with a text editor.  Change the value of:

$AppName = “Demo Web App”;

to

$AppName = “PHP Web App”;

Save the file and check your changes into Git, as follows.

Index.php prior to alteration:

Index.php after alteration:

Outcome of running the git commit command:

You can see that the name of the application rendered in the web browser has been changed:

13. Check on the status of the build process by viewing the CodePipeline console:

 

 

You can also see the build status on the Solano Labs dashboard and build details page:

 

 

You have leveraged Solano’s built-in capabilities to test the functionality of the updated code.

In the Git repository, there are six tests identified across two files.  If you look at the phpunit directory in the Git working directory, you will see two files, indexTest.php and loadGenTest.php.  loadGenTest.php tests the load generation feature of the demo application.  To test this functionality, you ordinarily need to generate load that would take time to test.  We can take advantage of the Solano CI parallel PHPUnit testing to ensure that the load is spread in a parallel fashion.

PHPUnit test definitions:

These are top three tests performed:

In the next screenshots, you can see the file responsible for PHPUnit test configuration and the solano.yml configuration file that invokes the PHPUnit testing:

This walkthrough demonstrates just a few of the many unique capabilities available when you integrate Solano CI into the CodePipeline build process.  Using Solano CI expands the portfolio of technologies available for use within your AWS CI/CD implementations.  You can expand the capabilities of this one example by pushing unique GitHub branches to different development, testing, and production environments.  You can also leverage other CodePipeline integrations to take advantage of AWS CodeDeploy and a collection of specialized testing services.

Now that you have the build process running, make some changes and observe the process flow from within the CodePipeline console, AWS Elastic Beanstalk console, and the Solano Labs’ Solano CI console.  After you check changes into the GitHub repository that you used for this demonstration, updates are automatically dispatched to your new Elastic Beanstalk application.