Tag Archives: ipv6

2016-11-22 The OpenFest 2016 Network

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3328

There were no big changes from last year. The differences were as follows:

  • This year most of the switches in the backbone network were TP-Link SG3210s; we had only two Ciscos. The TP-Links are quieter, smaller, (literally) made of iron, and work well for odd deployments. If they also had PoE, they would be downright amazing.
  • We had one more switch, for the NOC in the basement (which was also the only leaf in the network). This year the small VOC room was left entirely to the video team, while the network team spread out in a basement.
  • Since we had two workshop halls, we had a few more user switches, cobbled together from whatever was at hand.

Here is this year's diagram.

Equipment was again lent to us by Svetla from netissat (her switch has been part of our backbone for 3-4 years now), Stefan Lekov (the NOC switch and the backup server) and digger (two MikroTiks used as workshop switches). The rest came from me and initLab.

A large part of the network setup was organized through GitHub issues, so you can easily see what we went through during the preparation.

This year there were more "clients" of our own on the network – the hall intercom ran over IP, and various odd telephones were scattered around the network. Overall, the wired network does not seem to get used by visitors, but it is proving more and more useful for us.

We again used the space under the staircase on the other side of the hall as a server room, and accordingly we have before and after photos. It is not the most suitable spot – everything around it is wooden, it is cramped, and the central-heating pipe runs underneath (i.e. we are perhaps one of the few server rooms with heating instead of air conditioning) – but it is right next to the electrical panel, away from the flows of people, and central enough that we can run independent cable routes from it.

This year we reused last year's cables and brought a spare box of cable, so we had no problems at all building the network.

For the uplink this year we used the same NetX fiber, but with gigabit converters and 300 Mbps, so we did not notice any connectivity problems.

We also used the same DL380 G5 as the server/router, and this year Lekov set up another one as a backup. We again used it to encode the 1080p stream from Bulgaria Hall, although this year our normal encoder probably would have coped (the software has come a long way in a year).

This year we had to change the VLAN numbers, because some of the APs (some big Linksys units) did not support VLAN tags above 64. Our address plan therefore looked like this:

IPv4

id  range               name
10  185.108.141.104/30  external
20  10.20.0.0/24        mgmt
21  10.21.0.0/22        wired
22  10.22.0.0/22        wireless
23  10.23.0.0/24        video
24  10.24.0.0/24        overflow

IPv6

id  range
10  2a01:b760:1:2::/120
21  2a01:b760:2:4::/62
22  2a01:b760:2:5::/62

Nothing changed in the firewall and forced-forwarding setup – we again enabled proxy_arp_pvlan for the user VLANs, filtered port 25, and traffic from normal users to the management/video/overflow VLANs was not allowed.

We had full IPv6 support in the user VLANs (wired and wireless), and this year we did not have the problem with IPv6 disappearing for random people – apparently the strange bug has finally been fixed.

Overall the network was as stable as it gets, and with the experience gained we can plan for more intra-team communication over it next year, plus all kinds of odd extras (for example desk phones, more information displays, some alerting, workstations at the reception desks, and all sorts of fun things). For now the biggest users are the wireless network and the video team.

AWS Storage Update – S3 & Glacier Price Reductions + Additional Retrieval Options for Glacier

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-storage-update-s3-glacier-price-reductions/

Back in 2006, we launched S3 with a revolutionary pay-as-you-go pricing model, with an initial price of 15 cents per GB per month. Over the intervening decade, we reduced the price per GB by 80%, launched S3 in every AWS Region, and enhanced the original one-size-fits-all model with user-driven features such as web site hosting, VPC integration, and IPv6 support, while adding new storage options including S3 Infrequent Access.

Because many AWS customers archive important data for legal, compliance, or other reasons and reference it only infrequently, we launched Glacier in 2012, and then gave you the ability to transition data between S3, S3 Infrequent Access, and Glacier by using lifecycle rules.

Today I have two big pieces of news for you: we are reducing the prices for S3 Standard Storage and for Glacier storage. We are also introducing additional retrieval options for Glacier.

S3 & Glacier Price Reduction
As long-time AWS customers already know, we work relentlessly to reduce our own costs, and to pass the resulting savings along in the form of a steady stream of AWS Price Reductions.

We are reducing the per-GB price for S3 Standard Storage in most AWS regions, effective December 1, 2016. The bill for your December usage will automatically reflect the new, lower prices. Here are the new prices for Standard Storage:

Regions                                                  0-50 TB            51-500 TB          500+ TB
                                                         ($ / GB / Month)   ($ / GB / Month)   ($ / GB / Month)

US East (Northern Virginia), US East (Ohio),
US West (Oregon), EU (Ireland)                           $0.0230            $0.0220            $0.0210
(Reductions range from 23.33% to 23.64%)

US West (Northern California)                            $0.0260            $0.0250            $0.0240
(Reductions range from 20.53% to 21.21%)

EU (Frankfurt)                                           $0.0245            $0.0235            $0.0225
(Reductions range from 24.24% to 24.38%)

Asia Pacific (Singapore, Tokyo, Sydney, Seoul, Bombay)   $0.0250            $0.0240            $0.0230
(Reductions range from 16.36% to 28.13%)

As you can see from the table above, we are also simplifying the pricing model by consolidating six pricing tiers into three new tiers.

We are also reducing the price of Glacier storage in most AWS Regions. For example, you can now store 1 GB for 1 month in the US East (Northern Virginia), US West (Oregon), or EU (Ireland) Regions for just $0.004 (less than half a cent) per month, a 43% decrease. For reference purposes, this amount of storage cost $0.010 when we launched Glacier in 2012, and $0.007 after our last Glacier price reduction (a 30% decrease).

The lower pricing is a direct result of the scale that comes about when our customers trust us with trillions of objects, but it is just one of the benefits. Based on the feedback that I get when we add new features, the real value of a cloud storage platform is the rapid, steady evolution. Our customers often tell me that they love the fact that we anticipate their needs and respond with new features accordingly.

New Glacier Retrieval Options
Many AWS customers use Amazon Glacier as the archival component of their tiered storage architecture. Glacier allows them to meet compliance requirements (either organizational or regulatory) while allowing them to use any desired amount of cloud-based compute power to process and extract value from the data.

Today we are enhancing Glacier with two new retrieval options for your Glacier data. You can now pay a little bit more to expedite your data retrieval. Alternatively, you can indicate that speed is not of the essence and pay a lower price for retrieval.

We launched Glacier with a pricing model for data retrieval that was based on the amount of data that you had stored in Glacier and the rate at which you retrieved it. While this was an accurate reflection of our own costs to provide the service, it was somewhat difficult to explain. Today we are replacing the rate-based retrieval fees with simpler per-GB pricing.

Our customers in the Media and Entertainment industry archive their TV footage to Glacier. When an emergent situation calls for them to retrieve a specific piece of footage, minutes count and they want fast, cost-effective access to the footage. Healthcare customers are looking for rapid, “while you wait” access to archived medical imagery and genome data; photo archives and companies selling satellite data turn out to have similar requirements. On the other hand, some customers have the ability to plan their retrievals ahead of time, and are perfectly happy to get their data in 5 to 12 hours.

Taking all of this into account, you can now select one of the following options for retrieving your data from Glacier (the original rate-based retrieval model is no longer applicable):

Standard retrieval is the new name for what Glacier already provides, and is the default for all API-driven retrieval requests. You get your data back in a matter of hours (typically 3 to 5), and pay $0.01 per GB along with $0.05 for every 1,000 requests.

Expedited retrieval addresses the need for “while you wait” access. You can get your data back quickly, with retrieval typically taking 1 to 5 minutes. If you store (or plan to store) more than 100 TB of data in Glacier and need to make infrequent, yet urgent requests for subsets of your data, this is a great model for you (if you have less data, S3’s Infrequent Access storage class can be a better value). Retrievals cost $0.03 per GB and $0.01 per request.

Retrieval generally takes between 1 and 5 minutes, depending on overall demand. If you need to get your data back in this time frame even in rare situations where demand is exceptionally high, you can provision retrieval capacity. Once you have done this, all Expedited retrievals will automatically be served via your Provisioned capacity. Each unit of Provisioned capacity costs $100 per month and ensures that you can perform at least 3 Expedited Retrievals every 5 minutes, with up to 150 MB/second of retrieval throughput.

Bulk retrieval is a great fit for planned or non-urgent use cases, with retrieval typically taking 5 to 12 hours at a cost of $0.0025 per GB (75% less than for Standard Retrieval) along with $0.025 for every 1,000 requests. Bulk retrievals are perfect when you need to retrieve large amounts of data within a day, and are willing to wait a few extra hours in exchange for a very significant discount.

If you do not specify a retrieval option when you call InitiateJob to retrieve an archive, a Standard Retrieval will be initiated. Your existing jobs will continue to work as expected, and will be charged at the new rate.
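For illustration, here is a hedged sketch using boto3 (the AWS SDK for Python, not mentioned in the post itself) to request an Expedited retrieval; the vault name and archive ID are placeholders, and leaving out Tier gives you the Standard default:

import boto3

# Request an Expedited archive retrieval instead of the default Standard tier.
glacier = boto3.client("glacier")

job = glacier.initiate_job(
    accountId="-",                      # "-" means "the account owning the credentials"
    vaultName="my-vault",               # placeholder vault name
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",   # placeholder archive ID
        "Tier": "Expedited",                 # or "Standard" / "Bulk"
        "Description": "urgent footage pull",
    },
)
print(job["jobId"])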

To learn more, read about Data Retrieval in the Glacier FAQ.

As always, I am thrilled to be able to share this news with you, and I hope that you are equally excited!

Jeff;

 

AWS Week in Review – October 24, 2016

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-week-in-review-october-24-2016/

Another busy week in AWS-land! Today’s post included submissions from 21 internal and external contributors, along with material from my RSS feeds, my inbox, and other things that come my way. To join in the fun, create (or find) some awesome AWS-related content and submit a pull request!

Monday

October 24

Tuesday

October 25

Wednesday

October 26

Thursday

October 27

Friday

October 28

Saturday

October 29

Sunday

October 30

New & Notable Open Source

  • aws-git-backed-static-website is a Git-backed static website generator powered entirely by AWS.
  • rds-pgbadger fetches log files from an Amazon RDS for PostgreSQL instance and generates a beautiful pgBadger report.
  • aws-lambda-redshift-copy is an AWS Lambda function that automates the copy command in Redshift.
  • VarnishAutoScalingCluster contains code and instructions for setting up a shared, horizontally scalable Varnish cluster that scales up and down using Auto Scaling groups.
  • aws-base-setup contains starter templates for developing AWS CloudFormation-based AWS stacks.
  • terraform_f5 contains Terraform scripts to instantiate a Big IP in AWS.
  • claudia-bot-builder creates chat bots for Facebook, Slack, Skype, Telegram, GroupMe, Kik, and Twilio and deploys them to AWS Lambda in minutes.
  • aws-iam-ssh-auth is a set of scripts used to authenticate users connecting to EC2 via SSH with IAM.
  • go-serverless sets up a go.cd server for serverless application deployment in AWS.
  • awsq is a helper script to run batch jobs on AWS using SQS.
  • respawn generates CloudFormation templates from YAML specifications.

New SlideShare Presentations

New Customer Success Stories

  • AbemaTV – AbemaTV is an Internet media-services company that operates one of Japan’s leading streaming platforms, FRESH! by AbemaTV. The company built its microservices platform on Amazon EC2 Container Service and uses an Amazon Aurora data store for its write-intensive microservices—such as timelines and chat—and a MySQL database on Amazon RDS for the remaining microservices APIs. By using AWS, AbemaTV has been able to quickly deploy its new platform at scale with minimal engineering effort.
  • Celgene – Celgene uses AWS to enable secure collaboration between internal and external researchers, allow individual scientists to launch hundreds of compute nodes, and reduce the time it takes to do computational jobs from weeks or months to less than a day. Celgene is a global biopharmaceutical company that creates drugs that fight cancer and other diseases and disorders. Celgene runs its high-performance computing research clusters, as well as its research collaboration environment, on AWS.
  • Under Armour – Under Armour can scale its Connected Fitness apps to meet the demands of more than 180 million global users, innovate and deliver new products and features more quickly, and expand internationally by taking advantage of the reliability and high availability of AWS. The company is a global leader in performance footwear, apparel, and equipment. Under Armour runs its growing Connected Fitness app platform on the AWS Cloud.

New YouTube Videos

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

IPv6 Support Update – CloudFront, WAF, and S3 Transfer Acceleration

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ipv6-support-update-cloudfront-waf-and-s3-transfer-acceleration/

As a follow-up to our recent announcement of IPv6 support for Amazon S3, I am happy to be able to tell you that IPv6 support is now available for Amazon CloudFront, Amazon S3 Transfer Acceleration, and AWS WAF and that all 50+ CloudFront edge locations now support IPv6. We are enabling IPv6 across all of our Autonomous System Networks (ASNs) in a phased rollout that starts today and will extend across all of the networks over the next few weeks.

CloudFront IPv6 Support
You can now enable IPv6 support for individual Amazon CloudFront distributions. Viewers and networks that connect to a CloudFront edge location over IPv6 will automatically be served content over IPv6. Those that connect over IPv4 will continue to work as before. Connections to your origin servers will be made using IPv4.

Newly created distributions are automatically enabled for IPv6; you can modify an existing distribution by checking Enable IPv6 in the console or setting it via the CloudFront API.
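As a rough sketch of the API route using boto3 (the AWS SDK for Python; the distribution ID below is a placeholder), you could fetch the current configuration, flip the IsIPV6Enabled flag, and write it back:

import boto3

cloudfront = boto3.client("cloudfront")
dist_id = "EDFDVBD6EXAMPLE"                      # placeholder distribution ID

resp = cloudfront.get_distribution_config(Id=dist_id)
config, etag = resp["DistributionConfig"], resp["ETag"]

config["IsIPV6Enabled"] = True                   # the new flag introduced with this launch
cloudfront.update_distribution(Id=dist_id, DistributionConfig=config, IfMatch=etag)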

Here are a couple of important things to know about this new feature:

  • Alias Records – After you enable IPv6 support for a distribution, the DNS entry for the distribution will be updated to include an AAAA record. If you are using Amazon Route 53 and an alias record to map all or part of your domain to the distribution, you will need to add an AAAA alias to the domain (see the sketch after this list).
  • Log Files – If you have enabled CloudFront Access Logs, IPv6 addresses will start to show up in the c-ip field; make sure that your log processing system knows what to do with them.
  • Trusted Signers – If you make use of Trusted Signers in conjunction with an IP address whitelist, we strongly recommend the use of an IPv4-only distribution for Trusted Signer URLs that have an IP whitelist and a separate, IPv4/IPv6 distribution for the actual content. This model sidesteps an issue that would arise if the signing request arrived over an IPv4 address and was signed as such, only to have the request for the content arrive via a different, IPv6 address that is not on the whitelist.
  • CloudFormation – CloudFormation support is in the works. With today’s launch, distributions that are created from a CloudFormation template will not be enabled for IPv6. If you update an existing stack, the setting will remain as-is for any distributions referenced in the stack.
  • AWS WAF – If you use AWS WAF in conjunction with CloudFront, be sure to update your WebACLs and your IP rulesets as appropriate in order to whitelist or blacklist IPv6 addresses.
  • Forwarded Headers – When you enable IPv6 for a distribution, the X-Forwarded-For header that is presented to the origin will contain an IPv6 address. You need to make sure that the origin is able to process headers of this form.
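Here is the sketch promised in the Alias Records item, using boto3; the hosted zone ID, record name, and distribution domain are placeholders, and Z2FDTNDATAQYW2 is (to the best of my knowledge) the fixed hosted zone ID Route 53 expects for CloudFront alias targets:

import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="ZEXAMPLE12345",                     # placeholder: your hosted zone
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "www.example.com.",            # placeholder record name
                "Type": "AAAA",                        # the new alias alongside the existing A alias
                "AliasTarget": {
                    "HostedZoneId": "Z2FDTNDATAQYW2",  # CloudFront's alias target zone
                    "DNSName": "d111111abcdef8.cloudfront.net",  # placeholder distribution domain
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)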

To learn more, read IPv6 Support for Amazon CloudFront.

AWS WAF IPv6 Support
AWS WAF helps you to protect your applications from application-layer attacks (read New – AWS WAF to learn more).

AWS WAF can now inspect requests that arrive via IPv4 or IPv6 addresses. You can create web ACLs that match IPv6 addresses, as described in Working with IP Match Conditions.

All existing WAF features will work with IPv6 and there will be no visible change in performance. IPv6 addresses will appear in the Sampled Requests collected and displayed by WAF.
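As a hedged sketch of what an IPv6 match condition looks like through the classic WAF API with boto3 (the IP set ID is a placeholder and 2001:db8::/32 is the documentation prefix):

import boto3

waf = boto3.client("waf")                 # the global (CloudFront) WAF endpoint

token = waf.get_change_token()["ChangeToken"]
waf.update_ip_set(
    IPSetId="example-ip-set-id",          # placeholder IP set ID
    ChangeToken=token,
    Updates=[{
        "Action": "INSERT",
        "IPSetDescriptor": {"Type": "IPV6", "Value": "2001:db8::/32"},
    }],
)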

S3 Transfer Acceleration IPv6 Support
This important new S3 feature (read AWS Storage Update – Amazon S3 Transfer Acceleration + Larger Snowballs in More Regions for more info) now has IPv6 support. To use it, switch to the new dual-stack endpoint for your uploads. Simply change:

https://BUCKET.s3-accelerate.amazonaws.com

to

https://BUCKET.s3-accelerate.dualstack.amazonaws.com

Here’s some code that uses the AWS SDK for Java to create a client object and enable dual-stack transfer:

AmazonS3Client s3 = new AmazonS3Client();
s3.setS3ClientOptions(S3ClientOptions.builder().enableDualstack().setAccelerateModeEnabled(true).build());
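If you are working in Python instead, a roughly equivalent sketch with boto3 leans on botocore's s3 configuration options (assuming a reasonably current SDK):

import boto3
from botocore.config import Config

# Enable both the accelerate and dual-stack endpoints for this client.
s3 = boto3.client(
    "s3",
    config=Config(s3={
        "use_accelerate_endpoint": True,
        "use_dualstack_endpoint": True,
    }),
)

# Uploads now go to BUCKET.s3-accelerate.dualstack.amazonaws.com
s3.upload_file("local-file.bin", "BUCKET", "key/of/object")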

Most applications and network stacks will prefer IPv6 automatically, and no further configuration should be required. You should plan to take a look at the IAM policies for your buckets in order to make sure that they will work as expected in conjunction with IPv6 addresses.

To learn more, read about Making Requests to Amazon S3 over IPv6.

Don’t Forget to Test
As a reminder, if IPv6 connectivity to any AWS region is limited or non-existent, IPv4 will be used instead. Also, as I noted in my earlier post, the client system can be configured to support IPv6 but connected to a network that is not configured to route IPv6 packets to the Internet. Therefore, we recommend some application-level testing of end-to-end connectivity before you switch to IPv6.


Jeff;

 

Rintel: NetworkManager 1.4: with better privacy and easier to use

Post Syndicated from ris original http://lwn.net/Articles/698287/rss

Lubomir Rintel takes a look at new features in NetworkManager 1.4. “It is now possible to randomize the MAC address of Ethernet devices to mitigate possibility of tracking. The users can choose between different policies; use a completely random address, or just use different addresses in different networks. For Wi-Fi devices, the same randomization modes are now supported and does no longer require support from wpa-supplicant.” Also covered are a newly added API for configuration snapshots that automatically roll back after a timeout, IPv6 tokenized interface identifiers that can now be configured, new features in nmcli, and more. (Thanks to Paul Wise)

Now Available – IPv6 Support for Amazon S3

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-available-ipv6-support-for-amazon-s3/

As you probably know, every server and device that is connected to the Internet must have a unique IP address. Way back in 1981, RFC 791 (“Internet Protocol”) defined an IP address as a 32-bit entity, with three distinct network and subnet sizes (Classes A, B, and C – essentially large, medium, and small) designed for organizations with requirements for different numbers of IP addresses. In time, this format came to be seen as wasteful and the more flexible CIDR (Classless Inter-Domain Routing) format was standardized and put into use. The 32-bit entity (commonly known as an IPv4 address) has served the world well, but the continued growth of the Internet means that all available IPv4 addresses will ultimately be assigned and put to use.

In order to accommodate this growth and to pave the way for future developments, networks, devices, and service providers are now in the process of moving to IPv6. With 128 bits per IP address, IPv6 has plenty of address space (according to my rough calculation, 128 bits is enough to give 3.5 billion IP addresses to every one of the 100 octillion or so stars in the universe). While the huge address space is the most obvious benefit of IPv6, there are other more subtle benefits as well. These include extensibility, better support for dynamic address allocation, and additional built-in support for security.

Today I am happy to announce that objects in Amazon S3 buckets are now accessible via IPv6 addresses via new “dual-stack” endpoints. When a DNS lookup is performed on an endpoint of this type, it returns an “A” record with an IPv4 address and an “AAAA” record with an IPv6 address. In most cases the network stack in the client environment will automatically prefer the AAAA record and make a connection using the IPv6 address.

Accessing S3 Content via IPv6
In order to start accessing your content via IPv6, you need to switch to new dual-stack endpoints that look like this:

http://BUCKET.s3.dualstack.REGION.amazonaws.com

or this:

http://s3.dualstack.REGION.amazonaws.com/BUCKET

If you are using the AWS Command Line Interface (CLI) or the AWS Tools for Windows PowerShell you can use the --enabledualstack flag to switch to the dual-stack endpoints.

We are currently updating the AWS SDKs to support the use_dualstack_endpoint setting and expect to push them out to production by the middle of next week. Until then, refer to the developer guide for your SDK to learn how to enable this feature.

Things to Know
Here are some things that you need to know in order to make a smooth transition to IPv6:

Bucket and IAM Policies – If you use policies to grant or restrict access via IP address, update them to include the desired IPv6 ranges before you switch to the new endpoints. If you don’t do this, clients may incorrectly gain or lose access to the AWS resources. Update any policies that exclude access from certain IPv4 addresses by adding the corresponding IPv6 addresses.
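Purely as an illustration, here is a sketch of a bucket policy whose source-IP condition covers both IPv4 and IPv6 ranges, applied with boto3 (the AWS SDK for Python); the bucket name and CIDR blocks are placeholders:

import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowFromOfficeRanges",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-bucket/*",          # placeholder bucket
        "Condition": {
            "IpAddress": {
                # IPv4 range plus the corresponding IPv6 range (both placeholders)
                "aws:SourceIp": ["192.0.2.0/24", "2001:db8:1234::/48"]
            }
        },
    }],
}

boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))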

IPv6 Connectivity – Because the network stack will prefer an IPv6 address to an IPv4 address, an unusual situation can arise under certain circumstances. The client system can be configured for IPv6 but connected to a network that is not configured to route IPv6 packets to the Internet. Be sure to test for end-to-end connectivity before you switch to the dual-stack endpoints.

Log Entries – Log entries will include the IPv4 or IPv6 address, as appropriate. If you analyze your log files using internal or third-party applications, you should ensure that they are able to recognize and process entries that include an IPv6 address.

S3 Feature Support – IPv6 support is available for all S3 features with the exception of Website Hosting, S3 Transfer Acceleration, and access via BitTorrent.

Region Support – IPv6 support is available in all commercial AWS Regions and in AWS GovCloud (US). It is not available in the China (Beijing) Region.


Jeff;

Weekly roundup: three big things

Post Syndicated from Eevee original https://eev.ee/dev/2016/08/07/weekly-roundup-three-big-things/

August is about video games. Actually, the next three months are about video games. Primary goals and their rough stages:

  1. Draft three chapters of this book
    • August: one chapter (at which point I might start talking about what the book is)
    • September: another chapter
    • October: yet another chapter
  2. Get veekun beta-worthy
    • August: basics of the new schema committed; basics of gen 1 and gen 6 games dumped; skeleton cli and site
    • September: most games dumped; lookup; core pages working; new site in publicly-available beta
    • October: all games dumped; new site design work; extras like search and calculators
  3. Finish Runed Awakening
    • August: working ending; at least one solution to each puzzle; private beta
    • September: alternate solutions; huge wave of prose editing; patreon beta
    • October: fix the mountains of issues people find; finish any remaining illustrations

Yeah, we’ll see how all that goes. I also have some vague secondary goals like “do art” and “release tiny games” and “do Doom stuff” but those are extremely subject to change. Hopefully I can stick to the above three big things for three months.

Anyway, this week:

  • blog: Finished and published posts on why to use Python 3 and how to port to it, plus made numerous suggested edits. Wrote a brief thing about my frustrations with Pokémon Go. And wrote about veekun’s schema woes, which helped me reason through a few lingering thorny problems.

    That might be a record for most things I’ve published within a calendar week.

  • art: I tried an hour of timed (real-life) figure drawings, which was kinda weird. I’ve really lapsed on the daily Pokémon, possibly because I changed up the rules to be an hour for a single painting, and that feels like a huge amount of time (…for something I don’t think will come out very well). I’ll either make a better effort to do them every day, or change the rules again so I stop putting them off.

    I drew Griffin’s Nuzlocke team kind of on a whim? A day-long whim?

  • book: I wrote some preface, which you’re probably supposed to do last, but it helped me figure out the tone of the writing. I’ve mentioned this before regarding previous failed attempts, but writing a book is surprisingly harder than writing a blog post — I can’t quite put my finger on why, but the medium feels completely different and alien, and I’m much more self-conscious about how I write.

    I did get a bit of a chapter written, though. I probably spent much more time wrangling authoring tools into producing something acceptable.

  • doom: I somehow drifted into doing stuff to anachrony again. Apparently I left it in near-shambles, with at least a dozen half-finished things all over the place and few comments about what on Earth I was thinking. I’ve cleaned a lot of them up, figured out how to fix some long-standing irritations, and excised some bad ideas. It’s almost presentable now, and I started building a little contrived demo map that tries to show how some of the things work. Someday I might even use all this for a real map, wow.

  • zdoom: Oops, I also picked up my Lua-in-ZDoom experiment again. After doing some things to C++ that made me feel like a witch, someone recommended Sol, a single-file (10k line…) C++ library for interacting with Lua. It is fucking incredible and makes everything so much easier and the author is on Twitter and fixes things faster than I can bring them up.

    I don’t know how much time I want to devote to this — it is just an experiment — but Sol will make it go preposterously faster. It’s single-handedly made a proof of concept look feasible.

  • ops: I spent half a day fixing microscopic IPv6 problems that have been getting on my nerves for ages.

  • veekun: After publishing the schema post, I went to have a look at where I’d left the new dumper code. I spent a few hours writing rock-solid(-ish) version and language detection, which doesn’t have much to do with the schema but is really important to have.

I just about filled a page in my notebook with all this, which I haven’t done in a while. Feels pretty good! I’m also a quarter through the month already, so I’d better get moving on those three big things.

Python FAQ: Why should I use Python 3?

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/31/python-faq-why-should-i-use-python-3/

Part of my Python FAQ, which is doomed to never be finished.

The short answer is: because it’s the actively-developed version of the language, and you should use it for the same reason you’d use 2.7 instead of 2.6.

If you’re here, I’m guessing that’s not enough. You need something to sweeten the deal. Well, friend, I have got a whole mess of sugar cubes just for you.

And once you’re convinced, you may enjoy the companion article, how to port to Python 3! It also has some more details on the differences between Python 2 and 3, whereas this article doesn’t focus too much on the features removed in Python 3.

Some background

If you aren’t neck-deep in Python, you might be wondering what the fuss is all about, or why people keep telling you that Python 3 will set your computer on fire. (It won’t.)

Python 2 is a good language, but it comes with some considerable baggage. It has two integer types; it may or may not be built in a way that completely mangles 16/17 of the Unicode space; it has a confusing mix of lazy and eager functional tools; it has a standard library that takes “batteries included” to lengths beyond your wildest imagination; it boasts strong typing, then casually insists that None < 3 < "2"; overall, it’s just full of little dark corners containing weird throwbacks to the days of Python 1.

(If you’re really interested, Nick Coghlan has written an exhaustive treatment of the slightly different question of why Python 3 was created. This post is about why Python 3 is great, so let’s focus on that.)

Fixing these things could break existing code, whereas virtually all code written for 2.0 will still work on 2.7. So Python decided to fix them all at once, producing a not-quite-compatible new version of the language, Python 3.

Nothing like this has really happened with a mainstream programming language before, and it’s been a bit of a bumpy ride since then. Python 3 was (seemingly) designed with the assumption that everyone would just port to Python 3, drop Python 2, and that would be that. Instead, it’s turned out that most libraries want to continue to run on both Python 2 and Python 3, which was considerably difficult to make work at first. Python 2.5 was still in common use at the time, too, and it had none of the helpful backports that showed up in Python 2.6 and 2.7; likewise, Python 3.0 didn’t support u'' strings. Writing code that works on both 2.5 and 3.0 was thus a ridiculous headache.

The porting effort also had a dependency problem: if your library or app depends on library A, which depends on library B, which depends on C, which depends on D… then none of those projects can even think about porting until D’s porting effort is finished. Early days were very slow going.

Now, though, things are looking brighter. Most popular libraries work with Python 3, and those that don’t are working on it. Python 3’s Unicode handling, one of its most contentious changes, has had many of its wrinkles ironed out. Python 2.7 consists largely of backported Python 3 features, making it much simpler to target 2 and 3 with the same code — and both 2.5 and 2.6 are no longer supported.

Don’t get me wrong, Python 2 will still be around for a while. A lot of large applications have been written for Python 2 — think websites like Yelp, YouTube, Reddit, Dropbox — and porting them will take some considerable effort. I happen to know that at least one of those websites was still running 2.6 last year, years after 2.6 had been discontinued, if that tells you anything about the speed of upgrades for big lumbering software.

But if you’re just getting started in Python, or looking to start a new project, there aren’t many reasons not to use Python 3. There are still some, yes — but unless you have one specifically in mind, they probably won’t affect you.

I keep having Python beginners tell me that all they know about Python 3 is that some tutorial tried to ward them away from it for vague reasons. (Which is ridiculous, since especially for beginners, Python 2 and 3 are fundamentally not that different.) Even the #python IRC channel has a few people who react, ah, somewhat passive-aggressively towards mentions of Python 3. Most of the technical hurdles have long since been cleared; it seems like one of the biggest roadblocks now standing in the way of Python 3 adoption is the community’s desire to sabotage itself.

I think that’s a huge shame. Not many people seem to want to stand up for Python 3, either.

Well, here I am, standing up for Python 3. I write all my new code in Python 3 now — because Python 3 is great and you should use it. Here’s why.

Hang on, let’s be real for just a moment

None of this is going to 💥blow your mind💥. It’s just a programming language. I mean, the biggest change to Python 2 in the last decade was probably the addition of the with statement, which is nice, but hardly an earth-shattering innovation. The biggest changes in Python 3 are in the same vein: they should smooth out some points of confusion, help avoid common mistakes, and maybe give you a new toy to play with.

Also, if you’re writing a library that needs to stay compatible with Python 2, you won’t actually be able to use any of this stuff. Sorry. In that case, the best reason to port is so application authors can use this stuff, rather than citing your library as the reason they’re trapped on Python 2 forever. (But hey, if you’re starting a brand new library that will blow everyone’s socks off, do feel free to make it Python 3 exclusive.)

Application authors, on the other hand, can go wild.

Unicode by default

Let’s get the obvious thing out of the way.

In Python 2, there are two string types: str is a sequence of bytes (which I would argue makes it not a string), and unicode is a sequence of Unicode codepoints. A literal string in source code is a str, a bytestring. Reading from a file gives you bytestrings. Source code is assumed ASCII by default. It’s an 8-bit world.

If you happen to be an English speaker, it’s very easy to write Python 2 code that seems to work perfectly, but chokes horribly if fed anything outside of ASCII. The right thing involves carefully specifying encodings everywhere and using u'' for virtually all your literal strings, but that’s very tedious and easily forgotten.

Python 3 reshuffles this to put full Unicode support front and center.

Most obviously, the str type is a real text type, similar to Python 2’s unicode. Literal strings are still str, but now that makes them Unicode strings. All of the “structural” strings — names of types, functions, modules, etc. — are likewise Unicode strings. Accordingly, identifiers are allowed to contain any Unicode “letter” characters. repr() no longer escapes printable Unicode characters, though there’s a new ascii() (and corresponding !a format cast and %a placeholder) that does. Unicode completely pervades the language, for better or worse.

And just for the record: this is way better. It is so much better. It is incredibly better. Do you know how much non-ASCII garbage I type? Every single em dash in this damn post was typed by hand, and Python 2 would merrily choke on them.

Source files are now assumed to be UTF-8 by default, so adding an em dash in a comment will no longer break your production website. (I have seen this happen.) You’re still free to specify another encoding explicitly if you want, using a magic comment.

There is no attempted conversion between bytes and text, as in Python 2; b'a' + 'b' is a TypeError. Some modules require you to know what you’re dealing with: zlib.compress only accepts bytes, because zlib is defined in terms of bytes; json.loads only accepts str, because JSON is defined in terms of Unicode codepoints. Calling str() on some bytes will defer to repr, producing something like "b'hello'". (But see -b and -bb below.) Overall it’s pretty obvious when you’ve mixed bytes with text.
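A tiny illustration of where the boundary now sits:

text = "naïve café"                    # a str, i.e. real text
data = text.encode("utf-8")            # str -> bytes; the encoding is explicit

assert isinstance(text, str) and isinstance(data, bytes)
assert data.decode("utf-8") == text    # bytes -> str round-trips cleanly

# text + data  # TypeError: no implicit mixing of str and bytes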

Oh, and two huge problem children are fixed: both the csv module and urllib.parse (formerly urlparse) can handle text. If you’ve never tried to make those work, trust me, this is miraculous.

I/O does its best to make everything Unicode. On Unix, this is a little hokey, since the filesystem is explicitly bytes with no defined encoding; Python will trust the various locale environment variables, which on most systems will make everything UTF-8. The default encoding of text-mode file I/O is derived the same way and thus usually UTF-8. (If it’s not what you expect, run locale and see what you get.) Files opened in binary mode, with a 'b', will still read and write bytes.

Python used to come in “narrow” and “wide” builds, where “narrow” builds actually stored Unicode as UTF-16, and this distinction could leak through to user code in subtle ways. On a narrow build, unichr(0x1F4A3) raises ValueError, and the length of u'💣' is 2. Surprise! Maybe your code will work on someone else’s machine, or maybe it won’t. Python 3.3 eliminated narrow builds.

I think those are the major points. For the most part, you should be able to write code as though encodings don’t exist, and the right thing will happen more often. And the wrong thing will immediately explode in your face. It’s good for you.

If you work with binary data a lot, you might be frowning at me at this point; it was a bit of a second-class citizen in Python 3.0. I think things have improved, though: a number of APIs support both bytes and text, the bytes-to-bytes codec issue has largely been resolved, we have bytes.hex() and bytes.fromhex(), bytes and bytearray both support % now, and so on. They’re listening!

Refs: Python 3.0 release notes; myriad mentions all over the documentation

Backported features

Python 3.0 was released shortly after Python 2.6, and a number of features were then backported to Python 2.7. You can use these if you’re only targeting Python 2.7, but if you were stuck with 2.6 for a long time, you might not have noticed them.

  • Set literals:

    {1, 2, 3}
    
  • Dict and set comprehensions:

    {word.lower() for word in words}
    {value: key for (key, value) in dict_to_invert.items()}
    
  • Multi-with:

    with open("foo") as f1, open("bar") as f2:
        ...
    
  • print is now a function, with a couple bells and whistles added: you can change the delimiter with the sep argument, you can change the terminator to whatever you want (including nothing) with the end argument, and you can force a flush with the flush argument. In Python 2.6 and 2.7, you still have to opt into this with from __future__ import print_function.

  • The string representation of a float now uses the shortest decimal number that has the same underlying value — for example, repr(1.1) was '1.1000000000000001' in Python 2.6, but is just '1.1' in Python 2.7 and 3.1+, because both are represented the same way in a 64-bit float.

  • collections.OrderedDict is a dict-like type that remembers the order of its keys.

Note that you cannot do OrderedDict(a=1, b=2), because the constructor still receives its keyword arguments in a regular dict, losing the order. You have to pass in a sequence of 2-tuples or assign keys one at a time (see the sketch after this list).

  • collections.Counter is a dict-like type for counting a set of things. It has some pretty handy operations that allow it to be used like a multiset.

  • The entire argparse module is a backport from 3.2.

  • str.format learned a , formatting specifier for numbers, which always uses commas and groups of three digits. This is wrong for many countries, and the correct solution involves using the locale module, but it’s useful for quick output of large numbers.

  • re.sub, re.subn, and re.split accept a flags argument. Minor, but, thank fucking God.
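Here's the sketch promised above for the two collections types, since they don't get code samples of their own:

from collections import Counter, OrderedDict

# Keyword arguments would lose the order, so pass a sequence of 2-tuples instead.
od = OrderedDict([("first", 1), ("second", 2), ("third", 3)])
print(list(od))                # ['first', 'second', 'third']

tally = Counter("banana")      # counts each letter
print(tally.most_common(2))    # [('a', 3), ('n', 2)]
print(tally - Counter("ban"))  # Counter({'a': 2, 'n': 1}), i.e. multiset subtraction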

Ref: Python 2.7 release notes

Iteration improvements

Everything is lazy

Python 2 has a lot of pairs of functions that do the same thing, except one is eager and one is lazy: range and xrange, map and itertools.imap, dict.keys and dict.iterkeys, and so on.

Python 3.0 eliminated all of the lazy variants and instead made the default versions lazy. Iterating over them works exactly the same way, but no longer creates an intermediate list — for example, range(1000000000) won’t eat all your RAM. If you need to index them or store them for later, you can just wrap them in list(...).

Even better, the dict methods are now “views”. You can keep them around, and they’ll reflect any changes to the underlying dict. They also act like sets, so you can do a.keys() & b.keys() to get the set of keys that exist in both dicts.
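A quick sketch of both properties:

a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}

keys = a.keys()
a["w"] = 5
print(list(keys))            # ['x', 'y', 'w'], the view reflects the new key

print(a.keys() & b.keys())   # {'y'}
print(a.keys() - b.keys())   # {'x', 'w'} (set order may vary)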

Refs: dictionary view docs; Python 3.0 release notes

Unpacking

Unpacking got a huge boost. You could always do stuff like this in Python 2:

a, b, c = range(3)  # a = 0, b = 1, c = 2

Python 3.0 introduces:

a, b, *c = range(5)  # a = 0, b = 1, c = [2, 3, 4]
a, *b, c = range(5)  # a = 0, b = [1, 2, 3], c = 4

Python 3.5 additionally allows use of the * and ** unpacking operators in literals, or multiple times in function calls:

print(*range(3), *range(3))  # 0 1 2 0 1 2

x = [*range(3), *range(3)]  # x = [0, 1, 2, 0, 1, 2]
y = {*range(3), *range(3)}  # y = {0, 1, 2}  (it's a set, remember!)
z = {**dict1, **dict2}  # finally, syntax for dict merging!

Refs: Python 3.0 release notes; PEP 3132; Python 3.5 release notes; PEP 448

yield from

yield from is an extension of yield. Where yield produces a single value, yield from yields an entire sequence.

def flatten(*sequences):
    for seq in sequences:
        yield from seq

list(flatten([1, 2], [3, 4]))  # [1, 2, 3, 4]

Of course, for a simple example like that, you could just do some normal yielding in a for loop. The magic of yield from is that it can also take another generator or other lazy iterable, and it’ll effectively pause the current generator until the given one has been exhausted. It also takes care of passing values back into the generator using .send() or .throw().

def foo():
    a = yield 1
    b = yield from bar(a)
    print("foo got back", b)
    yield 4

def bar(a):
    print("in bar", a)
    x = yield 2
    y = yield 3
    print("leaving bar")
    return x + y

gen = foo()
val = None
while True:
    try:
        newval = gen.send(val)
    except StopIteration:
        break
    print("yielded", newval)
    val = newval * 10

# yielded 1
# in bar 10
# yielded 2
# yielded 3
# leaving bar
# foo got back 50
# yielded 4

Oh yes, and you can now return a value from a generator. The return value becomes the result of a yield from, or if the caller isn’t using yield from, it’s available as the argument to the StopIteration exception.

A small convenience, perhaps. The real power here isn’t in the use of generators as lazy iterators, but in the use of generators as coroutines.

A coroutine is a function that can “suspend” itself, like yield does, allowing other code to run until the function is resumed. It’s kind of like an alternative to threading, but only one function is actively running at any given time, and that function has to deliberately relinquish control (or end) before anything else can run.

Generators could do this already, more or less, but only one stack frame deep. That is, you can yield from a generator to suspend it, but if the generator calls another function, that other function has no way to suspend the generator. This is still useful, but significantly less powerful than the coroutine functionality in e.g. Lua, which lets any function yield anywhere in the call stack.

With yield from, you can create a whole chain of generators that yield from one another, and as soon as the one on the bottom does a regular yield, the entire chain will be suspended.

This laid the groundwork for making the asyncio module possible. I’ll get to that later.

Refs: docs; Python 3.3 release notes; PEP 380

Syntactic sugar

Keyword-only arguments

Python 3.0 introduces “keyword-only” arguments, which must be given by name. As a corollary, you can now accept a list of args and have more arguments afterwards. The full syntax now looks something like this:

def foo(a, b=None, *args, c=None, d, **kwargs):
    ...

Here, a and d are required, b and c are optional. c and d must be given by name.

foo(1)                      # TypeError: missing d
foo(1, 2)                   # TypeError: missing d
foo(d=4)                    # TypeError: missing a
foo(1, d=4)                 # a = 1, d = 4
foo(1, 2, d=4)              # a = 1, b = 2, d = 4
foo(1, 2, 3, d=4)           # a = 1, b = 2, args = (3,), d = 4
foo(1, 2, c=3, d=4)         # a = 1, b = 2, c = 3, d = 4
foo(1, b=2, c=3, d=4, e=5)  # a = 1, b = 2, c = 3, d = 4, kwargs = {'e': 5}

This is extremely useful for functions with a lot of arguments, functions with boolean arguments, functions that accept *args (or may do so in the future) but also want some options, etc. I use it a lot!

If you want keyword-only arguments, but you don’t want to accept *args, you just leave off the variable name:

def foo(*, arg=None):
    ...

Refs: Python 3.0 release notes; PEP 3102

Format strings

Python 3.6 (not yet out) will finally bring us string interpolation, more or less, using the str.format() syntax:

a = 0x133
b = 0x352
print(f"The answer is {a + b:04x}.")

It’s pretty much the same as str.format(), except that instead of a position or name, you can give an entire expression. The formatting suffixes with : still work, the special built-in conversions like !r still work, and __format__ is still invoked.

Refs: docs; Python 3.6 release notes; PEP 498

async and friends

Right, so, about coroutines.

Python 3.4 introduced the asyncio module, which offers building blocks for asynchronous I/O (and bringing together the myriad third-party modules that do it already).

The design is based around coroutines, which are really generators using yield from. The idea, as I mentioned above, is that you can create a stack of generators that all suspend at once:

@coroutine
def foo():
    # do some stuff
    yield from bar()
    # do more stuff

@coroutine
def bar():
    # do some stuff
    response = yield from get_url("https://eev.ee/")
    # do more stuff

When this code calls get_url() (not actually a real function, but see aiohttp), get_url will send a request off into the æther, and then yield. The entire stack of generators — get_url, bar, and foo — will all suspend, and control will return to whatever first called foo, which with asyncio will be an “event loop”.

The event loop’s entire job is to notice that get_url yielded some kind of “I’m doing a network request” thing, remember it, and resume other coroutines in the meantime. (Or just twiddle its thumbs, if there’s nothing else to do.) When a response comes back, the event loop will resume get_url and send it the response. get_url will do some stuff and return it up to bar, who continues on, none the wiser that anything unusual happened.

The magic of this is that you can call get_url several times, and instead of having to wait for each request to completely finish before the next one can even start, you can do other work while you’re waiting. No threads necessary; this is all one thread, with functions cooperatively yielding control when they’re waiting on some external thing to happen.

Now, notice that you do have to use yield from each time you call another coroutine. This is nice in some ways, since it lets you see exactly when and where your function might be suspended out from under you, which can be important in some situations. There are also arguments about why this is bad, and I don’t care about them.

However, yield from is a really weird phrase to be sprinkling all over network-related code. It’s meant for use with iterables, right? Lists and tuples and things. get_url is only one thing. What are we yielding from it? Also, what’s this @coroutine decorator that doesn’t actually do anything?

Python 3.5 smoothed over this nonsense by introducing explicit syntax for these constructs, using new async and await keywords:

async def foo():
    # do some stuff
    await bar()
    # do more stuff

async def bar():
    # do some stuff
    response = await get_url("https://eev.ee/")
    # do more stuff

async def clearly identifies a coroutine, even one that returns immediately. (Before, you’d have a generator with no yield, which isn’t actually a generator, which causes some problems.) await explains what’s actually happening: you’re just waiting for another function to be done.

async for and async with are also available, replacing some particularly clumsy syntax you’d need to use before. And, handily, you can only use any of these things within an async def.

The new syntax comes with corresponding new special methods like __await__, whereas the previous approach required doing weird things with __iter__, which is what yield from ultimately calls.

I could fill a whole post or three with stuff about asyncio, and can’t possibly give it justice in just a few paragraphs. The short version is: there’s built-in syntax for doing network stuff in parallel without threads, and that’s cool.

Refs for asyncio: docs (asyncio); Python 3.4 release notes; PEP 3156

Refs for async and await: docs (await); docs (async); docs (special methods); Python 3.5 release notes; PEP 492

Function annotations

Function arguments and return values can have annotations:

def foo(a: "hey", b: "what's up") -> "whoa":
    ...

The annotations are accessible via the function’s __annotations__ attribute. They have no special meaning to Python, so you’re free to experiment with them.

Well…

You were free to experiment with them, but the addition of the typing module (mentioned below) has hijacked them for type hints. There’s no clear way to attach a type hint and some other value to the same argument, so you’ll have a tough time making function annotations part of your API.

There’s still no hard requirement that annotations be used exclusively for type hints (and it’s not like Python does anything with type hints, either), but the original PEP suggests it would like that to be the case someday. I guess we’ll see.

If you want to see annotations preserved for other uses as well, it would be a really good idea to do some creative and interesting things with them as soon as possible. Just saying.
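For the record, here's what annotations look like when pressed into service as type hints with the typing module; this is purely illustrative, and Python itself doesn't check any of it at runtime (tools like mypy can):

from typing import List, Optional

def mean(values: List[float], default: Optional[float] = None) -> float:
    if not values:
        if default is None:
            raise ValueError("empty input and no default given")
        return default
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))   # 2.0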

Refs: docs; Python 3.0 release notes; PEP 3107

Matrix multiplication

Python 3.5 learned a new infix operator for matrix multiplication, spelled @. It doesn’t do anything for any built-in types, but it’s supported in NumPy. You can implement it yourself with the __matmul__ special method and its r and i variants.

Shh. Don’t tell anyone, but I suspect there are fairly interesting things you could do with an operator called @ — some of which have nothing to do with matrix multiplication at all!
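A contrived sketch of hooking into the operator yourself:

class Vec:
    def __init__(self, *xs):
        self.xs = xs

    def __matmul__(self, other):
        # dot product, just to give @ something to do
        return sum(a * b for a, b in zip(self.xs, other.xs))

print(Vec(1, 2, 3) @ Vec(4, 5, 6))   # 32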

Refs: Python 3.5 release notes; PEP 465

Ellipsis

... is now valid syntax everywhere. It evaluates to the Ellipsis singleton, which does nothing. (This exists in Python 2, too, but it’s only allowed when slicing.)

It’s not of much practical use, but you can use it to indicate an unfinished stub, in a way that’s clearly not intended to be final but will still parse and run:

class ReallyComplexFiddlyThing:
    # fuck it, do this later
    ...

Refs: docs; Python 3.0 release notes

Enhanced exceptions

A slightly annoying property of Python 2’s exception handling is that if you want to do your own error logging, or otherwise need to get at the traceback, you have to use the slightly funky sys.exc_info() API and carry the traceback around separately. As of Python 3.0, exceptions automatically have a __traceback__ attribute, as well as a .with_traceback() method that sets the traceback and returns the exception itself (so you can use it inline).

This makes some APIs a little silly — __exit__ still accepts the exception type and value and traceback, even though all three are readily available from just the exception object itself.

A much more annoying property of Python 2’s exception handling was that custom exception handling would lose track of where the problem actually occurred. Consider the following call stack.

A
B
C
D
E

Now say an exception happens in E, and it’s caught by code like this in C.

try:
    D()
except Exception as e:
    raise CustomError("Failed to call D")

Because this creates and raises a new exception, the traceback will start from this point and not even mention E. The best workaround for this involves manually creating a traceback between C and E, formatting it as a string, and then including that in the error message. Preposterous.

Python 3.0 introduced exception chaining, which allows you to do this:

raise CustomError("Failed to call D") from e

Now, if this exception reaches the top level, Python will format it as:

Traceback (most recent call last):
File C, blah blah
File D, blah blah
File E, blah blah
SomeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File A, blah blah
File B, blah blah
File C, blah blah
CustomError: Failed to call D

The best part is that you don’t need to explicitly say from e at all — if you do a plain raise while there’s already an active exception, Python will automatically chain them together. Even internal Python exceptions will have this behavior, so a broken exception handler won’t lose the original exception. (In the implicit case, the intermediate text becomes “During handling of the above exception, another exception occurred:”.)

The chained exception is stored on the new exception as either __cause__ (if from an explicit raise ... from) or __context__ (if automatic).

If you direly need to hide the original exception, Python 3.3 introduced raise ... from None.

Speaking of exceptions, the error messages for missing arguments have been improved. Python 2 does this:

TypeError: foo() takes exactly 1 argument (0 given)

Python 3 does this:

TypeError: foo() missing 1 required positional argument: 'a'

Refs:

Cooler classes

super() with no arguments

You can call super() with no arguments. It Just Works. Hallelujah.

Also, you can call super() with no arguments. That’s so great that I could probably just fill the rest of this article with it and be satisfied.

Did I mention you can call super() with no arguments?
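In case you want to see it in action:

class Base:
    def greet(self):
        return "hello"

class Child(Base):
    def greet(self):
        # Python 2 needed super(Child, self).greet(); this Just Works now
        return super().greet() + " from Child"

print(Child().greet())   # hello from Child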

Refs: docs; Python 3.0 release notes; PEP 3135

New metaclass syntax and kwargs for classes

Compared to that, everything else in this section is going to sound really weird and obscure.

For example, __metaclass__ is gone. It’s now a keyword-only argument to the class statement.

class Foo(metaclass=FooMeta):
    ...

That doesn’t sound like much, right? Just some needless syntax change that makes porting harder, right?? Right??? Haha nope watch this because it’s amazing but it barely gets any mention at all.

class Foo(metaclass=FooMeta, a=1, b=2, c=3):
    ...

You can include arbitrary keyword arguments in the class statement, and they will be passed along to the metaclass call as keyword arguments. (You have to catch them in both __new__ and __init__, since they always get the same arguments.) (Also, the class statement now has the general syntax of a function call, so you can put *args and **kwargs in it.)

This is pretty slick. Consider SQLAlchemy, which uses a metaclass to let you declare a table with a class.

class SomeTable(TableBase):
    __tablename__ = 'some_table'
    id = Column()
    ...

Note that SQLAlchemy has you put the name of the table in the clumsy __tablename__ attribute, which it invented. Why not just name? Well, because then you couldn’t declare a column called name! Any “declarative” metaclass will have the same problem of separating the actual class contents from configuration. Keyword arguments offer an easy way out.

# only hypothetical, alas
class SomeTable(TableBase, name='some_table'):
    id = Column()
    ...

Refs: docs; Python 3.0 release notes; PEP 3115

__prepare__

Another new metaclass feature is the introduction of the __prepare__ method.

You may have noticed that the body of a class is just a regular block, which can contain whatever code you want. Before decorators were a thing, you’d actually declare class methods in two stages:

class Foo:
    def do_the_thing(cls):
        ...
    do_the_thing = classmethod(do_the_thing)

That’s not magical class-only syntax; that’s just regular code assigning to a variable. You can put ifs and fors and whiles and dels inside a class body, too; you just don’t see it very often because there aren’t very many useful reasons to do it.

A class body is a kind of weird pseudo-scope. It can create locals, and it can read values from outer scopes, but methods don’t see the class body as an outer scope. Once the class body reaches its end, any remaining locals are passed to the type constructor and become the new class’s attributes. (This is why, for example, you can’t refer to a class directly within its own body — the class doesn’t and can’t exist until after the body has executed.)
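To make that concrete, here's a contrived sketch; the Platform class is purely illustrative:

import os

class Platform:
    # a class body is just code that runs top to bottom
    if os.name == 'posix':
        def sep(self):
            return '/'
    else:
        def sep(self):
            return '\\'

print(Platform().sep())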

All of this is to say: __prepare__ is a new hook that returns the dict the class body’s locals go into.

Maybe that doesn’t sound particularly interesting, but consider: the value you return doesn’t have to be an actual dict. It can be anything that understands __setitem__. You could, say, use an OrderedDict, and keep track of the order your attributes were declared. That’s useful for declarative metaclasses, where the order of attributes may be important (consider a C struct).

But you can go further. You might allow more than one attribute of the same name. You might do something special with the attributes as soon as they’re assigned, rather than at the end of the body. You might predeclare some attributes. __prepare__ is passed the class’s kwargs, so you might alter the behavior based on those.

For a nice practical example, consider the new enum module, which I briefly mention later on. One drawback of this module is that you have to specify a value for every variant, since variants are defined as class attributes, which must have a value. There’s an example of automatic numbering, but it still requires assigning a dummy value like (). Clever use of __prepare__ would allow lifting this restriction:

# XXX: Should prefer MutableMapping here, but the ultimate call to type()
# raises a TypeError if you pass a namespace object that doesn't inherit
# from dict!  Boo.
class EnumLocals(dict):
    def __init__(self):
        self.nextval = 1

    def __getitem__(self, key):
        if key not in self and not key.startswith('_') and not key.endswith('_'):
            self[key] = self.nextval
            self.nextval += 1
        return super().__getitem__(key)

class EnumMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        return EnumLocals()

class Colors(metaclass=EnumMeta):
    red
    green
    blue

print(Colors.red, Colors.green, Colors.blue)
# 1 2 3

Deciding whether this is a good idea is left as an exercise.

This is an exceptionally obscure feature that gets very little attention — it’s not even mentioned explicitly in the 3.0 release notes — but there’s nothing else like it in the language. Between __prepare__ and keyword arguments, the class statement has transformed into a much more powerful and general tool for creating all kinds of objects. I almost wish it weren’t still called class.

Refs: docs; Python 3.0 release notes; PEP 3115

Attribute definition order

If that’s still too much work, don’t worry: a proposal was just accepted for Python 3.6 that makes this even easier. Now every class will have a __definition_order__ attribute, a tuple listing the names of all the attributes assigned within the class body, in order. (To make this possible, the default return value of __prepare__ will become an OrderedDict, but the __dict__ attribute will remain a regular dict.)

Now you don’t have to do anything at all: you can always check to see what order any class’s attributes were defined in.


Additionally, descriptors can now implement a __set_name__ method. When a class is created, any descriptor implementing the method will have it called with the containing class and the name of the descriptor.

I’m very excited about this, but let me try to back up. A descriptor is a special Python object that can be used to customize how a particular class attribute works. The built-in property decorator is a descriptor.

class MyClass:
    foo = SomeDescriptor()

c = MyClass()
c.foo = 5  # calls SomeDescriptor.__set__!
print(c.foo)  # calls SomeDescriptor.__get__!

This is super cool and can be used for all sorts of DSL-like shenanigans.

Now, most descriptors ultimately want to store a value somewhere, and the obvious place to do that is in the object’s __dict__. Above, SomeDescriptor might want to store its value in c.__dict__['foo'], which is fine since Python will still consult the descriptor first. If that weren’t fine, it could also use the key '_foo', or whatever. It probably wants to use its own name somehow, because otherwise… what would happen if you had two SomeDescriptors in the same class?

Therein lies the problem, and one of my long-running and extremely minor frustrations with Python. Descriptors have no way to know their own name! There are only really two solutions to this:

  1. Require the user to pass the name in as an argument, too: foo = SomeDescriptor('foo'). Blech!

  2. Also have a metaclass (or decorator, or whatever), which can iterate over all the class’s attributes, look for SomeDescriptor objects, and tell them what their names are. Needing a metaclass means you can’t make general-purpose descriptors meant for use in arbitrary classes; a decorator would work, but boy is that clumsy.

Both of these suck and really detract from what could otherwise be very neat-looking syntax trickery.

But now! Now, when MyClass is created, Python will have a look through its attributes. If it sees that the foo object has a __set_name__ method, it’ll call that method automatically, passing it both the owning class and the name 'foo'! Huzzah!

This is so great I am so happy you have no idea.
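Here's a rough sketch of what such a descriptor might look like, assuming Python 3.6+ (the Field name is made up):

class Field:
    def __set_name__(self, owner, name):
        # called automatically when the owning class is created
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class MyClass:
    foo = Field()
    bar = Field()  # no need to repeat the names anywhere

c = MyClass()
c.foo = 5
print(c.foo)  # 5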


Lastly, there’s now an __init_subclass__ class method, which is called when the class is subclassed. A great many metaclasses exist just to do a little bit of work for each new subclass; now, you don’t need a metaclass at all in many simple cases. You want a plugin registry? No problem:

class Plugin:
    _known_plugins = {}

    def __init_subclass__(cls, *, name, **kwargs):
        cls._known_plugins[name] = cls
        super().__init_subclass__(**kwargs)

    @classmethod
    def get_plugin(cls, name):
        return cls._known_plugins[name]

    # ...probably some interface stuff...

class FooPlugin(Plugin, name="foo"):
    ...

No metaclass needed at all.

Again, none of this stuff is available yet, but it’s all slated for Python 3.6, due out in mid-December. I am super pumped.

Refs: docs (customizing class creation); docs (descriptors); Python 3.6 release notes; PEP 520 (attribute definition order); PEP 487 (__init_subclass__ and __set_name__)

Math stuff

int and long have been merged, and there is no longer any useful distinction between small and very large integers. I’ve actually run into code that breaks if you give it 1 instead of 1L, so, good riddance. (Python 3.0 release notes; PEP 237)

The / operator always does “true” division, i.e., gives you a float. If you want floor division, use //. Accordingly, the __div__ magic method is gone; it’s split into two parts, __truediv__ and __floordiv__. (Python 3.0 release notes; PEP 238)
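A quick sanity check:

print(7 / 2)     # 3.5 -- true division, even for two ints
print(7 // 2)    # 3   -- floor division
print(7.0 // 2)  # 3.0 -- floor division works for floats too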

decimal.Decimal, fractions.Fraction, and floats now interoperate a little more nicely: numbers of different types hash to the same value; all three types can be compared with one another; and most notably, the Decimal and Fraction constructors can accept floats directly. (docs (decimal); docs (fractions); Python 3.2 release notes)

math.gcd returns the greatest common divisor of two integers. This existed before, but was in the fractions module, where nobody knew about it. (docs; Python 3.5 release notes)

math.inf is the floating-point infinity value. Previously, this was only available by writing float('inf'). There’s also a math.nan, but let’s not? (docs; Python 3.5 release notes)

math.isclose (and the corresponding complex version, cmath.isclose) determines whether two values are “close enough”. Intended to do the right thing when comparing floats. (docs; Python 3.5 release notes; PEP 485)
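The classic float gotcha, for illustration:

import math

print(0.1 + 0.2 == 0.3)              # False, thanks to binary floats
print(math.isclose(0.1 + 0.2, 0.3))  # True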

More modules

The standard library has seen quite a few improvements. In fact, Python 3.2 was developed with an explicit syntax freeze, so it consists almost entirely of standard library enhancements. There are far more changes across six and a half versions than I can possibly list here; these are the ones that stood out to me.

The module shuffle

Python 2, rather inexplicably, had a number of top-level modules that were named after the single class they contained, CamelCase and all. StringIO and SimpleHTTPServer are two obvious examples. In Python 3, the StringIO class lives in io (along with BytesIO), and SimpleHTTPServer has been renamed to http.server. If you’re anything like me, you’ll find this deeply satisfying.

Wait, wait, there’s a practical upside here. Python 2 had several pairs of modules that did the same thing with the same API, but one was pure Python and one was much faster C: pickle/cPickle, profile/cProfile, and StringIO/cStringIO. I’ve seen code (cough, older versions of Babel, cough) that spent a considerable amount of its startup time reading pickles with the pure Python version, because it did the obvious thing and used the pickle module. Now, most of these pairs have been merged: importing pickle transparently gives you the faster C implementation, and BytesIO/StringIO are the fast C implementations in the io module. (profile and cProfile are still separate modules, so reach for cProfile if speed matters there.)

Refs: docs (sort of); Python 3.0 release notes; PEP 3108 (exhaustive list of removed and renamed modules)

Additions to existing modules

A number of file format modules, like bz2 and gzip, went through some cleanup and modernization in 3.2 through 3.4: some learned a more straightforward open function, some gained better support for the bytes/text split, and several learned to use their file types as context managers (i.e., with with).

collections.ChainMap is a mapping type that consults some number of underlying mappings in order, allowing for a “dict with defaults” without having to merge them together. (docs; Python 3.3 release notes)
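A quick sketch of the “dict with defaults” idea (the names are made up):

from collections import ChainMap

defaults = {'color': 'red', 'user': 'guest'}
overrides = {'user': 'alice'}

settings = ChainMap(overrides, defaults)
print(settings['user'])   # 'alice' -- found in the first mapping
print(settings['color'])  # 'red'   -- falls through to the defaults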

configparser dropped its ridiculous distinction between ConfigParser and SafeConfigParser; there is now only ConfigParser, which is safe. The parsed data now preserves order by default and can be read or written using normal mapping syntax. Also there’s a fancier alternative interpolation parser. (docs; Python 3.2 release notes)

contextlib.ContextDecorator is some sort of devilry that allows writing a context manager which can also be used as a decorator. It’s used to implement the @contextmanager decorator, so those can be used as decorators as well. (docs; Python 3.2 release notes)

contextlib.ExitStack offers cleaner and more fine-grained handling of multiple context managers, as well as resources that don’t have their own context manager support. (docs; Python 3.3 release notes)

contextlib.suppress is a context manager that quietly swallows a given type of exception. (docs; Python 3.4 release notes)

contextlib.redirect_stdout is a context manager that replaces sys.stdout for the duration of a block. (docs; Python 3.4 release notes)

datetime.timedelta already existed, of course, but now it supports being multiplied and divided by numbers or divided by other timedeltas. The upshot of this is that timedelta finally, finally has a .total_seconds() method which does exactly what it says on the tin. (docs; Python 3.2 release notes)

datetime.timezone is a new concrete type that can represent fixed offsets from UTC. There has long been a datetime.tzinfo, but it was a useless interface, and you were left to write your own actual class yourself. datetime.timezone.utc is a pre-existing instance that represents UTC, an offset of zero. (docs; Python 3.2 release notes)

functools.lru_cache is a decorator that caches the results of a function, keyed on the arguments. It also offers cache usage statistics and a method for emptying the cache. (docs; Python 3.2 release notes)
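Typical usage looks something like this (a toy example):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))          # instant, thanks to the cache
print(fib.cache_info())  # hits, misses, and current size
fib.cache_clear()        # empty the cache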

functools.partialmethod is like functools.partial, but the resulting object can be used as a descriptor (read: method). (docs; Python 3.4 release notes)

functools.singledispatch allows function overloading, based on the type of the first argument. (docs; Python 3.4 release notes; PEP 443)
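A small sketch (the describe function is invented):

from functools import singledispatch

@singledispatch
def describe(value):
    return "something else"

@describe.register(int)
def _(value):
    return "an integer"

@describe.register(list)
def _(value):
    return "a list of {} items".format(len(value))

print(describe(3))        # an integer
print(describe([1, 2]))   # a list of 2 items
print(describe("hello"))  # something else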

functools.total_ordering is a class decorator that allows you to define only __eq__ and __lt__ (or any other) and defines the other comparison methods in terms of them. Note that since Python 3.0, __ne__ is automatically the inverse of __eq__ and doesn’t need defining. Note also that total_ordering doesn’t correctly support NotImplemented until Python 3.4. For an even easier way to do this, consider my classtools.keyed_ordering decorator. (docs; Python 3.2 release notes)

inspect.getattr_static fetches an attribute like getattr but avoids triggering dynamic lookup like @property. (docs; Python 3.2 release notes)

inspect.signature fetches the signature of a function as the new and more featureful Signature object. It also knows to follow the __wrapped__ attribute set by functools.wraps since Python 3.2, so it can see through well-behaved wrapper functions to the “original” signature. (docs; Python 3.3 release notes; PEP 362)

The logging module can, finally, use str.format-style string formatting by passing style='{' to Formatter. (docs; Python 3.2 release notes)

The logging module spits warnings and higher to stderr if logging hasn’t been otherwise configured. This means that if your app doesn’t use logging, but it uses a library that does, you’ll get actual output rather than the completely useless “No handlers could be found for logger ‘foo’”. (docs; Python 3.2 release notes)

os.scandir lists the contents of a directory while avoiding stat calls as much as possible, making it significantly faster. (docs; Python 3.5 release notes; PEP 471)

re.fullmatch checks for a match against the entire input string, not just a substring. (docs; Python 3.4 release notes)

reprlib.recursive_repr is a decorator for __repr__ implementations that can detect recursive calls to the same object and replace them with ..., just like the built-in structures. Believe it or not, reprlib is an existing module, though in Python 2 it was called repr. (docs; Python 3.2 release notes)

shutil.disk_usage returns disk space statistics for a given path with no fuss. (docs; Python 3.3 release notes)

shutil.get_terminal_size tries very hard to detect the size of the terminal window. (docs; Python 3.3 release notes)

subprocess.run is a new streamlined function that consolidates several other helpers in the subprocess module. It returns an object that describes the final state of the process, and it accepts arguments for a timeout, requiring that the process return success, and passing data as stdin. This is now the recommended way to run a single subprocess. (docs; Python 3.5 release notes)
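Something like this, for illustration (the command is arbitrary):

import subprocess

result = subprocess.run(
    ['ls', '-l'],
    stdout=subprocess.PIPE,  # capture output; capture_output arrived later, in 3.7
    timeout=5,
    check=True,              # raise CalledProcessError on a nonzero exit status
)
print(result.returncode)
print(result.stdout.decode())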

tempfile.TemporaryDirectory is a context manager that creates a temporary directory, then destroys it and its contents at the end of the block. (docs; Python 3.2 release notes)

textwrap.indent can add an arbitrary prefix to every line in a string. (docs; Python 3.3 release notes)

time.monotonic returns the value of a monotonic clock — i.e., it will never go backwards. You should use this for measuring time durations within your program; using time.time() will produce garbage results if the system clock changes due to DST, a leap second, NTP, manual intervention, etc. (docs; Python 3.3 release notes; PEP 418)
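Measuring a duration then looks roughly like:

import time

start = time.monotonic()
time.sleep(0.25)                   # stand-in for real work
elapsed = time.monotonic() - start
print("took {:.3f} seconds".format(elapsed))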

time.perf_counter returns the value of the highest-resolution clock available, but is only suitable for measuring a short duration. (docs; Python 3.3 release notes; PEP 418)

time.process_time returns the total system and user CPU time for the process, excluding sleep. Note that the starting time is undefined, so only durations are meaningful. (docs; Python 3.3 release notes; PEP 418)

traceback.walk_stack and traceback.walk_tb are small helper functions that walk back along a stack or traceback, so you can use simple iteration rather than the slightly clumsier linked-list approach. (docs; Python 3.5 release notes)

types.MappingProxyType offers a read-only proxy to a dict. Since it holds a reference to the dict in C, you can return MappingProxyType(some_dict) to effectively create a read-only dict, as the original dict will be inaccessible from Python code. This is the same type used for the __dict__ of an immutable object. Note that this has existed in various forms for a while, but wasn’t publicly exposed or documented; see my module dictproxyhack for something that does its best to work on every Python version. (docs; Python 3.3 release notes)

types.SimpleNamespace is a blank type for sticking arbitrary unstructured attributes to. Previously, you would have to make a dummy subclass of object to do this. (docs; Python 3.3 release notes)

weakref.finalize allows you to add a finalizer function to an arbitrary (weakrefable) object from the “outside”, without needing to add a __del__. The finalize object will keep itself alive, so there’s no need to hold onto it. (docs; Python 3.4 release notes)

New modules with backports

These are less exciting, since they have backports on PyPI that work in Python 2 just as well. But they came from Python 3 development, so I credit Python 3 for them, just like I credit NASA for inventing the microwave.

asyncio is covered above, but it’s been backported as trollius for 2.6+, with the caveat that Pythons before 3.3 don’t have yield from and you have to use yield From(...) as a workaround. That caveat means that third-party asyncio libraries will almost certainly not work with trollius! For this and other reasons, the maintainer is no longer supporting it. Alas. Guess you’ll have to upgrade to Python 3, then.

enum finally provides an enumeration type, something which has long been desired in Python and solved in myriad ad-hoc ways. The variants become instances of a class, can be compared by identity, can be converted between names and values (but only explicitly), can have custom methods, and can implement special methods as usual. There’s even an IntEnum base class whose values end up as subclasses of int (!), making them perfectly compatible with code expecting integer constants. Enums have a surprising amount of power, far more than any approach I’ve seen before; I heartily recommend that you skim the examples in the documentation. Backported as enum34 for 2.4+. (docs; Python 3.4 release notes; PEP 435)
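A small taste, assuming 3.4+ or the backport:

from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

print(Color.red)               # Color.red
print(Color.red.value)         # 1
print(Color(2))                # Color.green -- converting from a value
print(Color['blue'])           # Color.blue  -- converting from a name
print(Color.red is Color.red)  # True -- variants compare by identity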

ipaddress offers types for representing IPv4 and IPv6 addresses and subnets. They can convert between several representations, perform a few set-like operations on subnets, identify special addresses, and so on. Backported as ipaddress for 2.6+. (There’s also a py2-ipaddress, but its handling of bytestrings differs from Python 3’s built-in module, which is likely to cause confusing compatibility problems.) (docs; Python 3.3 release notes; PEP 3144)
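For instance:

import ipaddress

addr = ipaddress.ip_address('2001:db8::1')
net = ipaddress.ip_network('2001:db8::/48')
print(addr in net)        # True
print(addr.version)       # 6
print(net.num_addresses)  # 2 ** 80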

pathlib provides the Path type, representing a filesystem path that you can manipulate with methods rather than the mountain of functions in os.path. It also overloads / so you can do path / 'file.txt', which is kind of cool. PEP 519 intends to further improve interoperability of Paths with classic functions for the not-yet-released Python 3.6. Backported as pathlib2 for 2.6+; there’s also a pathlib, but it’s no longer maintained, and I don’t know what happened there. (docs; Python 3.4 release notes; PEP 428)
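In practice it reads something like:

from pathlib import Path

p = Path('/etc') / 'hosts'
print(p.name)      # 'hosts'
print(p.parent)    # /etc
print(p.exists())  # True on most Unix-likes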

selectors (created as part of the work on asyncio) attempts to wrap select in a high-level interface that doesn’t make you want to claw your eyes out. A noble pursuit. Backported as selectors34 for 2.6+. (docs; Python 3.4 release notes)

statistics contains a number of high-precision statistical functions. Backported as backports.statistics for 2.6+. (docs; Python 3.4 release notes; PEP 450)

unittest.mock provides multiple ways for creating dummy objects, temporarily (with a context manager or decorator) replacing an object or some of its attributes, and verifying that some sequence of operations was performed on a dummy object. I’m not a huge fan of mocking so much that your tests end up mostly testing that your source code hasn’t changed, but if you have to deal with external resources or global state, some light use of unittest.mock can be very handy — even if you aren’t using the rest of unittest. Backported as mock for 2.6+. (docs; Python 3.3, but no release notes)

New modules without backports

Perhaps more exciting because they’re Python 3 exclusive! Perhaps less exciting because they’re necessarily related to plumbing.

faulthandler

faulthandler is a debugging aid that can dump a Python traceback during a segfault or other fatal signal. It can also be made to hook on an arbitrary signal, and can intervene even when Python code is deadlocked. You can use the default behavior with no effort by passing -X faulthandler on the command line, by setting the PYTHONFAULTHANDLER environment variable, or by using the module API manually.

I think -X itself is new as of Python 3.2, though it’s not mentioned in the release notes. It’s reserved for implementation-specific options; there are a few others defined for CPython, and the options can be retrieved from Python code via sys._xoptions.

Refs: docs; Python 3.3 release notes

importlib

importlib is the culmination of a whole lot of work, performed in multiple phases across numerous Python releases, to extend, formalize, and cleanly reimplement the entire import process.

I can’t possibly describe everything the import system can do and what Python versions support what parts of it. Suffice to say, it can do a lot of things: Python has built-in support for importing from zip files, and I’ve seen third-party import hooks that allow transparently importing modules written in another programming language.

If you want to mess around with writing your own custom importer, importlib has a ton of tools for helping you do that. It’s possible in Python 2, too, using the imp module, but that’s a lot rougher around the edges.

If not, the main thing of interest is the import_module function, which imports a module by name without all the really weird semantics of __import__. Seriously, don’t use __import__. It’s so weird. It probably doesn’t do what you think. importlib.import_module even exists in Python 2.7.

Refs: docs; Python 3.3 release notes; PEP 302?

tracemalloc

tracemalloc is another debugging aid which tracks Python’s memory allocations. It can also compare two snapshots, showing how much memory has been allocated or released between two points in time, and who was responsible. If you have rampant memory use issues, this is probably more helpful than having Python check its own RSS.

Technically, tracemalloc can be used with Python 2.7… but that involves patching and recompiling Python, so I hesitate to call it a backport. Still, if you really need it, give it a whirl.

Refs: docs; Python 3.4 release notes; PEP 454

typing

typing offers a standard way to declare type hints — the expected types of arguments and return values. Type hints are given using the function annotation syntax.

Python itself doesn’t do anything with the annotations, though they’re accessible and inspectable at runtime. An external tool like mypy can perform static type checking ahead of time, using these standard types. mypy is an existing project that predates typing (and works with Python 2), but the previous syntax relied on magic comments; typing formalizes the constructs and puts them in the standard library.
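For the record, the hints themselves look like this (a toy example; mypy is not involved at runtime):

from typing import List, Optional

def mean(values: List[float]) -> Optional[float]:
    if not values:
        return None
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0
print(mean.__annotations__)   # the hints are inspectable at runtime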

I haven’t actually used either the type hints or mypy myself, so I can’t comment on how helpful or intrusive they are. Give them a shot if they sound useful to you.

Refs: docs; Python 3.5 release notes; PEP 484

venv and ensurepip

I mean, yes, of course, virtualenv and pip are readily available in Python 2. The whole point of these is that they are bundled with Python, so you always have them at your fingertips and never have to worry about installing them yourself.

Installing Python should now give you pipX and pipX.Y commands automatically, corresponding to the latest stable release of pip when that Python version was first released. You’ll also get pyvenv, which is effectively just virtualenv.

There’s also a module interface: python -m ensurepip will install pip (hopefully not necessary), python -m pip runs pip with a specific Python version (a feature of pip and not new to the bundling), and python -m venv runs the bundled copy of virtualenv with a specific Python version.

There was a time when these were completely broken on Debian, because Debian strongly opposes vendoring (the rationale being that it’s easiest to push out updates if there’s only one copy of a library in the Debian package repository), so they just deleted ensurepip and venv? Which completely defeated the point of having them in the first place? I think this has been fixed by now, but it might still bite you if you’re on Ubuntu 14.04 LTS.

Refs: ensurepip docs; pyvenv docs; Python 3.4 release notes; PEP 453

zipapp

zipapp makes it easy to create executable zip applications, which have been a thing since 2.6 but have languished in obscurity. Well, no longer.

This wasn’t particularly difficult before: you just zip up some code, make sure there’s a __main__.py in the root, and pass it to Python. Optionally, you can set it executable and add a shebang line, since the ZIP format ignores any leading junk in the file. That’s basically all zipapp does. (It does not magically infer your dependencies and bundle them as well; you’re on your own there.)

I can’t find a backport, which is a little odd, since I don’t think this module does anything too special.

Refs: docs; Python 3.5 release notes; PEP 441

Miscellaneous nice enhancements

There were a lot of improvements to language semantics that don’t fit anywhere else above, but make me a little happier.

The interactive interpreter does tab-completion by default. I say “by default” because I’ve been told that it was supported before, but you had to do some kind of goat blood sacrifice to get it to work. Also, command history persists between runs. (docs; Python 3.4 release notes)

The -b command-line option produces a warning when calling str() on a bytes or bytearray, or when comparing text to bytes. -bb produces an error. (docs)

The -I command-line option runs Python in “isolated mode”: it ignores all PYTHON* environment variables and leaves the current directory and user site-packages directories off of sys.path. The idea is to use this when running a system script (or in the shebang line of a system script) to insulate it from any weird user-specific stuff. (docs; Python 3.4 release notes)

Functions and classes learned a __qualname__ attribute, which is a dotted name describing (lexically) where they were defined. For example, a method’s __name__ might be foo, but its __qualname__ would be something like SomeClass.foo. Similarly, a class or function defined within another function will list that containing function in its __qualname__. (docs; Python 3.3 release notes; PEP 3155)

Generators signal their end by raising StopIteration internally, but it was also possible to raise StopIteration directly within a generator — most notably, when calling next() on an exhausted iterator. This would cause the generator to end prematurely and silently. Now, raising StopIteration inside a generator will produce a warning, which will become a RuntimeError in Python 3.7. You can opt into the fatal behavior early with from __future__ import generator_stop. (Python 3.5 release notes; PEP 479)

Implicit namespace packages allow a package to span multiple directories. The most common example is a plugin system, foo.plugins.*, where plugins may come from multiple libraries, but all want to share the foo.plugins namespace. Previously, they would collide, and some sys.path tricks were necessary to make it work; now, support is built in. (This feature also allows you to have a regular package without an __init__.py, but I’d strongly recommend still having one.) (Python 3.3 release notes; PEP 420)

Object finalization behaves in less quirky ways when destroying an isolated reference cycle. Also, modules no longer have their contents changed to None during shutdown, which fixes a long-running type of error when a __del__ method tries to call, say, os.path.join() — if you were unlucky, os.path would already have had its contents replaced with Nones, and you’d get an extremely confusing TypeError from trying to call a standard library function. (Python 3.4 release notes; PEP 442)

str.format_map is like str.format, but it accepts a mapping object directly (instead of having to flatten it with **kwargs). This allows some fancy things that weren’t previously possible, like passing a fake map that creates values on the fly based on the keys looked up in it. (docs; Python 3.2 release notes)
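For example, a fake map that makes up values on the fly (a contrived sketch):

class Shouty(dict):
    def __missing__(self, key):
        # invent a value based on the key being looked up
        return key.upper()

print("hello, {name}!".format_map(Shouty()))  # hello, NAME!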

When a blocking system call is interrupted by a signal, it returns EINTR, indicating that the calling code should try the same system call again. In Python, this becomes OSError or InterruptedError. I have never in my life seen any C or Python code that actually deals with this correctly. Now, Python will do it for you: all the built-in and standard library functions that make use of system calls will automatically retry themselves when interrupted. (Python 3.5 release notes; PEP 475)

File descriptors created by Python code are now flagged “non-inheritable”, meaning they’re closed automatically when spawning a child process. (docs; Python 3.4 release notes; PEP 446)

A number of standard library functions now accept file descriptors in addition to paths. (docs; Python 3.3 release notes)

Several different OS and I/O exceptions were merged into a single and more fine-grained hierarchy, rooted at OSError. Code can now catch a specific subclass in most cases, rather than examine .errno. (docs; Python 3.3 release notes; PEP 3151)

ResourceWarning is a new kind of warning for issues with resource cleanup. One is produced if a file object is destroyed, but was never closed, which can cause issues on Windows or with garbage-collected Python implementations like PyPy; one is also produced if uncollectable objects still remain when Python shuts down, indicating some severe finalization problems. The warning is ignored by default, but can be enabled with -W default on the command line. (Python 3.2 release notes)

hasattr() only catches (and returns False for) AttributeErrors. Previously, any exception would be considered a sign that the attribute doesn’t exist, even though an unusual exception like an OSError usually means the attribute is computed dynamically, and that code is broken somehow. Now, exceptions other than AttributeError are allowed to propagate to the caller. (docs; Python 3.2 release notes)

Hash randomization is on by default, meaning that dict and set iteration order differs between Python runs. This protects against some DoS attacks, but more importantly, it spitefully forces you not to rely on incidental ordering. (docs; Python 3.3 release notes)

List comprehensions no longer leak their loop variables into the enclosing scope. (Python 3.0 release notes)

nonlocal allows writing to a variable in an enclosing (but non-global) scope. (docs; Python 3.0 release notes; PEP 3104)

Comparing objects of incompatible types now produces a TypeError, rather than using Python 2’s very silly fallback. (Python 3.0 release notes)

!= defaults to returning the opposite of ==. (Python 3.0 release notes)

Accessing a method as a class attribute now gives you a regular function, not an “unbound method” object. (Python 3.0 release notes)

The input builtin no longer performs an eval (!), removing a huge point of confusion for beginners. This is the behavior of raw_input in Python 2. (docs; Python 3.0 release notes; PEP 3111)

Fast and furious

These aren’t necessarily compelling, and they may not even make any appreciable difference for your code, but I think they’re interesting technically.

Objects’ __dict__s can now share their key storage internally. Instances of the same type generally have the same attribute names, so this provides a modest improvement in speed and memory usage for programs that create a lot of user-defined objects. (Python 3.3 release notes; PEP 412)

OrderedDict is now implemented in C, making it “4 to 100” (!) times faster. Note that the backport in the 2.7 standard library is pure Python. So, there’s a carrot. (Python 3.5 release notes)

The GIL was made more predictable. My understanding is that the old behavior was to yield after some number of Python bytecode operations, which could take wildly varying amounts of time; the new behavior yields after a given duration, by default 5ms. (Python 3.2 release notes)

The io library was rewritten in C, making it faster. Again, the Python 2.7 implementation is pure Python. (Python 3.1 release notes)

Tuples and dicts containing only immutable objects — i.e., objects that cannot possibly contain circular references — are ignored by the garbage collector. This was backported to Python 2.7, too, but I thought it was super interesting. (Python 3.1 release notes)

That’s all I’ve got

Huff, puff.

I hope something here appeals to you as a reason to at least experiment with Python 3. It’s fun over here. Give it a try.

Full Support for IPv6

Post Syndicated from Let's Encrypt - Free SSL/TLS Certificates original https://letsencrypt.org//2016/07/26/full-ipv6-support.html

Let’s Encrypt is happy to announce full support for IPv6.

As IPv4 address space is exhausted, more and more people are deploying services that are only reachable via IPv6. Adding full support for IPv6 allows us to serve more people and organizations, which is important if we’re going to encrypt the entire Web.

IPv6 is an exciting step forward which will allow the Internet to grow and reach more people. You can learn more about it by watching this video from Google’s Chief Internet Evangelist, Vint Cerf. We’re looking forward to the day when both TLS and IPv6 are ubiquitous.

Let’s Encrypt depends on industry and community support. Please consider getting involved, and if your company or organization would like to sponsor Let’s Encrypt please email us at sponsor@letsencrypt.org.

systemd for Developers I

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/socket-activation.html

systemd
not only brings improvements for administrators and users, it also
brings a (small) number of new APIs with it. In this blog story (which might
become the first of a series) I hope to shed some light on one of the
most important new APIs in systemd:

Socket Activation

In the original blog
story about systemd
I tried to explain why socket activation is a
wonderful technology to spawn services. Let’s reiterate the background
here a bit.

The basic idea of socket activation is not new. The inetd
superserver was a standard component of most Linux and Unix systems
since time began: instead of spawning all local Internet services
already at boot, the superserver would listen on behalf of the
services and whenever a connection would come in an instance of the
respective service would be spawned. This allowed relatively weak
machines with few resources to offer a big variety of services at the
same time. However it quickly got a reputation for being somewhat
slow: since daemons would be spawned for each incoming connection a
lot of time was spent on forking and initialization of the services
— once for each connection, instead of once for them all.

Spawning one instance per connection was how inetd was primarily
used, even though inetd actually understood another mode: on the first
incoming connection it would notice this via poll() (or
select()) and spawn a single instance for all future
connections. (This was controllable with the
wait/nowait options.) That way the first connection
would be slow to set up, but subsequent ones would be as fast as with
a standalone service. In this mode inetd would work in a true
on-demand mode: a service would be made available lazily when it was
required.

inetd’s focus was clearly on AF_INET (i.e. Internet) sockets. As
time progressed and Linux/Unix left the server niche and became
increasingly relevant on desktops, mobile and embedded environments
inetd was somehow lost in the troubles of time. Its reputation for
being slow, and the fact that Linux’ focus shifted away from only
Internet servers made a Linux machine running inetd (or one of its newer
implementations, like xinetd) the exception, not the rule.

When Apple engineers worked on optimizing the MacOS boot time they
found a new way to make use of the idea of socket activation: they
shifted the focus away from AF_INET sockets towards AF_UNIX
sockets. And they noticed that on-demand socket activation was only
part of the story: much more powerful is socket activation when used
for all local services including those which need to be started
anyway on boot. They implemented these ideas in launchd, a central building
block of modern MacOS X systems, and probably the main reason why
MacOS is so fast booting up.

But, before we continue, let’s have a closer look what the benefits
of socket activation for non-on-demand, non-Internet services in
detail are. Consider the four services Syslog, D-Bus, Avahi and the
Bluetooth daemon. D-Bus logs to Syslog, hence on traditional Linux
systems it would get started after Syslog. Similarly, Avahi requires
Syslog and D-Bus, hence would get started after both. Finally
Bluetooth is similar to Avahi and also requires Syslog and D-Bus but
does not interface at all with Avahi. Since in a traditional
SysV-based system only one service can be in the process of getting
started at a time, the following serialization of startup would take
place: Syslog → D-Bus → Avahi → Bluetooth (Of course, Avahi and
Bluetooth could be started in the opposite order too, but we have to
pick one here, so let’s simply go alphabetically.). To illustrate
this, here’s a plot showing the order of startup beginning with system
startup (at the top).

Parallelization plot

Certain distributions tried to improve this strictly serialized
start-up: since Avahi and Bluetooth are independent from each other,
they can be started simultaneously. The parallelization is increased,
the overall startup time slightly smaller. (This is visualized in the
middle part of the plot.)

Socket activation makes it possible to start all four services
completely simultaneously, without any kind of ordering. Since the
creation of the listening sockets is moved outside of the daemons
themselves we can start them all at the same time, and they are able
to connect to each other’s sockets right-away. I.e. in a single step
the /dev/log and /run/dbus/system_bus_socket sockets
are created, and in the next step all four services are spawned
simultaneously. When D-Bus then wants to log to syslog, it just writes
its messages to /dev/log. As long as the socket buffer does
not run full it can go on immediately with what else it wants to do
for initialization. As soon as the syslog service catches up it will
process the queued messages. And if the socket buffer runs full then
the client logging will temporarily block until the socket is writable
again, and continue the moment it can write its log messages. That
means the scheduling of our services is entirely done by the kernel:
from the userspace perspective all services are run at the same time,
and when one service cannot keep up the others needing it will
temporarily block on their request but go on as soon as these
requests are dispatched. All of this is completely automatic and
invisible to userspace. Socket activation hence allows us to
drastically parallelize start-up, enabling simultaneous start-up of
services which previously were thought to strictly require
serialization. Most Linux services use sockets as communication
channel. Socket activation allows starting of clients and servers of
these channels at the same time.

But it’s not just about parallelization. It offers a number of
other benefits:

  • We no longer need to configure dependencies explicitly. Since the
    sockets are initialized before all services they are simply available,
    and no userspace ordering of service start-up needs to take place
    anymore. Socket activation hence drastically simplifies configuration
    and development of services.
  • If a service dies its listening socket stays around, not losing a
    single message. After a restart of the crashed service it can continue
    right where it left off.
  • If a service is upgraded we can restart the service while keeping
    around its sockets, thus ensuring the service is continuously
    responsive. Not a single connection is lost during the upgrade.
  • We can even replace a service during runtime in a way that is
    invisible to the client. For example, all systems running systemd
    start up with a tiny syslog daemon at boot which passes all log
    messages written to /dev/log on to the kernel message
    buffer. That way we provide reliable userspace logging starting from
    the first instant of boot-up. Then, when the actual rsyslog daemon is
    ready to start we terminate the mini daemon and replace it with the
    real daemon. And all that while keeping around the original logging
    socket and sharing it between the two daemons and not losing a single
    message. Since rsyslog flushes the kernel log buffer to disk after
    start-up all log messages from the kernel, from early-boot and from
    runtime end up on disk.

For another explanation of this idea consult the original blog
story about systemd
.

Socket activation has been available in systemd since its
inception. On Fedora 15 a number of services have been modified to
implement socket activation, including Avahi, D-Bus and rsyslog (to continue with the example above).

systemd’s socket activation is quite comprehensive. Not only are classic
sockets supported, but related technologies as well:

  • AF_UNIX sockets, in the flavours SOCK_DGRAM, SOCK_STREAM and SOCK_SEQPACKET; both in the filesystem and in the abstract namespace
  • AF_INET sockets, i.e. TCP/IP and UDP/IP; both IPv4 and IPv6
  • Unix named pipes/FIFOs in the filesystem
  • AF_NETLINK sockets, to subscribe to certain kernel features. This
    is currently used by udev, but could be useful for other
    netlink-related services too, such as audit.
  • Certain special files like /proc/kmsg or device nodes like /dev/input/*.
  • POSIX Message Queues

A service capable of socket activation must be able to receive its
preinitialized sockets from systemd, instead of creating them
internally. For most services this requires (minimal)
patching. However, since systemd actually provides inetd compatibility
a service working with inetd will also work with systemd — which is
quite useful for services like sshd for example.

So much about the background of socket activation, let’s now have a
look how to patch a service to make it socket activatable. Let’s start
with a theoretic service foobard. (In a later blog post we’ll focus on
a real-life example.)

Our little (theoretic) service includes code like the following for
creating sockets (most services include code like this in one way or
another):

/* Source Code Example #1: ORIGINAL, NOT SOCKET-ACTIVATABLE SERVICE */
...
union {
        struct sockaddr sa;
        struct sockaddr_un un;
} sa;
int fd;

fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd < 0) {
        fprintf(stderr, "socket(): %m\n");
        exit(1);
}

memset(&sa, 0, sizeof(sa));
sa.un.sun_family = AF_UNIX;
strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
        fprintf(stderr, "bind(): %m\n");
        exit(1);
}

if (listen(fd, SOMAXCONN) < 0) {
        fprintf(stderr, "listen(): %m\n");
        exit(1);
}
...

A socket activatable service may use the following code instead:

/* Source Code Example #2: UPDATED, SOCKET-ACTIVATABLE SERVICE */
...
#include "sd-daemon.h"
...
int fd;

if (sd_listen_fds(0) != 1) {
        fprintf(stderr, "No or too many file descriptors received.\n");
        exit(1);
}

fd = SD_LISTEN_FDS_START + 0;
...

systemd might pass you more than one socket (based on
configuration, see below). In this example we are interested in one
only. sd_listen_fds()
returns how many file descriptors are passed. We simply compare that
with 1, and fail if we got more or less. The file descriptors systemd
passes to us are inherited one after the other beginning with fd
#3. (SD_LISTEN_FDS_START is a macro defined to 3). Our code hence just
takes possession of fd #3.

As you can see this code is actually much shorter than the
original. This of course comes at the price that our little service
with this change will no longer work in a non-socket-activation
environment. With minimal changes we can adapt our example to work nicely
both with and without socket activation:

/* Source Code Example #3: UPDATED, SOCKET-ACTIVATABLE SERVICE WITH COMPATIBILITY */
...
#include "sd-daemon.h"
...
int fd, n;

n = sd_listen_fds(0);
if (n > 1) {
        fprintf(stderr, "Too many file descriptors received.\n");
        exit(1);
} else if (n == 1)
        fd = SD_LISTEN_FDS_START + 0;
else {
        union {
                struct sockaddr sa;
                struct sockaddr_un un;
        } sa;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
                fprintf(stderr, "socket(): %m\n");
                exit(1);
        }

        memset(&sa, 0, sizeof(sa));
        sa.un.sun_family = AF_UNIX;
        strncpy(sa.un.sun_path, "/run/foobar.sk", sizeof(sa.un.sun_path));

        if (bind(fd, &sa.sa, sizeof(sa)) < 0) {
                fprintf(stderr, "bind(): %m\n");
                exit(1);
        }

        if (listen(fd, SOMAXCONN) < 0) {
                fprintf(stderr, "listen(): %m\n");
                exit(1);
        }
}
...

With this simple change our service can now make use of socket
activation but still works unmodified in classic environments. Now,
let’s see how we can enable this service in systemd. For this we have
to write two systemd unit files: one describing the socket, the other
describing the service. First, here’s foobar.socket:

[Socket]
ListenStream=/run/foobar.sk

[Install]
WantedBy=sockets.target

And here’s the matching service file foobar.service:

[Service]
ExecStart=/usr/bin/foobard

If we place these two files in /etc/systemd/system we can
enable and start them:

# systemctl enable foobar.socket
# systemctl start foobar.socket

Now our little socket is listening, but our service is not running
yet. If we now connect to /run/foobar.sk the service will be
automatically spawned, for on-demand service start-up. With a
modification of foobar.service we can start our service
already at startup, thus using socket activation only for
parallelization purposes, not for on-demand auto-spawning anymore:

[Service]
ExecStart=/usr/bin/foobard

[Install]
WantedBy=multi-user.target

And now let’s enable this too:

# systemctl enable foobar.service
# systemctl start foobar.service

Now our little daemon will be started at boot and on-demand,
whatever comes first. It can be started fully in parallel with its
clients, and when it dies it will be automatically restarted when it
is used the next time.

A single .socket file can include multiple ListenXXX stanzas, which
is useful for services that listen on more than one socket. In this
case all configured sockets will be passed to the service in the exact
order they are configured in the socket unit file. Also,
you may configure various socket settings in the .socket
files.

In real life it’s a good idea to include description strings in
these unit files; to keep things simple we’ll leave this out of our
example. Speaking of real-life: our next installment will cover an
actual real-life example. We’ll add socket activation to the CUPS
printing server.

The sd_listen_fds() function call is defined in sd-daemon.h
and sd-daemon.c. These
two files are currently drop-in .c sources which projects should
simply copy into their source tree. Eventually we plan to turn this
into a proper shared library, however using the drop-in files allows
you to compile your project in a way that is compatible with socket
activation even without any compile time dependencies on
systemd. sd-daemon.c is liberally licensed, should compile
fine on the most exotic Unixes and the algorithms are trivial enough
to be reimplemented with very little code if the license should
nonetheless be a problem for your project. sd-daemon.c
contains a couple of other API functions besides
sd_listen_fds() that are useful when implementing socket
activation in a project. For example, there’s sd_is_socket()
which can be used to distinguish and identify particular sockets when
a service gets passed more than one.

Let me point out that the interfaces used here are in no way bound
directly to systemd. They are generic enough to be implemented in
other systems as well. We deliberately designed them as simple and
minimal as possible to make it possible for others to adopt similar
schemes.

Stay tuned for the next installment. As mentioned, it will cover a
real-life example of turning an existing daemon into a
socket-activatable one: the CUPS printing service. However, I hope
this blog story might already be enough to get you started if you plan
to convert an existing service into a socket activatable one. We
invite everybody to convert upstream projects to this scheme. If you
have any questions join us on #systemd on freenode.

Re: Avahi – what happened. on Solaris..?

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/project-indiana-part2.html

In response to Darren Kenny:

  • On Linux (and FreeBSD) nss-mdns has been
    providing decent low-level integration of mDNS at the nsswitch level for
    ages. In fact it even predates Avahi by a few months. Porting it to Solaris
    would have been almost trivial. And, Sun engineers even asked about nss-mdns,
    so I am quite sure that Sun knew about this.
  • You claim that our C API was internal? I wonder who told you that. I
    definitely did not. The API has been available on the Avahi web site for ages
    and is relatively well documented [1]; I wonder how anyone could
    ever come to the opinion that it was “internal”. Regarding API stability: yes, I
    said that we make no guarantees about API stability — but I also said it was
    a top-priority for us to keep the API compatible. I think that is the best you
    can get from any project of the Free Software community. If there is
    something in an API that we later learn is irrecoverably broken or stupid by design, then
    we take the freedom to replace that or remove it entirely. Oh, and even Sun
    does things like that in Java. Just think of the Java 1.x
    java.lang.Thread.stop() API.
  • nss-mdns does not make any use of D-Bus. It never did, it never will.
  • GNOME never formally made the decision to go Avahi AFAIK. It’s just what
    everyone uses because it is available on all distributions. Also, a lot of GNOME software
    can also be compiled against HOWL/Bonjour.
  • Implementing the Avahi API on top of the Bonjour API is just crack. For a
    crude comparison: this is like implementing a POSIX compatibility layer on top
    of the DOS API. Crack. Just crack. There is a lot of functionality you can
    *never* emulate in any reasonable way on top of the current Bonjour API:
    properly integrated IPv4+IPv6 support, AVAHI_BROWSER_ALL_FOR_NOW, the fact that the Avahi API is
    transaction-based, all the different flag definitions, and a lot more. From a
    technical perspective emulating Avahi on top of Bonjour is not feasible, while
    the other way round perfectly is.

Let’s also not forget that Avahi comes with a Bonjour compatibility layer,
which gets almost any Bonjour app working on top of Avahi. And in contrast to your
Avahi-on-top-of-Bonjour stuff, it is not inherently borked. Yes, our Bonjour compatibility layer is
not perfect, but should be very easy to fix if there should still be an
incompatibility left. And the API of that layer is of course as much set in
stone as the upstream Bonjour API. Oh, and you wouldn’t have to run two daemons instead of
just one. And you would only need to ship and maintain a single mDNS package.
Oh, and the compatibility layer would only be needed for the few remaining
applications that still use Bonjour exclusively, and not by the majority of
applications.

So, in effect you chose Bonjour because of its API and added some Avahi’sh
API on top and this all is totally crackish. If you’d done it the other way round
you would have gotten both APIs as well, but the overall solution would not
have been totally crackish. And let’s not forget that Avahi is much more
complete than Bonjour. (Maybe except wide-area support, Federico!).

Anyway, my original rant was not about the way Sun makes its decision but
just about the fact that your Avahi-to-Bonjour-bridge is … crack! And that
it remains.

Wow, six times crack in a single article.

Footnotes:

[1] For a Free Software API at least.