Tag Archives: issa

Former Judge Accuses IP Court of Using ‘Pirate’ Microsoft Software

Post Syndicated from Andy original https://torrentfreak.com/former-judge-accuses-ip-court-of-using-pirate-microsoft-software-180429/

While piracy of movies, TV shows, and music grabs most of the headlines, software piracy is a huge issue, from both consumer and commercial perspectives.

For many years, software such as Photoshop has been pirated on a grand scale, and around the world millions of computers rely on cracked and unlicensed copies of Microsoft’s Windows software.

One of the key drivers of this kind of piracy is the relative expense of software. Open source alternatives are nearly always available, but the big brand names remain more popular due to their market penetration and perceived ease of use.

While using pirated software very rarely gets individuals into trouble, the same cannot be said of unlicensed commercial operators. That appears to be the case in Russia where, somewhat ironically, the Court for Intellectual Property Rights stands accused of copyright infringement.

A complaint filed by the Paragon law firm with the Prosecutor General’s Office alleges that the Court for Intellectual Property Rights (CIP) is illegally using Microsoft software, something which has the potential to affect the outcome of court cases involving the US-based software giant.

Paragon is representing Alexander Shmuratov, who is a former Assistant Judge at the Court for Intellectual Property Rights. Shmuratov worked at the Court for several years and claims that the computers there were being operated with expired licenses.

Shmuratov himself told Kommersant that he “saw the notice of an activation failure every day when using MS Office products” in intellectual property court.

A representative of the Prosecutor General’s Office confirmed that a complaint had been received but said it had been forwarded to the Ministry of Internal Affairs.

In respect of the counterfeit software claims, CIP categorically denies the allegations. CIP says that licenses for all Russian courts were purchased back in 2008 and remained in force until 2011. In 2013, Microsoft agreed to an extension.

Only adding more intrigue to the story, CIP assistant chairman Catherine Ulyanova said that the initiator of the complaint, former judge Alexander Shmuratov, was dismissed from the CIP because he provided false information about income. He later mounted a challenge against his dismissal but was unsuccessful.

Ulyanova said that Microsoft licensed all courts from 2006 for use of Windows and MS Office. The licenses were acquired through a third-party company and more licenses than necessary were purchased, with some licenses being redistributed for use by CIP in later years with the consent of Microsoft.

Kommersant was unable to confirm how licenses were paid for beyond December 2011 but apparently an “official confirmation letter from the Irish headquarters of Microsoft, which does not object to the transfer of CIP licenses” had been sent to the Court.

Responding to Shmuratov’s allegations that software he used hadn’t been activated, Ulyanova said that technical problems had no relationship with the existence of software licenses.

The question of whether the Court is properly licensed will be determined at a later date but observers are already raising questions concerning CIP’s historical dealings with Microsoft not only in terms of licensing, but in cases it handled.

In the period 2014-2017, the Court for Intellectual Property Rights handled around 80 cases involving Microsoft, with claims ranging from 50 thousand rubles ($800) to several million rubles.


Dotcom’s Bid to Compel Obama to Give Evidence Rejected By High Court

Post Syndicated from Andy original https://torrentfreak.com/dotcoms-bid-to-compel-obama-to-give-evidence-rejected-by-high-court-180321/

With former US president Barack Obama in New Zealand until Friday, the visit provided a golden opportunity for Kim Dotcom to pile on yet more pressure over the strained prosecution of both him and his defunct cloud storage site, Megaupload.

In a statement issued yesterday, Dotcom reiterated his claims that attempts to have him extradited to the United States have no basis in law, chiefly due to the fact that the online dissemination of copyright-protected works by Megaupload’s users is not an extradition offense in New Zealand.

Mainly, however, Dotcom shone yet more light on what he perceives to be the dark politics behind the case, arguing that the Obama administration was under pressure from Hollywood to do something about copyright enforcement or risk losing funding. He says they pulled out all the stops and trampled his rights to prevent that from happening.

In a lengthy affidavit, filed this week to coincide with Obama’s visit, Dotcom called on the High Court to compel the former president to give evidence in the entrepreneur’s retaliatory multi-billion dollar damages claim against the Kiwi government.

This morning, however, Chief High Court Judge, Justice Geoffrey Venning, quickly shut that effort down.

With Obama enjoying a round of golf alongside former Prime Minister and Dotcom nemesis John Key, Justice Venning declined the request to compel Obama to give evidence, whether in New Zealand during the current visit or via letter of request to judicial authorities in the United States.

In his decision, Justice Venning notes that Dotcom’s applications were filed late on March 19 and the matter was only handed to him yesterday. As a result, he convened a telephone conference this morning to “deal with the application as a matter of urgency.”

Dotcom’s legal team argued that in the absence of a Court order it’s unlikely that Obama would give evidence. Equally, given that no date has yet been set for Dotcom’s damages hearing, it will “not be practicable” to serve Obama at a later point in the United States.

Furthermore, absent an order compelling his attendance, Obama would be unlikely to be called as a witness, despite him being the most competent potential witness currently present in New Zealand.

Dotcom counsel Ron Mansfield accepted that there would be practical limitations on what could be achieved between March 21 and March 23 while Obama is in New Zealand. However, he asked that an order be granted so that it could be served while Obama is in the country, even if the examination took place at a later date.

The Judge wasn’t convinced.

“Despite Mr Mansfield’s concession, I consider the application is still premature. The current civil proceedings were only filed on 22 December 2017. The defendants have applied for an order deferring the filing of a statement of defense pending the determination of the hearing of two appeals currently before the Court of Appeal. That application is yet to be determined,” Justice Venning’s decision reads.

The Judge also questions whether evidence Obama could give would be relevant.

He notes that Dotcom’s evidence is based on the fact that Hollywood was a major benefactor of the Democratic Party in the United States and that, in his opinion, the action against Megaupload and him “met the United States’ need to appease the Hollywood lobby” and “that the United States and New Zealand’s interests were perfectly aligned.”

However, Dotcom’s transcripts of his conversations with a lobbyist, which appeared to indicate Obama’s dissatisfaction with the Megaupload prosecution, are dismissed as “hearsay evidence”. Documentation of a private lunch with Obama and the head of the MPAA is also played down.

“Mr Dotcom’s opinion that Mr Obama’s evidence will be relevant to the present claims appears at best speculative,” the Judge notes.

But even if the evidence had been stronger, Justice Venning says that Obama would need to be given time to prepare for an examination, given that it would relate to matters that occurred several years ago.

“He would need to review relevant documents and materials from the time in preparation for any examination. That confirms the current application is premature,” the Judge writes.

In support, it is noted that Dotcom knew as early as February 21 that Obama’s visit would be taking place this week, yet his application was filed just days ago.

With that, the Judge dismissed the application, allowing Obama to play golf in peace. Well, relative peace at least. Dotcom isn’t done yet.

“I am disappointed of course because I believe my affidavit contains compelling evidence of the link between the Obama administration, Hollywood, and my extradition proceeding. However, after seven years of this, I am used to fighting to get to the truth and will keep fighting. Next round!” Dotcom said in response.

“The judgment is no surprise and we’ll get the opportunity to question Obama sooner or later,” he added.

As a further indication of the international nature of Dotcom’s case, the Megaupload founder also reminded people of his former connections to Hong Kong, noting that people in power there are keeping an eye on his case.

“The Chinese Government is watching my case with interest. Expect some bold action in the Hong Kong Courts soon. Never again shall an accusation from the US DOJ be enough to destroy a Hong Kong business. That lesson will soon be learned,” he said.


Dotcom Affidavit Calls For Obama to Give Evidence in Megaupload Case

Post Syndicated from Andy original https://torrentfreak.com/dotcom-affidavit-calls-for-obama-to-give-evidence-in-megaupload-case-180320/

For more than six years since the raid on Megaupload, founder Kim Dotcom has insisted that the case against him, his co-defendants, and his company, was politically motivated.

The serial entrepreneur states unequivocally that former president Barack Obama’s close ties to Hollywood were the driving force.

Later today, Obama will touch down for a visit to New Zealand. In what appears to be a tightly managed affair, with heavy restrictions placed on the media and publicity, it seems clear that Obama wants to maintain control over his social and business engagements in the country.

But of course, New Zealand is home to Kim Dotcom and as someone who feels wronged by the actions of the former administration, he is determined to use this opportunity to shine more light on Obama’s role in the downfall of his company.

In a statement this morning, Dotcom reiterated his claims that attempts to have him extradited to the United States have no basis in law, chiefly due to the fact that the online dissemination of copyright-protected works by Megaupload’s users is not an extradition offense in New Zealand.

But Dotcom also attacks the politics behind his case, arguing that the Obama administration was under pressure from Hollywood to do something about copyright enforcement or risk losing financial support.

In connection with his case, Dotcom is currently suing the New Zealand government for billions of dollars, so while Obama is in town, Dotcom is demanding that the former president give evidence.

Dotcom’s case is laid out in a highly-detailed sworn affidavit dated March 19, 2018. The Megaupload founder explains that Hollywood has historically been a major benefactor of the Democrats so when seeking re-election for a further term, the Democrats were under pressure from the movie companies to make an example of Megaupload and Dotcom.

Dotcom notes that while he was based in Hong Kong, extradition to the US would be challenging. So, with Dotcom seeking residence in New Zealand, a plot was hatched to allow him into the country, despite the New Zealand government knowing that a criminal prosecution lay in wait for him. Dotcom says that by doing a favor for Hollywood, it could mean that New Zealand became a favored destination for US filmmakers.

“The interests of the United States and New Zealand were therefore perfectly aligned. I provided the perfect opportunity for New Zealand to facilitate the United States’ show of force on copyright enforcement,” Dotcom writes.

Citing documents obtained from Open Secrets, Dotcom shows how the Democrats took an 81% share of more than $46m donated to political parties in the US during the 2008 election cycle. In the 2010 cycle, 76% of more than $24m went to the Democrats and in 2012, they scooped up 78% of more than $56m.

Dotcom then recalls the attempts at passing the Stop Online Piracy Act (SOPA), which would have shifted the enforcement of copyright onto ISPs, assisting Hollywood greatly. Ultimately, Congressional support for the proposed legislation was withdrawn and Dotcom recalls this was followed by a public threat from the MPAA to withdraw campaign contributions on which the Democrats were especially reliant.

“The message to the White House was plain: do not expect funding if you do not advance the MPAA’s legislative agenda. On 20 January 2012, the day after this statement, I was arrested,” Dotcom notes.

Describing Megaupload as a highly profitable and innovative platform that highlighted copyright owners’ failure to keep up with the way in which content is now consumed, Dotcom says it made the perfect target for the Democrats.

Convinced the party was at the root of his prosecution, he utilized his connections in Hong Kong to contact Thomas Hart, a lawyer and lobbyist in Washington, D.C. with strong connections to the Democrats and the White House.

Dotcom said a telephone call between him and Mr Hart revealed that then Vice President Joe Biden was at the center of Dotcom’s prosecution but that Obama was dissatisfied with the way things had been handled.

“Biden did admit to have… you know, kind of started it, you know, along with support from others but it was Biden’s decision…,” Hart allegedly said.

“What he [President Obama] expressed to me was a growing concern about the matter. He indicated an awareness of that it had not gone well, that it was more complicated than he thought, that he will turn his attention to it more prominently after November.”

Dotcom says that Obama was “questioning the whole thing,” a suggestion that he may not have been fully committed to the continuing prosecution.

The affidavit then lists a whole series of meetings in 2011, documented in the White House visitor logs. They include meetings between the Obama administration and then United States Attorney Neil McBride, various representatives from Hollywood, MPAA chief Chris Dodd, and Mike Ellis of the MPA (who was based in Hong Kong and had met with New Zealand’s then Minister of Justice, Simon Power).

In summary, Dotcom suggests there was a highly organized scheme against him, hatched between Hollywood and the Obama administration, that had the provision of funds to win re-election at its heart.

From there, an intertwined agreement was reached at the highest levels of both the US and New Zealand governments where the former would benefit through tax concessions to Hollywood (and a sweetening of relations between the countries) and the latter would benefit financially through investment.

All New Zealand had to do was let Dotcom in for a while and then hand him over to the United States for prosecution. And New Zealand definitely knew that Dotcom was wanted by the US. Emails obtained by Dotcom concerning his residency application show that clearly.

“Kim DOTCOM is not of security concern but is likely to soon become the subject of a joint FBI / NZ Police criminal investigation. We have passed this over to NZ Police,” one of the emails reads. Another, well over a year before the raid, also shows the level of knowledge.

Bad but wealthy, so we have plans for him…

With “political pressure” to grant Dotcom’s application in place, Immigration New Zealand finally gave the Megaupload founder the thumbs-up on November 1, 2010. Dotcom believes that New Zealand was concerned he might walk away from his application.

“This would have been of grave concern to the Government, which, at that time, was in negotiations with Hollywood lobby,” his affidavit reads.

“The last thing they would have needed at that delicate stage of the negotiations was for me to walk away from New Zealand and return to Hong Kong, where extradition would be more difficult. I believe that this concern is what prompted the ‘political pressure’ that led to my application finally being granted despite the presence of factors that would have caused anyone else’s application to have been rejected.”

Dotcom says that after being granted residency, there were signs things weren’t going to plan for him. The entrepreneur applied to buy his now-famous former mansion for NZ$37m, an application that was initially approved. However, after being passed to Simon Power, the application was denied.

“It would appear that, although my character was apparently good enough for me to be granted residence in November 2010, in July 2011 it was not considered good enough for me to buy property in New Zealand,” Dotcom notes.

“The Honourable Mr Power clearly did not want me purchasing $37 million of real estate, presumably because he knew that the United States was going to seek forfeiture of my assets and he did not want what was then the most expensive property in New Zealand being forfeited to the United States government.”

Of course, Dotcom concludes by highlighting the unlawful spying by New Zealand’s GCSB spy agency and the disproportionate use of force displayed by the police when they raided him in 2012 using dozens of armed officers. This, combined with all of the above, means that questions about his case must now be answered at the highest levels. With Obama in town, there’s no time like the present.

“As the evidence above demonstrates, this improper purpose which was then embraced by the New Zealand authorities, originated in the White House under the Obama administration. It is therefore necessary to examine Mr Obama in this proceeding,” Dotcom concludes.

Press blackouts aside, it appears that Obama has rather a lot of golf lined up for the coming days. Whether he’ll have any time to answer Dotcom’s questions is one thing, but whether he’ll even be asked to is perhaps the most important point of all.

The full affidavit and masses of supporting evidence can be found here.


Pi 3B+: 48 hours later

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/3b-plus-aftermath/

Unless you’ve been AFK for the last two days, you’ll no doubt be aware of the release of the brand-spanking-new Raspberry Pi 3 Model B+. With faster connectivity, more computing power, Power over Ethernet (PoE) pins, and the same $35 price point, the new board has been a hit across all our social media accounts! So while we wind down from launch week, let’s all pull up a chair, make yet another cup of coffee, and look through some of our favourite reactions from the last 48 hours.

Twitter

Our Twitter mentions were refreshing at hyperspeed on Wednesday, as you all began to hear the news and spread the word about the newest member of the Raspberry Pi family.

Tanya Fish on Twitter

Happy Pi Day, people! New @Raspberry_Pi 3B+ is out.

News outlets, maker sites, and hobbyists published posts and articles about the new Pi’s spec upgrades and their plans for the device.

Hackster.io on Twitter

“This sort of attention to detail work is exactly what I love about being involved with @Raspberry_Pi. We’re squeezing the last drops of performance out of the 40nm process node, and perfecting Pi 3 in the same way that the original B+ perfected Pi 1.” https://t.co/hEj7JZOGeZ

And I think we counted about 150 uses of this GIF on Twitter alone:

YouTube

Andy Warburton 👾 on Twitter

Is something going on with the @Raspberry_Pi today? You’d never guess from my YouTube subscriptions page… 😀

A few members of our community were lucky enough to get their hands on a 3B+ early, and sat eagerly by the YouTube publish button, waiting to release their impressions of our new board to the world. Others, with no new Pi in hand yet, posted reaction vids to the launch, discussing their plans for the upgraded Pi and comparing statistics against its predecessors.

New Raspberry Pi 3 B+ (2018) Review and Speed Tests

Happy Pi Day World! There is a new Raspberry Pi 3, the B+! In this video I will review the new Pi 3 B+ and do some speed tests. Let me know in the comments if you are getting one and what you are planning on making with it!

Long-standing community members such as The Raspberry Pi Guy, Alex “RasPi.TV” Eames, and Michael Horne joined Adafruit, element14, and RS Components (whose team produced the most epic 3B+ video we’ve seen so far), and makers Tinkernut and Estefannie Explains It All in sharing their thoughts, performance tests, and baked goods on the big day.

What’s new on the Raspberry Pi 3 B+

It’s Pi day! Sorry, wondrous Mathematical constant, this day is no longer about you. The Raspberry Pi foundation just released a new version of the Raspberry Pi called the Raspberry Pi B+.

If you have a YouTube or Vimeo channel, or if you create videos for other social media channels, and have published your impressions of the new Raspberry Pi, be sure to share a link with us so we can see what you think!

Instagram

We shared a few photos and videos on Instagram, and over 30,000 of you checked out our Instagram Story on the day.

Some glamour shots of the latest member of the #RaspberryPi family – the Raspberry Pi 3 Model B+ . Will you be getting one? What are your plans for our newest Pi?

5,609 Likes, 103 Comments – Raspberry Pi (@raspberrypifoundation) on Instagram: “Some glamour shots of the latest member of the #RaspberryPi family – the Raspberry Pi 3 Model B+ ….”

As hot-off-the-press (out of the oven? out of the solder bath?) Pi 3B+ boards start to make their way to eager makers’ homes, those makers are broadcasting their excitement, and we love seeing what they plan to get up to with their new boards.

The new #raspberrypi 3B+ suits the industrial setting. Check out my website for #RPI3B Vs RPI3BPlus network speed test. #NotEnoughTECH #network #test #internet

8 Likes, 1 Comments – Mat (@notenoughtech) on Instagram: “The new #raspberrypi 3B+ suits the industrial setting. Check out my website for #RPI3B Vs RPI3BPlus…”

The new Raspberry Pi 3 Model B+ is here and will be used for our Python staging server for our APIs #raspberrypi #pythoncode #googleadwords #shopify #datalayer

16 Likes, 3 Comments – Rob Edlin (@niddocks) on Instagram: “The new Raspberry Pi 3 Model B+ is here and will be used for our Python staging server for our APIs…”

In the news

Eben made an appearance on ITV Anglia on Wednesday, talking live on Facebook about the new Raspberry Pi.

ITV Anglia

As the latest version of the Raspberry Pi computer is launched in Cambridge, Dr Eben Upton talks about the inspiration of Professor Stephen Hawking and his legacy to science. Add your questions in…

He was also fortunate enough to spend the morning with some Sixth Form students from the local area.

Sascha Williams on Twitter

On a day where science is making the headlines, lovely to see the scientists of the future in our office – getting tips from fab @Raspberry_Pi founder @EbenUpton #scientists #RaspberryPi #PiDay2018 @sirissac6thform

Principal Hardware Engineer Roger Thornton will also make a live appearance online this week: he is co-hosting Hack Chat later today. And of course, you can see more of Roger and Eben in the video where they discuss the new 3B+.

Introducing the Raspberry Pi 3 Model B+

Raspberry Pi 3 Model B+ is on sale now for $35.

It’s been a supremely busy week here at Pi Towers and across the globe in the offices of our Approved Resellers, and seeing your wonderful comments and sharing in your excitement has made it all worth it. Please keep it up, and be sure to share the arrival of your 3B+ as well as the projects into which you’ll be integrating them.

If you’d like to order a Raspberry Pi 3 Model B+, you can do so via our product page. And if you have any questions at all regarding the 3B+, the conversation is still taking place in the comments of Wednesday’s launch post, so head on over.


Huge Rightsholder Coalition Calls on New EU Presidency to Remove Safe Harbors

Post Syndicated from Andy original https://torrentfreak.com/huge-rightsholder-coalition-calls-on-new-eu-presidency-to-remove-safe-harbors-180131/

While piracy of all kinds is often viewed as a threat to the creative industries, a new type of unauthorized content distribution has been gaining prominence over the past few years.

Sites like YouTube, which allow their users to upload all kinds of material – some of it infringing – are now seen as undermining a broad range of industries that rely on both video and audio to generate revenue.

The cries against such User Uploaded Content (UUC) sites are often led by the music industry, which complains that the safe harbor provisions of copyright law are being abused while UUC sites generate revenue from infringing content. In tandem, while that free content is made available, UUC sites have little or no incentive to pay for official content licenses, and certainly not at a rate considered fair by the industry.

This mismatch, between the price that content industries would like to achieve for licenses and what they actually achieve, is now known as the ‘Value Gap’.

Today, in advance of an EU meeting on the draft Copyright Directive, a huge coalition of rightsholder groups is calling on the new EU Presidency not to pass up an “unmissable opportunity” to find a solution to their problems.

In a letter addressed to the Presidency of the Council of the European Union, which Bulgaria officially took over January 1, 2018, an army of rightsholders lay out their demands.

“We represent musical, audio-visual, literary, visual authors; performers; book, press, musical, scientific, technical and medical publishers; recorded music, film and TV producers; football leagues; broadcasters; distributors and photo agencies. These are at the very heart of Europe’s creative sector,” the groups write.

“We have formed an alliance to campaign for a solution to a major problem which is holding back our sector and jeopardizing future sustainability – the Transfer of Value, otherwise known as the Value Gap.

“User uploaded content services have become vast distributors of our creative works e.g. film, music, photos, broadcasts, text and sport content – all while refusing to negotiate fair or any copyright licences with us as right holders.”

Value Gap Coalition

Featuring groups representing many thousands of rightsholders, the coalition is the broadest yet to call for action against the ‘Value Gap’. Or, to put it another way, to demand a change in the law to prevent sites like YouTube, Facebook and other hosting platforms from “hiding” behind provisions designed to protect them from the infringing activities of others.

“This problem is caused by a lack of clarity surrounding the application of copyright to certain online services and the abuse of European copyright ‘safe harbor’ rules in the e-Commerce Directive (2000/31/EC) by those services,” the coalition writes.

Referencing the EU Copyright Directive proposal tabled by the European Commission in September 2016, the coalition says that UUC services communicating content to the public should be compelled to obtain licenses for that content. If they play an “active role” through promotion or optimization of content, UUC platforms should be denied ‘safe harbors’ under copyright law, they argue.

Noting that there is “no solution” to the problem without the above fixes, the coalition cites last year’s ruling by the Court of Justice of the European Union which found that The Pirate Bay knowingly provided users with a platform to share copyright-infringing links.

“It is important to recall that the underlying policy objective of this legislation is to address the current unfairness in the online market due to the misapplication of copyright liability rules by UUC services. We would therefore like to stress that the focus should remain on finding effective solutions to tackle this issue.

“As an alliance, we look forward to working with your Presidency to achieve an effective solution to the Value Gap problem for the benefit of Europe,” the coalition concludes.

The letter, addressed to Prime Minister Borissov, Minister Pavlova and Minister Banov, arrives in the wake of an alert sounded by several Members of the European Parliament.

Earlier this month they warned that the EU’s proposed mandatory upload filters – which could see UUC sites pre-screen user-uploaded content for infringement – amount to “censorship machines” that will do more harm than good.

The full letter can be found here (pdf)


timeShift(GrafanaBuzz, 1w) Issue 30

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/01/19/timeshiftgrafanabuzz-1w-issue-30/

Welcome to TimeShift

We’re only 6 weeks away from the next GrafanaCon and here at Grafana Labs we’re buzzing with excitement. We have some great talks lined up that you won’t want to miss.

This week’s TimeShift covers Grafana’s annotation functionality, monitoring with Prometheus, integrating Grafana with NetFlow and a peek inside Stream’s monitoring stack. Enjoy!


Latest Stable Release

Grafana 4.6.3 is now available. Latest bugfixes include:

  • Gzip: Fixes bug with Gravatar images when gzip was enabled #5952
  • Alert list: Now shows alert state changes even after adding manual annotations on dashboard #99513
  • Alerting: Fixes bug where rules evaluated as firing when all conditions were false and using the OR operator. #93183
  • Cloudwatch: CloudWatch no longer displays metrics’ default alias #101514, thx @mtanda

Download Grafana 4.6.3 Now


From the Blogosphere

Walkthrough: Watch your Ansible deployments in Grafana!: Your graphs start spiking and your platform begins behaving abnormally. Did the config change in a deployment, causing the problem? This article covers Grafana’s new annotation functionality, and specifically, how to create deployment annotations via Ansible playbooks.

Application Monitoring in OpenShift with Prometheus and Grafana: There are many articles describing how to monitor OpenShift with Prometheus running in the same cluster, but what if you don’t have admin permissions to the cluster you need to monitor?

Spring Boot Metrics Monitoring Using Prometheus & Grafana: As the title suggests, this post walks you through how to configure Prometheus and Grafana to monitor your Spring Boot application metrics.

How to Integrate Grafana with NetFlow: Learn how to monitor NetFlow from Scrutinizer using Grafana’s SimpleJSON data source.

Stream & Go: News Feeds for Over 300 Million End Users: Stream lets you build scalable newsfeeds and activity streams via their API, which is used by more than 300 million end users. In this article, they discuss their monitoring stack and why they chose particular components and technologies.


GrafanaCon EU Tickets are Going Fast!

We’re six weeks from kicking off GrafanaCon EU! Join us for talks from Google, Bloomberg, Tinder, eBay and more! You won’t want to miss two great days of open source monitoring talks and fun in Amsterdam. Get your tickets before they sell out!

Get Your Ticket Now


Grafana Plugins

We have a couple of plugin updates to share this week that add some new features and improvements. Updating your plugins is easy. For on-prem Grafana, use the Grafana-cli tool, or update with 1 click on your Hosted Grafana.

UPDATED PLUGIN

Druid Data Source – This new update is packed with new features. Notable enhancements include:

  • Post Aggregation feature
  • Support for thetaSketch
  • Improvements to the Query editor

Update Now

UPDATED PLUGIN

Breadcrumb Panel – The Breadcrumb Panel is a small panel you can include in your dashboard that tracks other dashboards you have visited – making it easy to navigate back to a previously visited dashboard. The latest release adds support for dashboards loaded from a file.

Update Now


Upcoming Events

In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We also like to make sure we mention other Grafana-related events happening all over the world. If you’re putting on just such an event, let us know and we’ll list it here.

SnowCamp 2018: Yves Brissaud – Application metrics with Prometheus and Grafana | Grenoble, France – Jan 24, 2018:
We’ll take a look at how Prometheus, Grafana and a bit of code make it possible to obtain temporal data to visualize the state of our applications as well as to help with development and debugging.

Register Now

Women Who Go Berlin: Go Workshop – Monitoring and Troubleshooting using Prometheus and Grafana | Berlin, Germany – Jan 31, 2018: In this workshop we will learn about one of the most important topics in making apps production ready: Monitoring. We will learn how to use tools you’ve probably heard a lot about – Prometheus and Grafana, and using what we learn we will troubleshoot a particularly buggy Go app.

Register Now

FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. There is no need to register; all are welcome.

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Carl Bergquist – Quickie: Monitoring? Not OPS Problem

Why should we monitor our system? Why can’t we just rely on the operations team anymore? They used to be able to do that. What’s currently changing? Presentation content:

  • Why do we monitor our system
  • How did it use to work?
  • What’s changing
  • Why do we need to shift focus
  • Everyone should be on call.
  • Resilience is the goal (the best way of having someone care about quality is to make them responsible).

Register Now

Jfokus | Stockholm, Sweden – Feb 5-7, 2018:
Leonard Gram – Presentation: DevOps Deconstructed

What’s a Site Reliability Engineer and how’s that role different from the DevOps engineer my boss wants to hire? I really don’t want to be on call, should I? Is Docker the right place for my code or am I better off just going straight to Serverless? And why should I care about any of it? I’ll try to answer some of these questions while looking at what DevOps really is about and how commoditisation of servers through “the cloud” ties into it all. This session will be an opinionated piece from a developer who’s been on-call for the past 6 years and would like to convince you to do the same, at least once.

Register Now

Stockholm Metrics and Monitoring | Stockholm, Sweden – Feb 7, 2018:
Observability 3 ways – Logging, Metrics and Distributed Tracing

Let’s talk about often confused telemetry tools: Logging, Metrics and Distributed Tracing. We’ll show how you capture latency using each of the tools and how they work differently. Through examples and discussion, we’ll note edge cases where certain tools have advantages over others. By the end of this talk, we’ll better understand how each of Logging, Metrics and Distributed Tracing aids us in different ways to understand our applications.

Register Now

OpenNMS – Introduction to “Grafana” | Webinar – Feb 21, 2018:
IT monitoring helps detect emerging hardware damage and performance bottlenecks in the enterprise network before any consequential damage or disruption to business processes occurs. The powerful open-source OpenNMS software monitors a network, including all connected devices, and provides logging of a variety of data that can be used for analysis and planning purposes. In our next OpenNMS webinar on February 21, 2018, we introduce “Grafana” – a web-based tool for creating and displaying dashboards from various data sources, which can be perfectly combined with OpenNMS.

Register Now


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

As we say with pie charts, use emojis wisely 😉


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


How are we doing?

That wraps up our 30th issue of TimeShift. What do you think? Are there other types of content you’d like to see here? Submit a comment on this issue below, or post something at our community forum.

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

NSA "Red Disk" Data Leak

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/nsa_red_disk_da.html

ZDNet is reporting about another data leak, this one from the US Army’s Intelligence and Security Command (INSCOM), which is also within the NSA.

The disk image, when unpacked and loaded, is a snapshot of a hard drive dating back to May 2013 from a Linux-based server that forms part of a cloud-based intelligence sharing system, known as Red Disk. The project, developed by INSCOM’s Futures Directorate, was slated to complement the Army’s so-called distributed common ground system (DCGS), a legacy platform for processing and sharing intelligence, surveillance, and reconnaissance information.

[…]

Red Disk was envisioned as a highly customizable cloud system that could meet the demands of large, complex military operations. The hope was that Red Disk could provide a consistent picture from the Pentagon to deployed soldiers in the Afghan battlefield, including satellite images and video feeds from drones trained on terrorists and enemy fighters, according to a Foreign Policy report.

[…]

Red Disk was a modular, customizable, and scalable system for sharing intelligence across the battlefield, like electronic intercepts, drone footage and satellite imagery, and classified reports, for troops to access with laptops and tablets on the battlefield. Markings on files found in several directories imply the disk is “top secret,” and restricted from being shared with foreign intelligence partners.

A couple of points. One, this isn’t particularly sensitive. It’s an intelligence distribution system under development. It’s not raw intelligence. Two, this doesn’t seem to be classified data. Even the article hedges, using the unofficial term of “highly sensitive.” Three, it doesn’t seem that Chris Vickery, the researcher that discovered the data, has published it.

Chris Vickery, director of cyber risk research at security firm UpGuard, found the data and informed the government of the breach in October. The storage server was subsequently secured, though its owner remains unknown.

This doesn’t feel like a big deal to me.

Slashdot thread.

Game of Thrones Leaks “Carried Out By Former Iranian Military Hacker”

Post Syndicated from Andy original https://torrentfreak.com/game-of-thrones-leaks-carried-out-by-former-iranian-military-hacker-171122/

Late July it was reported that hackers had stolen proprietary information from media giant HBO.

The haul was said to include confidential details of the then-unreleased fourth episode of the latest Game of Thrones season, plus episodes of Ballers, Barry, Insecure, and Room 104.

“Hi to all mankind,” an email sent to reporters read. “The greatest leak of cyber space era is happening. What’s its name? Oh I forget to tell. Its HBO and Game of Thrones……!!!!!!”

In follow-up correspondence, the hackers claimed to have penetrated HBO’s internal network, gaining access to emails, technical platforms, and other confidential information.

Image released by the hackers

Soon after, HBO chairman and CEO Richard Plepler confirmed a breach at his company, telling employees that there had been a “cyber incident” in which information and programming had been taken.

“Any intrusion of this nature is obviously disruptive, unsettling, and disturbing for all of us. I can assure you that senior leadership and our extraordinary technology team, along with outside experts, are working round the clock to protect our collective interests,” he said.

During mid-August, problems persisted, with unreleased shows hitting the Internet. HBO appeared rattled by the ongoing incident, refusing to comment to the media on every new development. Now, however, it appears the tide is turning on HBO’s foe.

In a statement last evening, Joon H. Kim, Acting United States Attorney for the Southern District of New York, and William F. Sweeney Jr., Assistant Director-in-Charge of the New York Field Division of the FBI, announced the unsealing of an indictment charging a 29-year-old man with offenses carried out against HBO.

“Behzad Mesri, an Iranian national who had previously hacked computer systems for the Iranian military, allegedly infiltrated HBO’s systems, stole proprietary data, including scripts and plot summaries for unaired episodes of Game of Thrones, and then sought to extort HBO of $6 million in Bitcoins,” Kim said.

“Mesri now stands charged with federal crimes, and although not arrested today, he will forever have to look over his shoulder until he is made to face justice. American ingenuity and creativity is to be cultivated and celebrated — not hacked, stolen, and held for ransom. For hackers who test our resolve in protecting our intellectual property — even those hiding behind keyboards in countries far away — eventually, winter will come.”

According to the Department of Justice, Mesri honed his computer skills working for the Iranian military, conducting cyber attacks against enemy military systems, nuclear software, and Israeli infrastructure. He was also a member of the Turk Black Hat hacking team which defaced hundreds of websites with the online pseudonym “Skote Vahshat”.

The indictment states that Mesri began his campaign against HBO during May 2017, when he conducted “online reconnaissance” of HBO’s networks and employees. Between May and July, he then compromised a number of HBO employee user accounts and used them to access the company’s data and TV shows, copying them to his own machines.

After allegedly obtaining around 1.5 terabytes of HBO’s data, Mesri then began to extort HBO, warning that unless a ransom of $5.5 million was paid in Bitcoin, the leaking would begin. When the amount wasn’t paid, three days later Mesri told HBO that the amount had now risen to $6m and as an additional punishment, data could be wiped from HBO’s servers.

Subsequently, on or around July 30 and continuing through August 2017, Mesri allegedly carried through with his threats, leaking information and TV shows online and promoting them via emails to members of the press.

As a result of the above, Mesri is charged with one count of wire fraud, which carries a maximum sentence of 20 years in prison, one count of computer hacking (five years), three counts of threatening to impair the confidentiality of information (five years each), and one count of interstate transmission of an extortionate communication (two years). No copyright infringement offenses are mentioned in the indictment.

The big question now is whether the US will ever get their hands on Mesri. The answer to that, at least through any official channels, seems to be a resounding no. There is no extradition treaty between the US and Iran meaning that if Mesri stays put, he’s likely to remain a free man.

Wanted


New – Interactive AWS Cost Explorer API

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-interactive-aws-cost-explorer-api/

We launched the AWS Cost Explorer a couple of years ago in order to allow you to track, allocate, and manage your AWS costs. The response to that launch, and to additions that we have made since then, has been very positive. However our customers are, as Jeff Bezos has said, “beautifully, wonderfully, dissatisfied.”

I see this first-hand every day. We launch something and that launch inspires our customers to ask for even more. For example, with many customers going all-in and moving large parts of their IT infrastructure to the AWS Cloud, we’ve had many requests for the raw data that feeds into the Cost Explorer. These customers want to programmatically explore their AWS costs, update ledgers and accounting systems with per-application and per-department costs, and to build high-level dashboards that summarize spending. Some of these customers have been going to the trouble of extracting the data from the charts and reports provided by Cost Explorer!

New Cost Explorer API
Today we are making the underlying data that feeds into Cost Explorer available programmatically. The new Cost Explorer API gives you a set of functions that allow you to do everything that I described above. You can retrieve cost and usage data that is filtered and grouped across multiple dimensions (Service, Linked Account, tag, Availability Zone, and so forth), aggregated by day or by month. This gives you the power to start simple (total monthly costs) and to refine your requests to any desired level of detail (writes to DynamoDB tables that have been tagged as production) while getting responses in seconds.

Here are the operations:

GetCostAndUsage – Retrieve cost and usage metrics for a single account or all accounts (master accounts in an organization have access to all member accounts) with filtering and grouping.

GetDimensionValues – Retrieve available filter values for a specified filter over a specified period of time.

GetTags – Retrieve available tag keys and tag values over a specified period of time.

GetReservationUtilization – Retrieve EC2 Reserved Instance utilization over a specified period of time, with daily or monthly granularity plus filtering and grouping.
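As a rough illustration of how these operations can be called, here is a minimal sketch using Python and boto3, whose Cost Explorer client is named “ce”. The time period, metric, and grouping keys below are illustrative assumptions, not values taken from this post.

```python
# Minimal sketch (assumes boto3 is installed and AWS credentials are configured).
import boto3

# The Cost Explorer service endpoint lives in US East (N. Virginia).
ce = boto3.client("ce", region_name="us-east-1")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2017-11-01", "End": "2017-12-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[
        {"Type": "DIMENSION", "Key": "SERVICE"},  # first level of grouping
        {"Type": "DIMENSION", "Key": "REGION"},   # second level of grouping
    ],
)

# Each result covers one time period and contains one group per (service, region) pair.
for result in response["ResultsByTime"]:
    for group in result["Groups"]:
        service, region = group["Keys"]
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(f"{service} ({region}): {float(amount):.2f} USD")
```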

I believe that these functions, and the data that they return, will give you the ability to do some really interesting things that will give you better insights into your business. For example, you could tag the resources used to support individual marketing campaigns or development projects and then deep-dive into the costs to measure business value. You now have the potential to know, down to the penny, how much you spend on infrastructure for important events like Cyber Monday or Black Friday.

Things to Know
Here are a couple of things to keep in mind as you start to think about ways to make use of the API:

Grouping – The Cost Explorer web application provides you with one level of grouping; the APIs give you two. For example you could group costs or RI utilization by Service and then by Region.

Pagination – The functions can return very large amounts of data and follow the AWS-wide model for pagination by including a nextPageToken if additional data is available. You simply call the same function again, supplying the token, to move forward.
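A sketch of that pagination pattern, again assuming Python and boto3 (where the token parameter is spelled NextPageToken); the dimension and time period are placeholders:

```python
# Minimal pagination sketch: call GetDimensionValues repeatedly,
# passing the returned token back in until no further token is returned.
import boto3

ce = boto3.client("ce", region_name="us-east-1")

values = []
token = None
while True:
    kwargs = {
        "TimePeriod": {"Start": "2017-11-01", "End": "2017-12-01"},
        "Dimension": "SERVICE",  # placeholder dimension
    }
    if token:
        kwargs["NextPageToken"] = token
    page = ce.get_dimension_values(**kwargs)
    values.extend(v["Value"] for v in page["DimensionValues"])
    token = page.get("NextPageToken")
    if not token:
        break

print(f"Found {len(values)} services with recorded usage")
```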

Regions – The service endpoint is in the US East (Northern Virginia) Region and returns usage data for all public AWS Regions.

Pricing – Each API call costs $0.01. To put this into perspective, let’s say you use this API to build a dashboard and it gets 1000 hits per month from your users. Your operating cost for the dashboard should be $10 or so; this is far less expensive than setting up your own systems to extract & ingest the data and respond to interactive queries.

The Cost Explorer API is available now and you can start using it today. To learn more, read about the Cost Explorer API.

Jeff;

Nazis, are bad

Post Syndicated from Eevee original https://eev.ee/blog/2017/08/13/nazis-are-bad/

Anonymous asks:

Could you talk about something related to the management/moderation and growth of online communities? IOW your thoughts on online community management, if any.

I think you’ve tweeted about this stuff in the past so I suspect you have thoughts on this, but if not, again, feel free to just blog about … anything 🙂

Oh, I think I have some stuff to say about community management, in light of recent events. None of it hasn’t already been said elsewhere, but I have to get this out.

Hopefully the content warning is implicit in the title.


I am frustrated.

I’ve gone on before about a particularly bothersome phenomenon that hurts a lot of small online communities: often, people are willing to tolerate the misery of others in a community, but then get up in arms when someone pushes back. Someone makes a lot of off-hand, off-color comments about women? Uses a lot of dog-whistle terms? Eh, they’re not bothering anyone, or at least not bothering me. Someone else gets tired of it and tells them to knock it off? Whoa there! Now we have the appearance of conflict, which is unacceptable, and people will turn on the person who’s pissed off — even though they’ve been at the butt end of an invisible conflict for who knows how long. The appearance of peace is paramount, even if it means a large chunk of the population is quietly miserable.

Okay, so now, imagine that on a vastly larger scale, and also those annoying people who know how to skirt the rules are Nazis.


The label “Nazi” gets thrown around a lot lately, probably far too easily. But when I see a group of people doing the Hitler salute, waving large Nazi flags, wearing Nazi armbands styled after the SS, well… if the shoe fits, right? I suppose they might have flown across the country to join a torch-bearing mob ironically, but if so, the joke is going way over my head. (Was the murder ironic, too?) Maybe they’re not Nazis in the sense that the original party doesn’t exist any more, but for ease of writing, let’s refer to “someone who espouses Nazi ideology and deliberately bears a number of Nazi symbols” as, well, “a Nazi”.

This isn’t a new thing, either; I’ve stumbled upon any number of Twitter accounts that are decorated in Nazi regalia. I suppose the trouble arises when perfectly innocent members of the alt-right get unfairly labelled as Nazis.

But hang on; this march was called “Unite the Right” and was intended to bring together various far right sub-groups. So what does their choice of aesthetic say about those sub-groups? I haven’t heard, say, alt-right coiner Richard Spencer denounce the use of Nazi symbology — extra notable since he was fucking there and apparently didn’t care to discourage it.


And so begins the rule-skirting. “Nazi” is definitely overused, but even using it to describe white supremacists who make not-so-subtle nods to Hitler is likely to earn you some sarcastic derailment. A Nazi? Oh, so is everyone you don’t like and who wants to establish a white ethno state a Nazi?

Calling someone a Nazi — or even a white supremacist — is an attack, you see. Merely expressing the desire that people of color not exist is perfectly peaceful, but identifying the sentiment for what it is causes visible discord, which is unacceptable.

These clowns even know this sort of thing and strategize around it. Or, try, at least. Maybe it wasn’t that successful this weekend — though flicking through Charlottesville headlines now, they seem to be relatively tame in how they refer to the ralliers.

I’m reminded of a group of furries — the alt-furries — who have been espousing white supremacy and wearing red armbands with a white circle containing a black… pawprint. Ah, yes, that’s completely different.


So, what to do about this?

“Ignore them” is a popular option, often espoused to bullied children by parents who have never been bullied, shortly before they resume complaining about passive-aggressive office politics. The trouble with ignoring them is that, just like in smaller communities, they have a tendency to fester. They take over large chunks of influential Internet surface area like 4chan and Reddit; they help get an inept buffoon elected; and then they start to have torch-bearing rallies and run people over with cars.

4chan illustrates a kind of corollary here. Anyone who’s steeped in Internet Culture™ is surely familiar with 4chan; I was never a regular visitor, but it had enough influence that I was still aware of it and some of its culture. It was always thick with irony, which grew into a sort of ironic detachment — perhaps one of the major sources of the recurring online trope that having feelings is bad — which proceeded into ironic racism.

And now the ironic racism is indistinguishable from actual racism, as tends to be the case. Do they “actually” “mean it”, or are they just trying to get a rise out of people? What the hell is unironic racism if not trying to get a rise out of people? What difference is there to onlookers, especially as they move to become increasingly involved with politics?

“It’s just a joke” and “it was just a thoughtless comment” are exceptionally common defenses made by people desperate to preserve the illusion of harmony, but the strain of overt white supremacy currently running rampant through the US was built on those excuses.


The other favored option is to debate them, to defeat their ideas with better ideas.

Well, hang on. What are their ideas, again? I hear they were chanting stuff like “go back to Africa” and “fuck you, faggots”. Given that this was an overtly political rally (and again, the Nazi fucking regalia), I don’t think it’s a far cry to describe their ideas as “let’s get rid of black people and queer folks”.

This is an underlying proposition: that white supremacy is inherently violent. After all, if the alt-right seized total political power, what would they do with it? If I asked the same question of Democrats or Republicans, I’d imagine answers like “universal health care” or “screw over poor people”. But people whose primary goal is to have a country full of only white folks? What are they going to do, politely ask everyone else to leave? They’re invoking the memory of people who committed genocide and also tried to take over the fucking world. They are outright saying, these are the people we look up to, this is who we think had a great idea.

How, precisely, does one defeat these ideas with rational debate?

Because the underlying core philosophy beneath all this is: “it would be good for me if everything were about me”. And that’s true! (Well, it probably wouldn’t work out how they imagine in practice, but it’s true enough.) Consider that slavery is probably fantastic if you’re the one with the slaves; the issue is that it’s reprehensible, not that the very notion contains some kind of 101-level logical fallacy. That’s probably why we had a fucking war over it instead of hashing it out over brunch.

…except we did hash it out over brunch once, and the result was that slavery was still allowed but slaves only counted as 60% of a person for the sake of counting how much political power states got. So that’s how rational debate worked out. I’m sure the slaves were thrilled with that progress.


That really only leaves pushing back, which raises the question of how to push back.

And, I don’t know. Pushing back is much harder in spaces you don’t control, spaces you’re already struggling to justify your own presence in. For most people, that’s most spaces. It’s made all the harder by that tendency to preserve illusory peace; even the tamest request that someone knock off some odious behavior can be met by pushback, even by third parties.

At the same time, I’m aware that white supremacists prey on disillusioned young white dudes who feel like they don’t fit in, who were promised the world and inherited kind of a mess. Does criticism drive them further away? The alt-right also opposes “political correctness”, i.e. “not being a fucking asshole”.

God knows we all suck at this kind of behavior correction, even within our own in-groups. Fandoms have become almost ridiculously vicious as platforms like Twitter and Tumblr amplify individual anger to deafening levels. It probably doesn’t help that we’re all just exhausted, that every new fuck-up feels like it bears the same weight as the last hundred combined.

This is the part where I admit I don’t know anything about people and don’t have any easy answers. Surprise!


The other alternative is, well, punching Nazis.

That meme kind of haunts me. It raises really fucking complicated questions about when violence is acceptable, in a culture that’s completely incapable of answering them.

America’s relationship to violence is so bizarre and two-faced as to be almost incomprehensible. We worship it. We have the biggest military in the world by an almost comical margin. It’s fairly mainstream to own deadly weapons for the express stated purpose of armed revolution against the government, should that become necessary, where “necessary” is left ominously undefined. Our movies are about explosions and beating up bad guys; our video games are about explosions and shooting bad guys. We fantasize about solving foreign policy problems by nuking someone — hell, our talking heads are currently in polite discussion about whether we should nuke North Korea and annihilate up to twenty-five million people, as punishment for daring to have the bomb that only we’re allowed to have.

But… violence is bad.

That’s about as far as the other side of the coin gets. It’s bad. We condemn it in the strongest possible terms. Also, guess who we bombed today?

I observe that the one time Nazis were a serious threat, America was happy to let them try to take over the world until their allies finally showed up on our back porch.

Maybe I don’t understand what “violence” means. In a quest to find out why people are talking about “leftist violence” lately, I found a National Review article from May that twice suggests blocking traffic is a form of violence. Anarchists have smashed some windows and set a couple fires at protests this year — and, hey, please knock that crap off? — which is called violence against, I guess, Starbucks. Black Lives Matter could be throwing a birthday party and Twitter would still be abuzz with people calling them thugs.

Meanwhile, there’s a trend of murderers with increasingly overt links to the alt-right, and everyone is still handling them with kid gloves. First it was murders by people repeating their talking points; now it’s the culmination of a torches-and-pitchforks mob. (Ah, sorry, not pitchforks; assault rifles.) And we still get this incredibly bizarre both-sides-ism, a White House that refers to the people who didn’t murder anyone as “just as violent if not more so”.


Should you punch Nazis? I don’t know. All I know is that I’m extremely dissatisfied with discourse that’s extremely alarmed by hypothetical punches — far more mundane than what you’d see after a sporting event — but treats a push for ethnic cleansing as a mere difference of opinion.

The equivalent to a punch in an online space is probably banning, which is almost laughable in comparison. It doesn’t cause physical harm, but it is a use of concrete force. Doesn’t pose quite the same moral quandary, though.

Somewhere in the middle is the currently popular pastime of doxxing (doxxxxxxing) people spotted at the rally in an attempt to get them fired or whatever. Frankly, that skeeves me out, though apparently not enough that I’m directly chastising anyone for it.


We aren’t really equipped, as a society, to deal with memetic threats. We aren’t even equipped to determine what they are. We had a fucking world war over this, and now people are outright saying “hey I’m like those people we went and killed a lot in that world war” and we give them interviews and compliment their fashion sense.

A looming question is always, what if they then do it to you? What if people try to get you fired, to punch you for your beliefs?

I think about that a lot, and then I remember that it’s perfectly legal to fire someone for being gay in half the country. (Courts are currently wrangling over whether Title VII forbids this, but with the current administration, I’m not optimistic.) I know people who’ve been fired for coming out as trans. I doubt I’d have to look very far to find someone who’s been punched for either reason.

And these aren’t even beliefs; they’re just properties of a person. You can stop being a white supremacist, one of those people yelling “fuck you, faggots”.

So I have to recuse myself from this asinine question, because I can’t fairly judge the risk of retaliation when it already happens to people I care about.

Meanwhile, if a white supremacist does get punched, I absolutely still want my tax dollars to pay for their universal healthcare.


The same wrinkle comes up with free speech, which is paramount.

The ACLU reminds us that the First Amendment “protects vile, hateful, and ignorant speech”. I think they’ve forgotten that that’s a side effect, not the goal. No one sat down and suggested that protecting vile speech was some kind of noble cause, yet that’s how we seem to be treating it.

The point was to avoid a situation where the government is arbitrarily deciding what qualifies as vile, hateful, and ignorant, and was using that power to eliminate ideas distasteful to politicians. You know, like, hypothetically, if they interrogated and jailed a bunch of people for supporting the wrong economic system. Or convicted someone under the Espionage Act for opposing the draft. (Hey, that’s where the “shouting fire in a crowded theater” line comes from.)

But these are ideas that are already in the government. Bannon, a man who was chair of a news organization he himself called “the platform for the alt-right”, has the President’s ear! How much more mainstream can you get?

So again I’m having a little trouble balancing “we need to defend the free speech of white supremacists or risk losing it for everyone” against “we fairly recently were ferreting out communists and the lingering public perception is that communists are scary, not that the government is”.


This isn’t to say that freedom of speech is bad, only that the way we talk about it has become fanatical to the point of absurdity. We love it so much that we turn around and try to apply it to corporations, to platforms, to communities, to interpersonal relationships.

Look at 4chan. It’s completely public and anonymous; you only get banned for putting the functioning of the site itself in jeopardy. Nothing is stopping a larger group of people from joining its politics board and tilting sentiment the other way — except that the current population is so odious that no one wants to be around them. Everyone else has evaporated away, as tends to happen.

Free speech is a great principle to hold a government to, since it prevents the quashing of politics that threaten the status quo (except it’s a joke and they’ll do it anyway). People can’t very readily just bail when a government doesn’t like them, anyway. It’s also nice to keep in mind to some degree for ubiquitous platforms. But the smaller you go, the easier it is for people to evaporate away, and the faster pure free speech will turn the place to crap. You’ll be left only with people who care about nothing.


At the very least, it seems clear that the goal of white supremacists is some form of destabilization, of disruption to the fabric of a community for purely selfish purposes. And those are the kinds of people you want to get rid of as quickly as possible.

Usually this is hard, because they act just nicely enough to create some plausible deniability. But damn, if someone is outright telling you they love Hitler, maybe skip the principled hand-wringing and eject them.

Growing up alongside tech

Post Syndicated from Eevee original https://eev.ee/blog/2017/08/09/growing-up-alongside-tech/

IndustrialRobot asks… or, uh, asked last month:

industrialrobot: How has your views on tech changed as you’ve got older?

This is so open-ended that it’s actually stumped me for a solid month. I’ve had a surprisingly hard time figuring out where to even start.


It’s not that my views of tech have changed too much — it’s that they’ve changed very gradually. Teasing out and explaining any one particular change is tricky when it happened invisibly over the course of 10+ years.

I think a better framework for this is to consider how my relationship to tech has changed. It’s gone through three pretty distinct phases, each of which has strongly colored how I feel and talk about technology.

Act I

In which I start from nothing.

Nothing is an interesting starting point. You only really get to start there once.

Learning something on my own as a kid was something of a magical experience, in a way that I don’t think I could replicate as an adult. I liked computers; I liked toying with computers; so I did that.

I don’t know how universal this is, but when I was a kid, I couldn’t even conceive of how incredible things were made. Buildings? Cars? Paintings? Operating systems? Where does any of that come from? Obviously someone made them, but it’s not the sort of philosophical point I lingered on when I was 10, so in the back of my head they basically just appeared fully-formed from the æther.

That meant that when I started trying out programming, I had no aspirations. I couldn’t imagine how far I would go, because all the examples of how far I would go were completely disconnected from any idea of human achievement. I started out with BASIC on a toy computer; how could I possibly envision a connection between that and something like a mainstream video game? Every new thing felt like a new form of magic, so I couldn’t conceive that I was even in the same ballpark as whatever process produced real software. (Even seeing the source code for GORILLAS.BAS, it didn’t quite click. I didn’t think to try reading any of it until years after I’d first encountered the game.)

This isn’t to say I didn’t have goals. I invented goals constantly, as I’ve always done; as soon as I learned about a new thing, I’d imagine some ways to use it, then try to build them. I produced a lot of little weird goofy toys, some of which entertained my tiny friend group for a couple days, some of which never saw the light of day. But none of it felt like steps along the way to some mountain peak of mastery, because I didn’t realize the mountain peak was even a place that could be gone to. It was pure, unadulterated (!) playing.

I contrast this to my art career, which started only a couple years ago. I was already in my late 20s, so I’d already spent decades seeing a very broad spectrum of art: everything from quick sketches up to painted masterpieces. And I’d seen the people who create that art, sometimes seen them create it in real-time. I’m even in a relationship with one of them! And of course I’d already had the experience of advancing through tech stuff and discovering first-hand that even the most amazing software is still just code someone wrote.

So from the very beginning, from the moment I touched pencil to paper, I knew the possibilities. I knew that the goddamn Sistine Chapel was something I could learn to do, if I were willing to put enough time in — and I knew that I’m not, so I’d have to settle somewhere a ways before that. I knew that I’d have to put an awful lot of work in before I’d be producing anything very impressive.

I did it anyway (though perhaps waited longer than necessary to start), but those aren’t things I can un-know, and so I can never truly explore art from a place of pure ignorance. On the other hand, I’ve probably learned to draw much more quickly and efficiently than if I’d done it as a kid, precisely because I know those things. Now I can decide I want to do something far beyond my current abilities, then go figure out how to do it. When I was just playing, that kind of ambition was impossible.


So, I played.

How did this affect my views on tech? Well, I didn’t… have any. Learning by playing tends to teach you things in an outward sprawl without many abrupt jumps to new areas, so you don’t tend to run up against conflicting information. The whole point of opinions is that they’re your own resolution to a conflict; without conflict, I can’t meaningfully say I had any opinions. I just accepted whatever I encountered at face value, because I didn’t even know enough to suspect there could be alternatives yet.

Act II

That started to seriously change around, I suppose, the end of high school and beginning of college. I was becoming aware of this whole “open source” concept. I took classes that used languages I wouldn’t otherwise have given a second thought. (One of them was Python!) I started to contribute to other people’s projects. Eventually I even got a job, where I had to work with other people. It probably also helped that I’d had to maintain my own old code a few times.

Now I was faced with conflicting subjective ideas, and I had to form opinions about them! And so I did. With gusto. Over time, I developed an idea of what was Right based on experience I’d accrued. And then I set out to always do things Right.

That’s served me decently well with some individual problems, but it also led me to inflict a lot of unnecessary pain on myself. Several endeavors languished for no other reason than my dissatisfaction with the architecture, long before the basic functionality was done. I started a number of “pure” projects around this time, generic tools like imaging libraries that I had no direct need for. I built them for the sake of them, I guess because I felt like I was improving some niche… but of course I never finished any. It was always in areas I didn’t know that well in the first place, which is a fine way to learn if you have a specific concrete goal in mind — but it turns out that building a generic library for editing images means you have to know everything about images. Perhaps that ambition went a little haywire.

I’ve said before that this sort of (self-inflicted!) work was unfulfilling, in part because the best outcome would be that a few distant programmers’ lives are slightly easier. I do still think that, but I think there’s a deeper point here too.

In forgetting how to play, I’d stopped putting any of myself in most of the work I was doing. Yes, building an imaging library is kind of a slog that someone has to do, but… I assume the people who work on software like PIL and ImageMagick are actually interested in it. The few domains I tried to enter and revolutionize weren’t passions of mine; I just happened to walk through the neighborhood one day and decided I could obviously do it better.

Not coincidentally, this was the same era of my life that led me to write stuff like that PHP post, which you may notice I am conspicuously not even linking to. I don’t think I would write anything like it nowadays. I could see myself approaching the same subject, but purely from the point of view of language design, with more contrasts and tradeoffs and less going for volume. I certainly wouldn’t lead off with inflammatory puffery like “PHP is a community of amateurs”.

Act III

I think I’ve mellowed out a good bit in the last few years.

It turns out that being Right is much less important than being Not Wrong — i.e., rather than trying to make something perfect that can be adapted to any future case, just avoid as many pitfalls as possible. Code that does something useful has much more practical value than unfinished code with some pristine architecture.

Nowhere is this more apparent than in game development, where all code is doomed to be crap and the best you can hope for is to stem the tide. But there’s also a fixed goal that’s completely unrelated to how the code looks: does the game work, and is it fun to play? Yes? Ship the damn thing and forget about it.

Games are also nice because it’s very easy to pour my own feelings into them and evoke feelings in the people who play them. They’re mine, something with my fingerprints on them — even the games I’ve built with glip have plenty of my own hallmarks, little touches I added on a whim or attention to specific details that I care about.

Maybe a better example is the Doom map parser I started writing. It sounds like a “pure” problem again, except that I actually know an awful lot about the subject already! I also cleverly (accidentally) released some useful results of the work I’ve done thus far — like statistics about Doom II maps and a few screenshots of flipped stock maps — even though I don’t think the parser itself is far enough along to release yet. The tool has served a purpose, one with my fingerprints on it, even without being released publicly. That keeps it fresh in my mind as something interesting I’d like to keep working on, eventually. (When I run into an architecture question, I step back for a while, or I do other work in the hopes that the solution will reveal itself.)

I also made two simple Pokémon ROM hacks this year, despite knowing nothing about Game Boy internals or assembly when I started. I just decided I wanted to do an open-ended thing beyond my reach, and I went to do it, not worrying about cleanliness and willing to accept a bumpy ride to get there. I played, but in a more experienced way, invoking the stuff I know (and the people I’ve met!) to help me get a running start in completely unfamiliar territory.


This feels like a really fine distinction that I’m not sure I’m doing justice. I don’t know if I could’ve appreciated it three or four years ago. But I missed making toys, and I’m glad I’m doing it again.

In short, I forgot how to have fun with programming for a little while, and I’ve finally started to figure it out again. And that’s far more important than whether you use PHP or not.

New – GPU-Powered Streaming Instances for Amazon AppStream 2.0

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-gpu-powered-streaming-instances-for-amazon-appstream-2-0/

We launched Amazon AppStream 2.0 at re:Invent 2016. This application streaming service allows you to deliver Windows applications to a desktop browser.

AppStream 2.0 is fully managed and provides consistent, scalable performance by running applications on general purpose, compute optimized, and memory optimized streaming instances, with delivery via NICE DCV – a secure, high-fidelity streaming protocol. Our enterprise and public sector customers have started using AppStream 2.0 in place of legacy application streaming environments that are installed on-premises. They use AppStream 2.0 to deliver both commercial and line of business applications to a desktop browser. Our ISV customers are using AppStream 2.0 to move their applications to the cloud as-is, with no changes to their code. These customers focus on demos, workshops, and commercial SaaS subscriptions.

We are getting great feedback on AppStream 2.0 and have been adding new features very quickly (even by AWS standards). So far this year we have added an image builder, federated access via SAML 2.0, CloudWatch monitoring, Fleet Auto Scaling, Simple Network Setup, persistent storage for user files (backed by Amazon S3), support for VPC security groups, and built-in user management including web portals for users.

New GPU-Powered Streaming Instances
Many of our customers have told us that they want to use AppStream 2.0 to deliver specialized design, engineering, HPC, and media applications to their users. These applications are generally graphically intensive and are designed to run on expensive, high-end PCs in conjunction with a GPU (Graphics Processing Unit). Due to the hardware requirements of these applications, cost considerations have traditionally kept them out of situations where part-time or occasional access would otherwise make sense. Recently, another requirement has come to the forefront. These applications almost always need shared, read-write access to large amounts of sensitive data that is best stored, processed, and secured in the cloud. In order to meet the needs of these users and applications, we are launching two new types of streaming instances today:

Graphics Desktop – Based on the G2 instance type, Graphics Desktop instances are designed for desktop applications that use CUDA, DirectX, or OpenGL for rendering. These instances are equipped with 15 GiB of memory and 8 vCPUs. You can select this instance family when you build an AppStream image or configure an AppStream fleet:

Graphics Pro – Based on the brand-new G3 instance type, Graphics Pro instances are designed for high-end, high-performance applications that can use the NVIDIA APIs and/or need access to large amounts of memory. These instances are available in three sizes, with 122 to 488 GiB of memory and 16 to 64 vCPUs. Again, you can select this instance family when you configure an AppStream fleet:

To learn more about how to launch, run, and scale a streaming application environment, read Scaling Your Desktop Application Streams with Amazon AppStream 2.0.

As I noted earlier, you can use either of these two instance types to build an AppStream image. This will allow you to test and fine tune your applications and to see the instances in action.
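
If you script your environments instead of using the console, the new families are selected with the same InstanceType parameter as the existing ones. Here is a hedged sketch using boto3; the fleet name and image name are placeholders, and it assumes your AWS credentials and region are already configured and that the image has already been built.

# A hedged sketch, not the console flow: create and start an AppStream 2.0
# fleet on one of the new graphics instance families with boto3.
# "my-graphics-image" and the fleet name are placeholders.
import boto3

appstream = boto3.client("appstream")

appstream.create_fleet(
    Name="graphics-desktop-fleet",
    ImageName="my-graphics-image",                   # an image you built earlier
    InstanceType="stream.graphics-desktop.2xlarge",  # Graphics Desktop family
    ComputeCapacity={"DesiredInstances": 2},
)
appstream.start_fleet(Name="graphics-desktop-fleet")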

Streaming Instances in Action
We’ve been working with several customers during a private beta program for the new instance types. Here are a few stories (and some cool screen shots) to show you some of the applications that they are streaming via AppStream 2.0:

AVEVA is a world leading provider of engineering design and information management software solutions for the marine, power, plant, offshore and oil & gas industries. As part of their work on massive capital projects, their customers need to bring many groups of specialist engineers together to collaborate on the creation of digital assets. In order to support this requirement, AVEVA is building SaaS solutions that combine the streamed delivery of engineering applications with access to a scalable project data environment that is shared between engineers across the globe. The new instances will allow AVEVA to deliver their engineering design software in SaaS form while maximizing quality and performance. Here’s a screen shot of their Everything 3D app being streamed from AppStream:

Nissan, a Japanese multinational automobile manufacturer, trains its automotive specialists using 3D simulation software running on expensive graphics workstations. The training software, developed by The DiSti Corporation, allows its specialists to simulate maintenance processes by interacting with realistic 3D models of the vehicles they work on. AppStream 2.0’s new graphics capability now allows Nissan to deliver these training tools in real time, with up to date content, to a desktop browser running on low-cost commodity PCs. Their specialists can now interact with highly realistic renderings of a vehicle that allows them to train for and plan maintenance operations with higher efficiency.

Cornell University is an American private Ivy League and land-grant doctoral university located in Ithaca, New York. They deliver advanced 3D tools such as AutoDesk AutoCAD and Inventor to students and faculty to support their course work, teaching, and research. Until now, these tools could only be used on GPU-powered workstations in a lab or classroom. AppStream 2.0 allows them to deliver the applications to a web browser running on any desktop, where they run as if they were on a local workstation. Their users are no longer limited by available workstations in labs and classrooms, and can bring their own devices and have access to their course software. This increased flexibility also means that faculty members no longer need to take lab availability into account when they build course schedules. Here’s a copy of Autodesk Inventor Professional running on AppStream at Cornell:

Now Available
Both of the graphics streaming instance families are available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions and you can start streaming from them today. Your applications must run in a Windows 2012 R2 environment, and can make use of DirectX, OpenGL, CUDA, OpenCL, and Vulkan.

With prices in the US East (Northern Virginia) Region starting at $0.50 per hour for Graphics Desktop instances and $2.05 per hour for Graphics Pro instances, you can now run your simulation, visualization, and HPC workloads in the AWS Cloud on an economical, pay-by-the-hour basis. You can also take advantage of fast, low-latency access to Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), AWS Lambda, Amazon Redshift, and other AWS services to build processing workflows that handle pre- and post-processing of your data.

Jeff;

 

Journey into Deep Learning with AWS

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/journey-into-deep-learning-with-aws/

If you are anything like me, Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning are completely fascinating and exciting topics. As AI, ML, and Deep Learning become more widely used, for me it means that the science fiction written by Dr. Isaac Asimov, the robotics and medical advancements in Star Wars, and the technologies that enabled Captain Kirk and his Star Trek crew “to boldly go where no man has gone before” can become achievable realities.

 

Most people interested in the aforementioned topics are familiar with the AI and ML solutions enabled by Deep Learning, such as Convolutional Neural Networks for Image and Video Classification, Speech Recognition, Natural Language interfaces, and Recommendation Engines. However, it is not always easy to set up the infrastructure, environment, and tools that enable data scientists, machine learning practitioners, research scientists, and deep learning hobbyists/advocates to dive into these technologies. Most developers want to move quickly from getting started with deep learning to training models and developing solutions using deep learning technologies.

For these reasons, I would like to share some resources that will help you quickly build deep learning solutions, whether you are an experienced data scientist or a curious developer wanting to get started.

Deep Learning Resources

Apache MXNet is Amazon’s deep learning framework of choice. With the power of the Apache MXNet framework and NVIDIA GPU computing, you can launch scalable deep learning projects and solutions easily on the AWS Cloud. As you get started on your MXNet deep learning quest, there are a variety of self-service tutorials and datasets available to you:

  • Launch an AWS Deep Learning AMI: This guide walks you through the steps to launch the AWS Deep Learning AMI with Ubuntu
  • MXNet – Create a computer vision application: This hands-on tutorial uses a pre-built notebook to walk you through using neural networks to build a computer vision application to identify handwritten digits
  • AWS Machine Learning Datasets: AWS hosts datasets for Machine Learning on the AWS Marketplace that you can access for free. These large datasets are available for anyone to analyze without having to download or store the data.
  • Predict and Extract – Learn to use pre-trained models for predictions: This hands-on tutorial walks you through using pre-trained models for prediction and feature extraction with the full ImageNet dataset.

 

AWS Deep Learning AMIs

AWS offers Amazon Machine Images (AMIs) for use on Amazon EC2 for quick deployment of the infrastructure needed to start your deep learning journey. The AWS Deep Learning AMIs come pre-configured with popular deep learning frameworks, are built on Amazon Linux and Ubuntu, and can be launched on Amazon EC2 for AI-targeted solutions and models. The deep learning frameworks supported and pre-configured on the Deep Learning AMIs are:

  • Apache MXNet
  • TensorFlow
  • Microsoft Cognitive Toolkit (CNTK)
  • Caffe
  • Caffe2
  • Theano
  • Torch
  • Keras

Additionally, the AWS Deep Learning AMIs come with Jupyter notebooks preconfigured for Python 2.7/3.4, the AWS SDK for Python, and other data-science-related Python packages and dependencies. The AMIs also have the NVIDIA CUDA and NVIDIA CUDA Deep Neural Network (cuDNN) libraries preinstalled for all of the supported deep learning frameworks, and the Intel Math Kernel Library is installed for the Apache MXNet framework. You can launch any of the Deep Learning AMIs by visiting the AWS Marketplace using the Try the Deep Learning AMIs link.
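
Once an AMI is running, a quick way to confirm that the environment is wired up is to run a tiny MXNet computation. This is only a sanity-check sketch, assuming the MXNet install that ships with the AMI; it falls back to the CPU if no GPU is present.

# A sanity-check sketch for a Deep Learning AMI: run a tiny NDArray
# computation on the GPU if one is available, otherwise on the CPU.
import mxnet as mx

try:
    ctx = mx.gpu()
    mx.nd.zeros((1,), ctx=ctx).asnumpy()   # forces evaluation; fails without a GPU
except mx.base.MXNetError:
    ctx = mx.cpu()

a = mx.nd.array([[1, 2, 3], [4, 5, 6]], ctx=ctx)
print((a * 2 + 1).asnumpy())               # [[ 3.  5.  7.]
                                           #  [ 9. 11. 13.]]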

Summary

It is a great time to dive into Deep Learning. You can accelerate your work by using the AWS Deep Learning AMIs to get a deep learning environment running in the AWS Cloud quickly, or start learning more about Deep Learning on AWS with MXNet through the self-service resources above. Of course, you can learn even more about Deep Learning, Machine Learning, and Artificial Intelligence on AWS by reviewing the AWS Deep Learning page, the Amazon AI product page, and the AWS AI Blog.

May the Deep Learning Force be with you all.

Tara

No, Yahoo! isn’t changing its name

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/01/no-yahoo-isnt-changing-its-name.html

Trending on social media is the claim that Yahoo is changing its name to “Altaba” and that CEO Marissa Mayer is stepping down. This is false.

What is happening instead is that everything we know of as “Yahoo” (including the brand name) is being sold to Verizon. What’s left is a skeleton company that holds stock in Alibaba and a few other companies. Since the brand was sold to Verizon, that investment company could no longer use it, so it chose “Altaba”. Since 83% of its investment is in Alibaba, “Altaba” makes sense. It’s not like this new brand name means anything — the skeleton investment company will be wound down in the next year, either as a special dividend to investors, sold off to Alibaba, or both.

Marissa Mayer is an operations CEO. Verizon didn’t want her to run their newly acquired operations, since the entire point of buying them was to take the web operations in a new direction (though apparently she’ll still work a bit with them through the transition). And of course she’s not an appropriate CEO for an investment company. So she had no job left — she made her own job disappear.

What happened today is an obvious consequence of Alibaba going IPO in September 2014. It meant that Yahoo’s stake of 16% in Alibaba was now liquid. All told, the investment arm of Yahoo was worth $36-billion while the web operations (Mail, Fantasy, Tumblr, etc.) was worth only $5-billion.

In other words, Yahoo became a Wall Street mutual fund who inexplicably also offered web mail and cat videos.

Such a thing cannot exist. If Yahoo didn’t act, shareholders would start suing the company to get their money back. That $36-billion in investments doesn’t belong to Yahoo; it belongs to its shareholders. Thus, the moment the Alibaba IPO closed, Yahoo started planning how to separate the investment arm from the web operations.

Yahoo had basically three choices.

  • The first choice is to simply give the Alibaba (and other investment) shares as a one-time dividend to Yahoo shareholders.
  • A second choice is to simply split the company in two, one half holding the investments and the other the web operations.
  • The third choice is to sell off the web operations to some chump like Verizon.

Obviously, Marissa Mayer took the third choice. Without a slush fund (the investment arm) to keep it solvent, Yahoo didn’t feel it could run its operations profitably without integration with some other company. That meant it either had to buy a large company to integrate with Yahoo, or sell the Yahoo portion to some other large company.

Every company, especially Internet ones, has a legacy value. It’s the amount of money you’ll get from firing everyone, stopping all investment in the future, and just raking in a stream of declining revenue year after year. It’s the fate of early Internet companies like Earthlink and Slashdot. It’s what I documented with Earthlink [*], which continues to offer email to subscribers, but spends only enough to keep the lights on, not even upgrading to the simplest of things like SSL.

Presumably, Verizon will try to make something of a few of the properties. Apparently, Yahoo’s Fantasy sports stuff is popular, and will probably be rebranded as some new Verizon thing. Tumblr is already its own brand name, independent of Yahoo, and thus will probably continue to exist as its own business unit.

One of the weird things is Yahoo Mail. It’s permanently bound to the “yahoo.com” domain, so you can’t do much with the “Yahoo” brand without bringing Mail along with it. Though at this point, the “Yahoo” brand is pretty tarnished. There’s not much new you can put under that brand anyway. I can’t see how Verizon would want to invest in that brand at all — just milk it for what it can over the coming years.

The investment company cannot long exist on its own. Investors want their money back, so they can make future investment decisions on their own. They don’t want the company to make investment choices for them.

Think back to when Yahoo made its initial $1-billion investment for 40% of Alibaba in 2005: it did not do so because it was a good “investment opportunity”, but because Yahoo believed it was a good strategic investment, such as providing an entry into the Chinese market, or providing an e-commerce arm to compete against eBay and Amazon. In other words, Yahoo didn’t consider it a good way of investing its money, but a good way to create a strategic partnership — one that just never materialized. From that point of view, the Alibaba investment was a failure.

In 2012, Marissa Mayer sold off 25% of Alibaba, netting $4-billion after taxes. She then lost all $4-billion on the web operations. That stake would be worth over $50-billion today. You can see the problem: companies with large slush funds just fritter them away keeping operations going. Marissa Mayer abused her position of trust, playing with money that belonged to shareholders.

Thus, Altaba isn’t going to play with shareholders’ money. It’s a skeleton company, so there’s no strategic value to its investments. It can make no better investment choices than its shareholders can with their own money. Thus, the only purpose of the skeleton investment company is to return the money to the shareholders. I suspect it’ll choose the most tax-efficient way of doing this, like selling the whole thing to Alibaba, which just exchanges the Altaba shares for Alibaba shares, with a 15% bonus representing the value of the other Altaba investments. Either way, if Altaba is still around a year from now, it’s because its board is skimming money that doesn’t belong to them.


Key points:

  • Altaba is the name of the remaining skeleton investment company, the “Yahoo” brand was sold with the web operations to Verizon.
  • The name Altaba sucks because it’s not a brand name that will stick around for a while — the skeleton company is going to return all its money to its investors.
  • Yahoo had to spin off its investments — there’s no excuse for 90% of its market value to be investments and 10% in its web operations.
  • In particular, the money belongs to Yahoo’s investors, not Yahoo the company. It’s not some sort of slush fund Yahoo’s executives could use. Yahoo couldn’t use that money to keep its flailing web operations going, as Marissa Mayer was attempting to do.
  • Most of Yahoo’s web operations will go the way of Earthlink and Slashdot, as Verizon milks the slowly declining revenue while making no new investments in it.

Presenting an Open Source Toolkit for Lightweight Multilingual Entity Linking

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/154168092396

yahooresearch:

By Aasish Pappu, Roi Blanco, and Amanda Stent

What’s the first thing you want to know about any kind of text document (like a Yahoo News or Yahoo Sports article)? What it’s about, of course! That means you want to know something about the people, organizations, and locations that are mentioned in the document. Systems that automatically surface this information are called named entity recognition and linking systems. These are among the most useful components in text analytics, as they are required for a wide variety of applications including search, recommender systems, question answering, and sentiment analysis.

Named entity recognition and linking systems use statistical models trained over large amounts of labeled text data. A major challenge is to be able to accurately detect entities, in new languages, at scale, with limited labeled data available, and while consuming a limited amount of resources (memory and processing power).

After researching and implementing solutions to enhance our own personalization technology, we are pleased to offer the open source community Fast Entity Linker, our unsupervised, accurate, and extensible multilingual named entity recognition and linking system, along with datapacks for English, Spanish, and Chinese.

For broad usability, our system links text entity mentions to Wikipedia. For example, in the sentence Yahoo is a company headquartered in Sunnyvale, CA with Marissa Mayer as CEO, our system would identify the following entities:

On the algorithmic side, we use entity embeddings, click-log data, and efficient clustering methods to achieve high precision. The system achieves a low memory footprint and fast execution times by using compressed data structures and aggressive hashing functions.

Entity embeddings are vector-based representations that capture how entities are referred to in context. We train entity embeddings using Wikipedia articles, and use hyperlinked terms in the articles to create canonical entities. The context of an entity and the context of a token are modeled using the neural network architecture in the figure below, where entity vectors are trained to predict not only their surrounding entities but also the global context of word sequences contained within them. In this way, one layer models entity context, and the other layer models token context. We connect these two layers using the same technique that (Quoc and Mikolov ‘14) used to train paragraph vectors.



Architecture for training word embeddings and entity embeddings simultaneously. Ent represents entities and W represents their context words.
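
To make the idea concrete, here is an illustrative sketch of how candidate entities can be scored against a mention’s context. This is not the FEL API, and the vectors are made-up toys; it only shows the shape of the computation.

# An illustrative sketch of the idea only -- not the FEL API. Candidates for
# a mention are scored by cosine similarity between the mention's context
# vector and each candidate entity's embedding; the highest score wins.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# In the real system these embeddings come from training on Wikipedia text
# and hyperlinks; here they are tiny hand-written vectors.
entity_vectors = {
    "wiki_Fox_News": np.array([0.9, 0.1, 0.0]),
    "wiki_20th_Century_Fox": np.array([0.1, 0.9, 0.2]),
}
context_vector = np.array([0.8, 0.2, 0.1])   # context around the mention "Fox"

scores = {name: cosine(context_vector, vec) for name, vec in entity_vectors.items()}
print(max(scores, key=scores.get))           # wiki_Fox_News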

Search click-log data gives very useful signals to disambiguate partial or ambiguous entity mentions. For example, if searchers for “Fox” tend to click on “Fox News” rather than “20th Century Fox,” we can use this data in order to identify “Fox” in a document. To disambiguate entity mentions and ensure a document has a consistent set of entities, our system supports three entity disambiguation algorithms:

*Currently, only the Forward Backward Algorithm is available in our open source release–the other two will be made available soon!

These algorithms are particularly helpful in accurately linking entities when a popular candidate is NOT the correct candidate for an entity mention. In the example below, these algorithms leverage the surrounding context to accurately link Manchester City, Swansea City, Liverpool, Chelsea, and Arsenal to their respective football clubs.



Ambiguous mentions that could refer to multiple entities are highlighted in red. For example, Chelsea could refer to Chelsea Football team or Chelsea neighborhood in New York or London. Unambiguous named entities are highlighted in green.


Examples of candidate retrieval process in Entity Linking for both ambiguous and unambiguous examples referred in the example above. The correct candidate is highlighted in green.


At this time, Fast Entity Linker is one of only three freely-available multilingual named entity recognition and linking systems (others are DBpedia Spotlight and Babelfy). In addition to a stand-alone entity linker, the software includes tools for creating and compressing word/entity embeddings and datapacks for different languages from Wikipedia data. As an example, the datapack containing information from all of English Wikipedia is only ~2GB.

The technical contributions of this system are described in two scientific papers:

There are numerous possible applications of the open-source toolkit. One of them is attributing sentiment to entities detected in the text, as opposed to the entire text itself. For example, consider the following actual review of the movie “Inferno” from a user on MetaCritic (revised for clarity): “While the great performance of Tom Hanks (wiki_Tom_Hanks) and company make for a mysterious and vivid movie, the plot is difficult to comprehend. Although the movie was a clever and fun ride, I expected more from Columbia (wiki_Columbia_Pictures).”  Though the review on balance is neutral, it conveys a positive sentiment about wiki_Tom_Hanks and a negative sentiment about wiki_Columbia_Pictures.

Many existing sentiment analysis tools collate the sentiment value associated with the text as a whole, which makes it difficult to track sentiment around any individual entity. With our toolkit, one could automatically extract “positive” and “negative” aspects within a given text, giving a clearer understanding of the sentiment surrounding its individual components.

Feel free to use the code, contribute to it, and come up with additional applications; our system and models are available at https://github.com/yahoo/FEL.

Great work from our Yahoo Research team!

2016-11-22 The OpenFest 2016 network

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3328

There were no big changes from last year. The differences were as follows:

  • This year most of the switches in the backbone network were tplink SG3210; we only had two Ciscos. The tplinks are quieter, smaller, (literally) made of metal, and work well for odd deployments. If they also had PoE, they would be downright amazing.
  • We had one more switch, for the NOC in the basement (which was also the only leaf in the network). This year the small VOC room was left entirely to the video team, while the network team spread out in a basement.
  • Since we had two workshop rooms, we had a few more user switches, by and large a ragtag assortment;

Here is this year's diagram.

We were again lent equipment by Svetla from netissat (her switch has been in our backbone network for 3-4 years now), Stefan Lekov (the NOC switch and the backup server) and digger (two MikroTiks used as workshop switches). The rest came from me and initLab.

A large part of the network setup was organized through GitHub issues, so you can easily see what happened to us during the preparation.

This year there were more of our own "clients" on the network – the intercom between the halls ran over IP, and there were various odd phones scattered around the network. Overall the wired network does not seem to get used by visitors, but it keeps turning out to be more and more useful for us.

We again used the space under the stairs on the other side of the hall as a server room, and accordingly we have before and after photos. It is not the most suitable spot – everything around it is wooden, it is cramped, and the central-heating pipe runs underneath (i.e. we are perhaps one of the few server rooms that has heating instead of air conditioning), but it is right next to the electrical panel, away from the flows of people, and central enough that we can run independent cable runs from it.

This year we reused last year's cables and brought a spare box of cable, so we had no problems at all building the network.

For the uplink this year we used the same NetX fiber, but with gigabit converters and 300 Mbps, so we did not notice any connectivity problems.

We also used the same DL380G5 as a server/router, and this year Lekov set up another one as a backup. We again used it to encode the 1080p stream from the Bulgaria hall, although this year our normal encoder would probably have coped (the software has come a long way in a year).

This year we had to change the VLAN numbers, because some of the APs (some big Linksys units) did not support VLAN tags above 64. Accordingly, our address plan looked like this:

IPv4

id  range           name
10  185.108.141.104/30  external
20  10.20.0.0/24        mgmt
21  10.21.0.0/22        wired
22  10.22.0.0/22        wireless
23  10.23.0.0/24        video
24  10.24.0.0/24        overflow

IPv6

10  2a01:b760:1:2::/120
21  2a01:b760:2:4::/62
22  2a01:b760:2:5::/62

There was no change to the firewall and forced-forwarding setup – we again enabled proxy_arp_pvlan for the user VLANs, filtered port 25, and did not allow traffic from ordinary users to the management/video/overflow VLANs.

We had full IPv6 support in the user VLANs (wired and wireless), and this year we did not have the problem of IPv6 disappearing for random people – apparently the strange bug has finally been fixed.

Overall the network was as stable as it gets, and with the experience gathered we can plan for more intra-team communication over it next year, plus all kinds of odd extras (for example desk phones, more information displays, some kind of signalling, workstations at the reception desks, and all sorts of fun things). For now its biggest users are the wireless network and the video team.

Music theory for nerds

Post Syndicated from Eevee original https://eev.ee/blog/2016/09/15/music-theory-for-nerds/

Not music nerds, obviously.

I don’t know anything about music. I know there are letters but sometimes the letters have squiggles; I know an octave doubles in pitch; I know you can write a pop song with only four chords. That’s about it.

The rest has always seemed completely, utterly arbitrary. Why do we have twelve notes, but represent them with only seven letters? Where did the key signatures come from? Why is every Wikipedia article on this impossible to read without first having read all the others?

A few days ago, some of it finally clicked. I feel like an idiot for not getting it earlier, but I suppose it doesn’t help that everyone explains music using, well, musical notation, which doesn’t make any sense if you don’t know why it’s like that in the first place.

Here is what I gathered, from the perspective of someone whose only music class was learning to play four notes on a recorder in second grade. I stress that I don’t know anything about music and this post is terrible. If you so much as know how to whistle, please don’t read this; you will laugh at me.

Sound and waves

Music is a kind of sound. Sound is a pressure wave.

Imagine what happens when you beat a drum. The drumhead is elastic, so when you hit it, it deforms inwards, then rebounds outwards, then back inwards, and so on until it runs out of energy. If you watched a point in the center of the drumhead, its movement would look a lot like what you get when you hold a slinky by the top and let the bottom go.

When the drumhead rebounds outwards, it pushes air out of the way. That air pushes more air out of the way, which pushes more air out of the way, creating a 3D ripple leading away from the drum. Meanwhile, the drumhead has rebounded back inwards, leaving a vacuum which nearby air rushes to fill… which leaves another vacuum, and so on. The result is that any given air molecule is (roughly) drifting back and forth from its original position, just like the drumhead or the slinky.

Eventually this pressure wave reaches your eardrum, which vibrates in exactly the same way as the drumhead, and you interpret this as music. Or perhaps as noise, depending on your taste.

I would love to provide an illustration of this, but the trouble is that it would look like ripples on a pond, where the wave goes upwards. Sound happens in three dimensions, the movement is directed towards/away from the source, and I think that’s a pretty important distinction.

Instead, let’s jump straight to the graphs. Here’s a sine wave.

Graph of a sine wave.  Frequency is labeled as the horizontal distance between waves; amplitude is labeled as the vertical distance between the highest and lowest points.

It doesn’t matter what a sine wave is; it just happens to be a common wave that’s easy to make a graph of.

In graphs like this, time starts at zero and increases to the right, and the wave shows how much the air (or your eardrum, or whatever medium) has moved from its original position. Complete silence would be a straight line at zero, all the way across.

All sound you ever hear is a graph like this; nothing more. If you open up a song in Audacity and zoom in enough, you’ll see a wave. It’ll probably be a bit more complicated, but it’s still a wave.

Waves are defined by a couple of things: frequency, amplitude, and shape. The particular sound you hear — the thing that distinguishes a guitar from a violin — is the shape of the wave, which musicians call timbre.

A sine wave sounds something like this:

Amplitude is the distance between the lowest and highest points of the wave. Or, depending on who you ask, it might be half that — the distance between the highest point and zero. For sound, amplitude determines the volume of the sound you hear. This seems pretty reasonable, since in physical terms, amplitude is the furthest distance the medium moves. If you tap a drum lightly, it only moves very slightly, and the sound is quiet. If you wail on a drum, it moves quite a bit, and the sound is much louder.

Frequency is, quite literally, how frequent the wave is. If each wave is very skinny, then waves are more frequent, i.e. they have a higher frequency. If each wave is fairly wide, then waves are less frequent, and they have a lower frequency. Musicians refer to frequency as pitch. Non-musicians would probably just call it a note or tone, which musicians would scoff at, but what do they know anyway.

Frequency is measured in Hz (Hertz), which is a funny way of spelling “per second”. If it takes half a second to get from one point on a wave back to the same point on the next wave, that’s 2 Hz, because there are two waves per second. The sound above has a frequency of 440 Hz. (The graph, of course, does not; it’s a completely unchanged sine wave generated by wxMaxima, so its frequency is 1/τ = 1/(2π).)

A critical property of the human ear, which informs all the rest of this, is that if you double or halve the frequency of a sound, we consider it to be “the same” in some way. Obviously, the result will sound higher- or lower-pitched, but they “feel” very similar. I can take a few guesses at the physical reasons for this, but we can treat it as an arbitrary rule.

Compare these three sine waves, if you like. The first is the same as the sine wave from earlier; the second has 1.5× the frequency of the first; the third has 2× the frequency of the first. The first and third sound much more related than the second sounds to either.
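
If you want to generate tones like these yourself, it only takes a few lines. Here is a small sketch, assuming Python with numpy available, that writes a two-second 440 Hz sine wave to a WAV file; change FREQUENCY to 660 or 880 to reproduce the comparison above.

# A small sketch: synthesize a sine wave and write it to a WAV file using
# numpy and the standard library's wave module.
import wave
import numpy as np

SAMPLE_RATE = 44100        # samples per second
DURATION = 2.0             # seconds of audio
FREQUENCY = 440.0          # pitch, in Hz
AMPLITUDE = 0.5            # volume, as a fraction of full scale

# One sample per tick of the sample clock.
t = np.arange(int(SAMPLE_RATE * DURATION)) / SAMPLE_RATE
samples = AMPLITUDE * np.sin(2 * np.pi * FREQUENCY * t)

# 16-bit signed integers are the most common WAV sample format.
pcm = (samples * 32767).astype(np.int16)

out = wave.open("sine_440.wav", "wb")
out.setnchannels(1)        # mono
out.setsampwidth(2)        # 2 bytes = 16 bits per sample
out.setframerate(SAMPLE_RATE)
out.writeframes(pcm.tobytes())
out.close()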

Notes and octaves

The difficulty with music is that half of it is arbitrary and half of it is actually based on something, but you can’t tell the difference just by looking at it.

Let’s start with that fact about the human ear: doubling the pitch (= frequency) will sound “the same” in some indescribable way. For any starting pitch f, you can thus generate an infinite number of other pitches that sound “the same” to us: ½f, 2f, ¼f, 4f, and so on. (Of course, only so many of these will fit within the range of human hearing.) All of those pitches together have some common quality, so let’s refer to a group of them as a note. (“Note” can also refer to an individual pitch, whereas pitch class is unambiguous, but I’ll stick with “note” for now.)

If we say 440 Hz produces a note called A, then 880 Hz, 220 Hz, 1760 Hz, 110 Hz, and so forth will also produce a note called A. An important consequence of this is that all distinct notes we could possibly come up with must exist somewhere between 440 Hz and 880 Hz. Any other pitch could be doubled or halved until it lies in that range, and thus would produce a note in that range.

Such a range is called an octave, for reasons we’ll see in a moment. Every note exists exactly once within any given octave, no matter how you define it. As a special case, the lowest pitch f is the same note as the highest pitch 2f, so 2f is considered to be part of the next octave.

This is good news! It means we can choose some pitches within a small range — any small range, so long as it’s of the form f to 2f — and by doubling and halving those pitches, we’ll have a standard set of notes spanning the entire range of human hearing.

How, then, do we choose those pitches? You might say, well! Since we’re going from f to 2f anyway, let’s just choose pitches at f, 1.1f, 1.2f, 1.3f, 1.4f, and so on. Space them equally, so they’re as distinct as possible.

Good plan! Unfortunately, that won’t quite work — if you try it, you’ll find that the step from f to 1.1f sounds almost twice as big as the step from 1.9f to 2f, even though the plain difference in Hz is the same.

The human ear distinguishes pitch based on ratios, which is why the halving/doubling effect exists. f to 1.1f is an increase of 10%; 1.9f to 2f is an increase of about 5%.

We need a set of pitches that have the same ratio rather than the same difference. If we want n pitches, then we need a number that can be multiplied n times to get from f to 2f.

f × x × x × x × … × x = 2f
fxⁿ = 2f
xⁿ = 2
x = ⁿ√2

Ah. We need the n-th root of two. That’s a bit weird and awkward, since it’s guaranteed to be irrational for any n > 1.

Intervals in Western music

Western music has twelve distinct pitches. This is somewhat arbitrary — twelve has a few nice mathematical properties, but it’s not absolutely necessary. You could create your own set of notes with eleven pitches, or seventeen, or a hundred, or five. There are forms of music elsewhere in the world that do just that.

The ratio between successive pitches in Western music is thus the twelfth root of two, ¹²√2 ≈ 1.0594631. Starting at 440 Hz and repeatedly multiplying by this value produces twelve pitches before hitting 880 Hz.
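
Here is a quick sketch, in plain Python, that reproduces the table below by repeatedly multiplying 440 Hz by the twelfth root of two.

# Twelve equal-tempered steps from 440 Hz up to the octave at 880 Hz.
BASE = 440.0
SEMITONE = 2 ** (1 / 12)          # about 1.0594631

for i in range(13):
    ratio = SEMITONE ** i
    print(f"{i:2d}  {BASE * ratio:7.2f} Hz  (ratio {ratio:.3f})")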

0    440 Hz
1    466.16 Hz
2    493.88 Hz
3    523.25 Hz
4    554.36 Hz
5    587.33 Hz
6    622.25 Hz
7    659.26 Hz
8    698.46 Hz
9    739.99 Hz
10   783.99 Hz
11   830.61 Hz
12   880 Hz

No one wants to actually work with these numbers — and when this system was invented, no one knew what these numbers were. Instead, music is effectively defined in terms of ratios.

The ratio between two pitches is called an interval, and an interval of one twelfth root of two is a semitone. This way, all the horrible irrational numbers go out the window, and we can mostly talk in whole numbers.

(What we’re really doing here is working on a logarithmic scale. I know “logarithm” strikes fear into the hearts of many, but all it means is that we say “add” to mean “multiply”.)

Now, remember, the human ear loves ratios. It particularly loves ratios of small whole numbers — that’s why doubling the pitch sounds “similar”, because it creates a ratio of 2:1, the smallest and whole-number-est you can get.

The twelfth root of two may be irrational, but it turns out to almost create several nice ratios. (I don’t know why twelve in particular has this effect, or if other roots do as well, but it’s probably why Western music settled on twelve.) Here are the pitches of all twelve notes, relative to the first. Some of them are very close to simple fractions.

0    1.000          = 1:1   (unison)
1    1.059                  (semitone; minor second)
2    1.122   1.125 = 9:8   (whole tone; major second)
3    1.189                  (minor third)
4    1.260   1.250 = 5:4   (major third)
5    1.335   1.333 = 4:3   (perfect fourth)
6    1.414
7    1.498   1.500 = 3:2   (perfect fifth)
8    1.587                  (minor sixth)
9    1.682   1.667 = 5:3   (major sixth)
10   1.782                  (minor seventh)
11   1.888   1.889 = 17:9  (major seventh)
12   2              = 2:1   (octave)

Not counting the octave, there are seven fairly nice fractions here.

Hmm. Seven. What a conspicuous number.

Scales

Surprise! Those nice fractions make up the major scale. Starting with C produces the C major scale — the “natural” notes. Using ♯ to mean “one semitone up” and ♭ to mean “one semitone down”, we can name all of these notes.

0    1.000          = 1:1    C           (unison)
1    1.059                   C♯ or D♭    (semitone; minor second)
2    1.122   1.125 = 9:8     D           (whole tone; major second)
3    1.189                   D♯ or E♭    (minor third)
4    1.260   1.250 = 5:4     E           (major third)
5    1.335   1.333 = 4:3     F           (perfect fourth)
6    1.414                   F♯ or G♭
7    1.498   1.500 = 3:2     G           (perfect fifth)
8    1.587                   G♯ or A♭    (minor sixth)
9    1.682   1.667 = 5:3     A           (major sixth)
10   1.782                   A♯ or B♭    (minor seventh)
11   1.888   1.889 = 17:9    B           (major seventh)
12   2              = 2:1    C           (octave)

I don’t know for sure if this is where the modern naming convention came from, but I sure wouldn’t be surprised.

You can now see where some of those interval names came from. A perfect fifth is the interval between the first and fifth notes of the scale. An octave spans eight notes in total. Likewise, the smallest interval is a semitone because most of the notes are two steps apart, so that distance became a whole tone.

The intervals between successive notes can be written as wwhwwwh, where w is a whole tone and h is a semitone or half tone. Because octaves repeat, you can rotate this sequence to produce seven different variations, depending on where you start. The resulting scales are all called diatonic scales, and the choice of starting point is called a mode. Here are all seven, marked by Roman numerals indicating the start point. I’ve also chosen different starting notes for each column, so that the resulting notes are all “natural”.

                    I  II  III  IV  V  VI  VII
0    1.000  = 1:1  |C|  D   E   F   G  |A|  B
1    1.059         | |      F          | |  C
2    1.122  ≈ 9:8  |D|  E       G   A  |B|  
3    1.189         | |  F   G          |C|  D
4    1.260  ≈ 5:4  |E|          A   B  | |  
5    1.335  ≈ 4:3  |F|  G   A       C  |D|  E
6    1.414         | |          B      | |  F
7    1.498  ≈ 3:2  |G|  A   B   C   D  |E|  
8    1.587         | |      C          |F|  G
9    1.682  ≈ 5:3  |A|  B       D   E  | |  
10   1.782         | |  C   D       F  |G|  A
11   1.888  ≈ 17:9 |B|          E      | |  
12   2      = 2:1  |C|  D   E   F   G  |A|  B

I’ve, er, highlighted two columns. Column I produces the major scales, and column VI produces the natural minor scales. This explains the rest of the interval names: a minor third is the span between the first and third notes in a minor scale, whereas a major third is the span between the first and third notes in a major scale. The fourth and fifth notes are the same. (The second notes are the same, too, so I don’t know where “minor second” came from.)

You can start anywhere you want to produce a major or minor scale, as long as you follow the same patterns of intervals. With twelve notes, there are twenty-four major and minor scales altogether, which would make for a big boring diagram. Here are a few of the major scales.

A major:    A       B       C#  D       E       F#      G#  A
A# major:   A#      C       D   D#      F       G       A   A#
B major:    B       C#      D#  E       F#      G#      A#  B
C major:    C       D       E   F       G       A       B   C
C# major:   C#      D#      F   F#      G#      A#      C   C#
D major:    D       E       F#  G       A       B       C#  D
D# major:   D#      F       G   G#      A#      C       D   D#

If you rotate those major scales to start from C, they look like this.

A major:        C#  D       E       F#      G#  A       B       C#
A# major:   C       D   D#      F       G       A   A#      C
B major:        C#      D#  E       F#      G#      A#  B       C#
C major:    C       D       E   F       G       A       B   C
C# major:       C#      D#      F   F#      G#      A#      C   C#
D major:        C#  D       E       F#  G       A       B       C#
D# major:   C       D   D#      F       G   G#      A#      C

Here are some minor scales, written the same way.

F# minor:       C#  D       E       F#      G#  A       B       C#
G minor:    C       D   D#      F       G       A   A#      C
G# minor:       C#      D#  E       F#      G#      A#  B       C#
A minor:    C       D       E   F       G       A       B   C
A# minor:       C#      D#      F   F#      G#      A#      C   C#
B minor:        C#  D       E       F#  G       A       B       C#
C minor:    C       D   D#      F       G   G#      A#      C

Hmmmm. Yes, every major scale is equivalent to the minor scale built on its sixth note, due to the way octaves wrap around. They’re called each other’s relative major and relative minor.

Also, this notation has a slight problem. That problem is that sheet music is terrible.

Sheet music and key signatures

If you know anything about sheet music, you may have noticed that there are no spaces for writing flat or sharp notes.

If you don’t know anything about sheet music, well, there are no spaces for writing flat or sharp notes.

If you want to put any of the other notes in, you keep them on the same line, but with a ♯ or ♭ next to them. So the notes in D major, which includes F♯ and C♯, are written as though they were F and C, but with some extra ♯s scattered around. That’s not very convenient, so instead, you can put a key signature at the very beginning — it takes the form of several ♯s or ♭s written in specific places to indicate which notes are sharp or flat. Then any such unadorned note is considered sharp or flat.

(Images: C major; D major, written without a key signature; D major, written with a key signature.)

(The notes may not actually be arranged this way, depending on the particular squiggle on the left and its vertical position.)

I… guess… that’s convenient? If your music mostly relies on the seven notes from a particular scale, then it’s more compact to only have room for seven notes in your sheet music, and adjust the meaning of those notes when necessary… right?

It completely obscures the relationship between the pitches, though. You can’t even easily tell which scale some sheet music is in, short of memorization. In the example above, there are ♯s for C and F; what about that is supposed to tell you “D”?

Anyway, I really only brought this up to make a point about notation. Look at C♯ major again.

C# major:   C#      D#      F   F#      G#      A#      C   C#

Two pairs of notes are using the same letter — C and C♯, F and F♯ — so they’d occupy the same position in sheet music. That won’t work with the scheme I just described.

To fix this, several scales fudge it a bit. C is one semitone above B, so you could also write it as B♯. F is one semitone above E, so you could also write it as E♯. Here’s how C♯ is actually written.

C# major:   C#      D#     (E#) F#      G#      A#     (B#) C#

Now all seven letters are used exactly once.

I don’t think I entirely understand this, because it still seems so convoluted to me. You have to mentally translate that C to a C♯, and then translate the C♯ to however that particular note is actually played on your instrument. What does this accomplish? It keeps sheet music more compact — seven notes to express per octave rather than twelve — but I can’t think of any better reason.

I suppose it’s possible to change the sound of an entire piece of music just by changing the key signature, sometimes without otherwise altering the sheet music at all. How would that work for music that also uses notes outside the scale, I wonder? These seem more like questions of composition, which I definitely don’t know anything about.

Something, something, chords

I’m getting out in the weeds a bit here.

As I said, major and minor scales come in pairs; every major scale has a relative minor scale with exactly the same set of notes, and vice versa. So C major is identical to A minor. Why do we need both? More importantly, since they both use the same key signature, how can you even say that a piece of music is definitively one or the other?

A lot of people have tried to explain this to me as being about mood and different sound and whatnot, but that moves the question rather than answering it. From what I can gather, the real answer is twofold.

One: music is written against a key, which includes both the scale and common chords and maybe some other stuff. A chord is multiple notes played together, or almost together. You can construct plenty of different chords, but some really big players are the major chords and minor chords, which are the first, third, and fifth notes in a scale. The C major chord (confusingly written “C”) is thus comprised of C, E, and G, whereas the A minor chord (written “Am”) is A, C, and E.

Major chords are notes 0, 4, and 7 out of the twelve notes in an octave. Minor chords are 0, 3, and 7. The first and last notes in both chords are seven semitones apart — a perfect fifth, that nice 3:2 ratio. They sound somewhat similar, but because the middle note is slightly lower in a minor chord, it often sounds a little more dramatic or moody.
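To make those offsets concrete, here’s a small sketch; the function name and the A440 root are just illustrative choices.

SEMITONE = 2.0 ** (1 / 12.0)

def chord(root_hz, offsets):
    # Frequencies of a chord's notes, given semitone offsets from the root.
    return [round(root_hz * SEMITONE ** offset, 2) for offset in offsets]

print(chord(440.0, [0, 4, 7]))   # A major: A, C#, E
print(chord(440.0, [0, 3, 7]))   # A minor: A, C, E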

Speaking of which, something slightly interesting happens when you compare the major and minor scales starting with the same note. They’re very similar, except that three notes are a semitone higher in the major scale.

C major:    C       D       E   F       G       A       B   C
C minor:    C       D   D#      F       G   G#      A#      C

Every major and minor scale has seven chords of this form, depending on where you start; the second chord in the C major scale, for instance, is D-F-A, which is D minor. Yes, minor; it’s the same arrangement of notes as the first chord you’ll find in the D minor scale.

Sometimes you’ll see chords written using Roman numerals, with capital letters for major chords and lowercase for minor chords. The chords of a major scale are I, ii, iii, IV, V, vi, and vii (strictly, that last one is diminished rather than minor, usually written vii°); the chords of a minor scale are i, ii (also diminished, ii°), III, iv, v, VI, and VII. “I” just means the chord built from the first note, and so on. This lets you talk about, say, chord progressions without worrying about any particular key.

Anyway, getting back to why we have both A minor and C major, this brings me to…

Two: it’s just custom. Western music tends to be written according to certain conventions, and people versed in those conventions can identify which one was being used. Music written in C major will often begin and/or end with C or even a C major chord; music written in A minor will often begin and/or end in A or an A minor chord. As far as I can tell, the two sets of notes aren’t fundamentally different in any way, and there’s no hard requirement to follow these conventions.

I suppose the advantage is the same as with any convention: your work will be more accessible to others in the same field. Transposing music between keys, for example, only really makes sense if you can confidently say what the original key is. Incidentally, someone linked me an example of Für Elise being played in A major, rather than the A minor it was written in. (And if you played it in C major, it would sound like… uh… wait, I confused myself.)

It should come as no surprise that all of these conventions have myriad variants. A harmonic minor scale is a minor scale with its seventh note raised by a semitone. A melodic minor scale adjusts several notes, but only when “going up”, not down. There are augmented chords (and intervals) whose highest notes are raised by a semitone, and diminished chords (and intervals) whose highest notes are lowered by a semitone. And on it goes. All of these things messily overlap and create multiple conflicting names for the same things, because they’re attempts to describe human intention rather than an objective waveform.


The “circle of fifths” is a thing, showing all the major and minor scales in a circle — it turns out that if you name and arrange them the right way, each scale has a different number of sharp or flat notes in it, and no scale has both sharps and flats. The “right way” is to iterate the notes seven semitones at a time (hence “circle of fifths“), leading you from C to G to D and so on. A scale uses sharps or flats depending on which symbol will allow all the notes to be assigned to different letters and thus work nicely on sheet music. I’m sure there’s a modular arithmetic explanation for why this all works out as nicely as it does, but I don’t know it offhand.
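The modular arithmetic is easy to poke at yourself: stepping seven semitones at a time, modulo twelve, visits every note exactly once before coming back around, because 7 and 12 share no common factor. A quick sketch, with note names spelled using sharps purely for brevity:

NOTES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

position = 0
circle = []
for _ in range(12):
    circle.append(NOTES[position])
    position = (position + 7) % 12   # up a fifth, wrapping around the octave

print(circle)   # C, G, D, A, E, B, F#, C#, G#, D#, A#, F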

Oh, and integer ratios probably appeal to the human ear because they make the combined waves line up every so often. Here’s what a perfect fifth looks like — the top two tones are A4 and E5, using the not-quite-perfect twelfth root of two. Because they’re in a nice ratio of 3:2, adding them together creates a repeating set of six waves, which itself resembles a wave.

(Image: two waveforms adding together to make a regular, repeating pattern.)

(A4 is the A note within octave 4. Octave 4 starts at middle C, and both are named based on the layout of a piano. A common reference point for tuning is to set A4 to 440 Hz.)
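If you want to reproduce that picture numerically rather than visually, a sketch like the following adds the two tones sample by sample; the sample rate is an arbitrary choice, and nothing here comes from any audio library.

import math

SEMITONE = 2.0 ** (1 / 12.0)
A4 = 440.0
E5 = A4 * SEMITONE ** 7      # seven semitones up: close to, but not exactly, a 3:2 ratio
SAMPLE_RATE = 8000           # samples per second, chosen arbitrarily

for i in range(8):
    t = i / float(SAMPLE_RATE)
    combined = math.sin(2 * math.pi * A4 * t) + math.sin(2 * math.pi * E5 * t)
    print(round(combined, 3))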

Finally, notes with the “same name” might not actually be the same note, depending on how the instrument is tuned; various schemes make certain chords have exact integer ratios, not just approximations. More “fake” notes exist than E♯, too; I hear rumor of such nonsense as G𝄪, “G double sharp”, which I would rather call “A”. I suspect these two trivia are related, but I don’t quite know how.

These are all the things that I know. I don’t know any more things.

In conclusion

This has got to be some of the worst jargon and notation for anything, ever.

I only looked into this because I want to compose some music, and I feel completely blocked when I just don’t understand a subject at all. I’m not sure any of this helped in any way, but at least now I’m not left wondering.

It seems that everything not expressible purely as math and waves is pretty much arbitrary. You can pick whatever set of the twelve notes you want and make music with those. Someone pointed out to me that if you just use the black keys of a piano (i.e. the non-natural notes), you get a pentatonic scale where nothing can possibly sound bad, because no two notes are closer together than a whole tone. You can also use pitches outside these twelve notes, as a lot of jazz and non-Western and other not-classically-inspired music does.

I get the feeling that treating the whole chord/key ecosystem as a set of rules is like studying Renaissance paintings and deciding that’s how art is. It’s not. Do what you want, if it sounds good. I’m gonna go try that. Consensus seems to be that the real heart of music is managing contrast — like every other form of art.

If you aren’t quite as ready to abandon the entire Western musical tradition, here’s some stuff people linked to me while I was figuring this out in real time on Twitter.

Scott Atran on Why People Become Terrorists

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/08/scott_atran_on_.html

Scott Atran has done some really interesting research on why ordinary people become terrorists.

Academics who study warfare and terrorism typically don’t conduct research just kilometers from the front lines of battle. But taking the laboratory to the fight is crucial for figuring out what impels people to make the ultimate sacrifice to, for example, impose Islamic law on others, says Atran, who is affiliated with the National Center for Scientific Research in Paris.

Atran’s war zone research over the last few years, and interviews during the last decade with members of various groups engaged in militant jihad (or holy war in the name of Islamic law), give him a gritty perspective on this issue. He rejects popular assumptions that people frequently join up, fight and die for terrorist groups due to mental problems, poverty, brainwashing or savvy recruitment efforts by jihadist organizations.

Instead, he argues, young people adrift in a globalized world find their own way to ISIS, looking to don a social identity that gives their lives significance. Groups of dissatisfied young adult friends around the world, often with little knowledge of Islam but yearning for lives of profound meaning and glory, typically choose to become volunteers in the Islamic State army in Syria and Iraq, Atran contends. Many of these individuals connect via the internet and social media to form a global community of alienated youth seeking heroic sacrifice, he proposes.

Preliminary experimental evidence suggests that not only global terrorism, but also festering state and ethnic conflicts, revolutions and even human rights movements — think of the U.S. civil rights movement in the 1960s — depend on what Atran refers to as devoted actors. These individuals, he argues, will sacrifice themselves, their families and anyone or anything else when a volatile mix of conditions are in play. First, devoted actors adopt values they regard as sacred and nonnegotiable, to be defended at all costs. Then, when they join a like-minded group of nonkin that feels like a family (a band of brothers), a collective sense of invincibility and special destiny overwhelms feelings of individuality. As members of a tightly bound group that perceives its sacred values under attack, devoted actors will kill and die for each other.

Paper.

EDITED TO ADD (8/13): Related paper, also by Atran.

Python FAQ: How do I port to Python 3?

Post Syndicated from Eevee original https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/

Part of my Python FAQ, which is doomed to never be finished.

Maybe you have a Python 2 codebase. Maybe you’d like to make it work with Python 3. Maybe you really wish someone would write a comically long article on how to make that happen.

I have good news! You’re already reading one.

(And if you’re not sure why you’d want to use Python 3 in the first place, perhaps you’d be interested in the companion article which delves into exactly that question?)

Don’t be intimidated

This article is quite long, but don’t take that as a sign that this is necessarily a Herculean task. I’m trying to cover every issue I can ever recall running across, which means a lot of small gotchas.

I’ve ported several codebases from Python 2 to Python 2+3, and most of them have gone pretty smoothly. If you have modern Python 2 code that handles Unicode responsibly, you’re already halfway there.

However… if you still haven’t ported by now, almost eight years after Python 3.0 was first released, chances are you have either a lumbering giant of an app or ancient and weird 2.2-era code. Or, perish the thought, a lumbering giant consisting largely of weird 2.2-era code. In that case, you’ll want to clean up the more obvious issues one at a time, then go back and start worrying about actually running parts of your code on Python 3.

On the other hand, if your Python 2 code is pretty small and you’ve just never gotten around to porting, good news! It’s not that bad, and much of the work can be done automatically. Python 3 is ultimately the same language as Python 2, just with some sharp bits filed off.

Making some tough decisions

We say “porting from 2 to 3”, but what we usually mean is “porting code from 2 to both 2 and 3”. That ends up being more difficult (and ugly), since rather than writing either 2 or 3, you have to write the common subset of 2 and 3. As nifty as some of the features in 3 are, you can’t actually use any of them if you have to remain compatible with Python 2.

The first thing you need to do, then, is decide exactly which versions of Python you’re targeting. For 2, your options are:

  • Python 2.5+ is possible, but very difficult, and this post doesn’t really discuss it. Even something as simple as exception handling becomes painful, because the only syntax that works in Python 3 was first introduced in Python 2.6. I wouldn’t recommend doing this.

  • Python 2.6+ used to be fairly common, and is well-tread ground. However, Python 2.6 reached end-of-life in 2013, and some common libraries have been dropping support for it. If you want to preserve Python 2.6 compatibility for the sake of making a library more widely-available, well, I’d urge you to reconsider. If you want to preserve Python 2.6 compatibility because you’re running a proprietary app on it, you should stop reading this right now and go upgrade to 2.7 already.

  • Python 2.7 is the last release of the Python 2 series, but is guaranteed to be supported until at least 2020. The major focus of the release was backporting a lot of minor Python 3 features, making it the best possible target for code that’s meant to run on both 2 and 3.

  • There is, of course, also the choice of dropping Python 2 support, in which case this process will be much easier. Python 2 is still very widely-used, though, so library authors probably won’t want to do this. App authors do have the option, but unless your app is trivial, it’s much easier to maintain Python 2 support during the port — that way you can port iteratively, and the app will still function on Python 2 in the interim, rather than being a 2/3 hybrid that can’t run on either.

Most of this post assumes you’re targeting Python 2.7, though there are mentions of 2.6 as well.

You also have to decide which version of Python 3 to target.

  • Python 3.0 and 3.1 are forgettable. Python 3 was still stabilizing for its first couple minor versions, and from what I hear, compatibility with both 2.7 and 3.0 is a huge pain. Both versions are also past end-of-life.

  • Python 3.2 and 3.3 are a common minimum version to target. Python 3.3 reinstated support for u'...' literals (redundant in Python 3, where normal strings are already Unicode), which makes supporting both 2 and 3 much easier. I bundle it with Python 3.2 because the latest version that stable PyPy supports is 3.2, but it also supports u'...' literals. You’ll support the biggest surface area by targeting that, a sort of 3.2½. (There’s an alpha PyPy supporting 3.3, but as of this writing it’s not released as stable yet.)

  • Python 3.4 and 3.5 add shiny new features, but you can only really use them if you’re dropping support for Python 2. Again, I’d suggest targeting Python 2.7 + Python 3.2½ first, then dropping the Python 2 support and adding whatever later Python 3 trinkets you want.

Another consideration is what attitude you want your final code to take. Do you want Python 2 code with enough band-aids that it also works on Python 3, or Python 3 code that’s carefully written so it still works on Python 2? The differences are subtle! Consider code like x = map(a, b). map returns a list in Python 2, but a lazy iterable in Python 3. Which way do you want to port this code?

# Python 2 style: force eager evaluation, even on Python 3
x = list(map(a, b))

# Python 3 style: use lazy evaluation, even on Python 2
try:
    from future_builtins import map
except ImportError:
    pass
x = map(a, b)

The answer may depend on which Python you primarily use for development, your target audience, or even case-by-case based on how x is used.

Personally, I’d err on the side of preserving Python 3 semantics and porting them to Python 2 when possible. I’m pretty used to Python 3, though, and you or your team might be thrown for a loop by changing Python 2’s behavior.

At the very least, prefer if PY2 to if not PY3. The former stresses that Python 2 is the special case, which is increasingly true going forward. Eventually there’ll be a Python 4, and perhaps even a Python 5, and those future versions will want the “Python 3” behavior.
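If you’re not pulling in six (which already provides six.PY2 and six.PY3 flags), a minimal hand-rolled version looks something like this:

import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode   # only defined on Python 2; never evaluated on Python 3
else:
    text_type = str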

Some helpful tools

The good news is that you don’t have to do all of this manually.

2to3 is a standard library module (since 2.6) that automatically modifies Python 2 source code to change some common Python 2 constructs to the Python 3 equivalent. (It also doubles as a framework for making arbitrary changes to Python code.)

Unfortunately, it ports 2 to 3, not 2 to 2+3. For libraries, it’s possible to rig 2to3 to run automatically on your code just before it’s installed on Python 3, so you can keep writing Python 2 code — but 2to3 isn’t perfect, and this makes it impossible to develop with your library on Python 3, so Python 3 ends up as a second-class citizen. I wouldn’t recommend it.

The more common approach is to use something like six, a library that wraps many of the runtime differences between 2 and 3, so you can run the same codebase on both 2 and 3.

Of course, that still leaves you making the changes yourself. A more recent innovation is the python-future project, which combines both of the above. It has a future library of renames and backports of Python 3 functionality that goes further than six and is designed to let you write Python 3-esque code that still runs on Python 2. It also includes a futurize script, based on the 2to3 plumbing, that rewrites your code to target 2+3 (using python-future’s library) rather than just 3.

The nice thing about python-future is that it explicitly takes the stance of writing code against Python 3 semantics and backporting them to Python 2. It’s very dedicated to this: it has a future.builtins module that includes not only easy cases like map, but also entire pure-Python reimplementations of types like bytes. (Naturally, this adds some significant overhead as well.) I do like the overall attitude, but I’m not totally sold on all the changes, and you might want to leaf through them to see which ones you like.

futurize isn’t perfect, but it’s probably the best starting point. The 2to3 design splits the various edits into a variety of “fixers” that each make a single style of change, and futurize works the same way, inheriting many of the fixers from 2to3. The nice thing about futurize is that it groups the fixers into “stages”, where stage 1 (futurize --stage1) only makes fairly straightforward changes, like fixing the except syntax. More importantly, it doesn’t add any dependencies on the future library, so it’s useful for making the easy changes even if you’d prefer to use six. You’re also free to choose individual fixes to apply, if you discover that some particular change breaks your code.

Another advantage of this approach is that you can tackle the porting piecemeal, which is great for very large projects. Run one fixer at a time, starting with the very simple ones like updating to except ... as ... syntax, and convince yourself that everything is fine before you do the next one. You can make some serious strides towards 3 compatibility just by eliminating behavior that already has cromulent alternatives in Python 2.

If you expect your Python 3 port to take a very long time — say, if you have a large project with numerous developers and a frantic release schedule — then you might want to prevent older syntax from creeping in with a tool like autopep8, which can automatically fix some deprecated features with a much lighter touch. If you’d like to automatically enforce that, say, from __future__ import absolute_import is at the top of every Python file, that’s a bit beyond the scope of this article, but I’ve had pre-commit + reorder_python_imports thrust upon me in the past to fairly good effect.

Anyway! For each of the issues below, I’ll mention whether futurize can fix it, the name of the responsible fixer, and whether six has anything relevant. If the name of the fixer begins with lib2to3, that means it’s part of the standard library, and you can use it with 2to3 without installing python-future.

Here we go!

Things you shouldn’t even be doing

These are ancient, ancient practices, and even Python 2 programmers may be surprised by them. Some of them are arguably outright bugs in the language; others are just old and forgotten. They generally have equivalents that work even in older versions of Python 2.

Old-style classes

class Foo:
    ...

In Python 3, this code creates a class that inherits from object. In Python 2, it creates a completely different kind of thing entirely: an “old-style” class, which worked a little differently from built-in types. The differences are generally subtle:

  • Old-style classes don’t support __getattribute__, __slots__

  • Old-style classes don’t correctly support data descriptors, i.e. the assignment behavior of @property.

  • Old-style classes had a __coerce__ method, which would attempt to turn a value into a built-in numeric type before performing a math operation.

  • Old-style classes didn’t use the C3 MRO, so in the case of diamond inheritance, a class could be skipped entirely by super().

  • Old-style instances check the instance for a special method name; new-style instances check the type. Additionally, if a special method isn’t found on an old-style instance, the lookup falls back to __getattr__; this is not the case for new-style classes (which makes proxying more complicated).

That last one is the only thing old-style classes can do that new-style classes cannot, and if you’re relying on it, you have a bit of refactoring to do. (The really curious thing is that there doesn’t seem to be a particularly good reason for the limitation on new-style classes, and it doesn’t even make things faster. Maybe that’ll be fixed in Python 4?)

If you have no idea what any of that means or why you should care, chances are you’re either not using old-style classes at all, or you’re only using them because you forgot to write (object) somewhere. In that case, futurize --stage2 will happily change class Foo: to class Foo(object): for you, using the libpasteurize.fixes.fix_newstyle fixer. (Strictly speaking, this is a Python 2 compatibility issue, since the old syntax still works fine in Python 3 — it just means something else now.)
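If you want to audit a Python 2 codebase by hand, one rough diagnostic (not something futurize does) is to check whether a class is an instance of type; old-style classes aren’t.

class Old:             # old-style on Python 2, new-style on Python 3
    pass

class New(object):     # new-style everywhere
    pass

# Prints "False True" on Python 2 and "True True" on Python 3.
print(isinstance(Old, type), isinstance(New, type))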

cmp

Python 2 originally used the C approach for sorting. Given two things A and B, a comparison would produce a negative number if A < B, zero if A == B, and a positive number if A > B. This was the only way to customize sorting; there’s a cmp() built-in function, a __cmp__ special method, and cmp arguments to list.sort() and sorted().

This is a little cumbersome, as you may have noticed if you’ve ever tried to do custom sorting in Perl or JavaScript. Even a case-insensitive sort involves repeating yourself. Most custom sorts will have the same basic structure of cmp(op(a), op(b)), when the only thing you really care about is op.

names.sort(cmp=lambda a, b: cmp(a.lower(), b.lower()))

But more importantly, the C approach is flat-out wrong for some types. Consider sets, which use comparison to indicate subsets versus supersets:

{1, 2} < {1, 2, 3}  # True
{1, 2, 3} > {1, 2}  # True
{1, 2} < {1, 2}  # False
{1, 2} <= {1, 2}  # True

So what to do with {1, 2} < {3, 4}, where none of the three possible answers is correct?

Early versions of Python 2 added “rich comparisons”, which introduced methods for all six possible comparisons: __eq__, __ne__, __lt__, __le__, __gt__, and __ge__. You’re free to return False for all six, or even True for all six, or return NotImplemented to allow deferring to the other operand. The cmp argument became key instead, which allows mapping the original values to a different item to use for comparison:

names.sort(key=lambda a: a.lower())

(This is faster, too, since there are fewer calls to the lambda, fewer calls to .lower(), and no calls to cmp.)


So, fixing all this. Luckily, Python 2 supports all of the new stuff, so you don’t need compatibility hacks.

To replace simple implementations of __cmp__, you need only write the appropriate rich comparison methods. You could even do this the obvious way:

class Foo(object):
    def __cmp__(self, other):
        return cmp(self.prop, other.prop)

    def __eq__(self, other):
        return self.__cmp__(other) == 0

    def __ne__(self, other):
        return self.__cmp__(other) != 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0

    ...

You would also have to change the use of cmp to a manual if tree, since cmp is gone in Python 3. I don’t recommend this.

A lazier alternative would be to use functools.total_ordering (backported from 3.0 into 2.7), which generates four of the comparison methods, given a class that implements __eq__ and one other:

@functools.total_ordering
class Foo(object):
    def __eq__(self, other):
        return self.prop == other.prop

    def __lt__(self, other):
        return self.prop < other.prop

There are a couple problems with this code. For one, it’s still pretty repetitive, accessing .prop four times (and imagine if you wanted to compare several properties). For another, it’ll either cause an error or do entirely the wrong thing if you happen to compare with an object of a different type. You should return NotImplemented in this case, but total_ordering doesn’t handle that correctly until Python 3.4. If those bother you, you might enjoy my own classtools.keyed_ordering, which uses a __key__ method (much like the key argument) to generate all six methods:

@classtools.keyed_ordering
class Foo(object):
    def __key__(self):
        return self.prop

Replacing uses of cmp arguments should be straightforward: a cmp argument of cmp(op(a), op(b)) becomes a key argument of op. If you’re doing something more elaborate, there’s a functools.cmp_to_key function (also backported from 3.0 to 2.7), which converts a cmp function to one usable as a key. (The implementation is much like the first Foo example above: it involves a class that calls the wrapped function from its comparison methods, and returns True or False depending on the return value.)
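For the occasional cmp function that doesn’t reduce neatly to a key, the conversion looks like this; the comparison itself is just an example.

import functools

def compare_lengths(a, b):
    # Old-style cmp function: negative, zero, or positive.
    return len(a) - len(b)

words = ['pear', 'fig', 'banana']
print(sorted(words, key=functools.cmp_to_key(compare_lengths)))
# ['fig', 'pear', 'banana']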

Finally, if you’re using cmp directly, don’t do that. If you really, really need it for something other than Python’s own sorting, just use an if.

The only help futurize offers is in futurize --stage2, via libfuturize.fixes.fix_cmp, which adds an import of past.builtins.cmp if it detects you’re using the cmp function anywhere.

Comparing incompatible types

Python 2’s use of C-style ordering also means that any two objects, of any types, must be either equal or occur in some defined order. Python’s answer to this problem is to sort on the names of the types. So None < 3 < "1", because "NoneType" < "int" < "str".

Python 3 removes this fallback rule; if two values don’t know how to compare against each other (i.e. both return NotImplemented), you just get a TypeError.

This might affect you in subtle ways, such as if you’re sorting a list of objects that may contain Nones and expecting it to silently work. The fix depends entirely on the type of data you have, and no automated tool can handle that for you. Most likely, you didn’t mean to be sorting a heterogeneous list in the first place.

Of course, you could always sort on type(x).__name__, but I don’t know why you would do that.
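If you really do need to sort a list that may contain None, one common workaround (not specific to any library) is a key that groups the Nones separately, so None is never ordered against anything else:

values = [3, None, 1, None, 2]

# Tuples compare element by element, so every None sorts after every number
# without None itself ever being ordered against an int.
print(sorted(values, key=lambda x: (x is None, x)))
# [1, 2, 3, None, None]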

The sets module

Python 2.3 introduced its set types as Set and ImmutableSet in the sets module. Since Python 2.4, they’ve been built-in types, set and frozenset. The sets module is gone in Python 3, so just use the built-in names.

Creating exceptions

Python 2 allows you to do this:

raise RuntimeError, "an error happened at runtime!!"

There’s not really any good reason to do this, since you can just as well do:

raise RuntimeError("an error happened at runtime!!")

futurize --stage1 will rewrite the two-arg form to a regular object creation via the libfuturize.fixes.fix_raise fixer. It’ll also fix this alternative way of specifying an exception type, which is so bizarre and obscure that I did not know about it until I read the fixer’s source code:

raise (((A, B), C), ...)  # equivalent to `raise A` (?!)

Additionally, exceptions act like sequences in Python 2, but not in Python 3. You can just operate on the .args sequence directly, in either version. Alas, there’s no automated way to fix this.
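So if you have Python 2 code that indexes an exception directly, switch it to .args, which behaves the same on both versions:

try:
    raise ValueError('bad value', 42)
except ValueError as e:
    # e[0] only works on Python 2; e.args[0] works on 2 and 3.
    print(e.args[0])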

Backticks

Did you know that `x` is equivalent to repr(x) in Python 2? Yeah, most people don’t. It’s super weird. futurize --stage1 will fix this with the lib2to3.fixes.fix_repr fixer.

has_key

Very old code may still be using somedict.has_key("foo"). "foo" in somedict has worked since Python 2.2. What are you doing. futurize --stage1 will fix this with the lib2to3.fixes.fix_has_key fixer.

<>

<> is equivalent to != in Python 2! This is an ancient, ancient holdover, and there’s no reason to still be using it. futurize --stage1 will fix this with the lib2to3.fixes.fix_ne fixer.

(You could also use from __future__ import barry_as_FLUFL, which restores <> in Python 3. It’s an easter egg. I’m joking. Please don’t actually do this.)

Things with easy Python 2 equivalents

These aren’t necessarily ancient, but they have an alternative you can just as well express in Python 2, so there’s no need to juggle 2 and 3.

Other ancient builtins

apply() is gone. Use the built-in syntax, f(*args, **kwargs).

callable() was briefly gone, but then came back in Python 3.2.

coerce() is gone; it was only used for old-style classes.

execfile() is gone. Read the file and pass its contents to exec() instead.

file() is gone; Python 3 has multiple file types, and a hierarchy of interfaces defined in the io module. Occasionally, code uses this as a synonym for open(), but you should really be using open() anyway.

intern() has been moved into the sys module, though I have no earthly idea why you’d be using it.

raw_input() has been renamed to input(), and the old ludicrous input() is gone. If you really need input(), please stop.

reduce() has been moved into the functools module, but it’s there in Python 2.6 as well.

reload() has been moved into the imp module. It’s unreliable garbage and you shouldn’t be using it anyway.

futurize --stage1 can fix several of these:

  • apply, via lib2to3.fixes.fix_apply
  • intern, via lib2to3.fixes.fix_intern
  • reduce, via lib2to3.fixes.fix_reduce

futurize --stage2 can also fix execfile via the libfuturize.fixes.fix_execfile fixer, which imports past.builtins.execfile. The 2to3 fixer uses an open() call, but the true correct fix is to use a with block.
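For reference, a with-based replacement might look something like this; run_file and namespace are made-up names, not anything from the standard library.

def run_file(path, namespace):
    # Rough stand-in for Python 2's execfile(path, namespace).
    with open(path) as f:
        code = compile(f.read(), path, 'exec')
    exec(code, namespace)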

futurize --stage2 has a couple of fixers for raw_input, but you can just as well import future.builtins.input or six.moves.input.

Nothing can fix coerce, which has no equivalent. Curiously, I don’t see a fixer for file, which is trivially fixed by replacing it with open. Nothing for reload, either.

Catching exceptions

Historically, the way to say “if there’s a ValueError, store it in e and run some code” was:

try:
    ...
except ValueError, e:
    ...

Unfortunately, that’s very easy to confuse with the syntax for catching two different types of exception:

except (ValueError, TypeError):
    ...

If you forget the parentheses, you’ll only catch ValueError, and the exception will be assigned to a variable called, er, TypeError. Whoops!

Python 3.0 introduced clearer syntax, which was also backported to Python 2.6:

except ValueError as e:
    ...

Python 3.0 finally removed the old syntax, so you must use the as form. futurize --stage1 will fix this with the lib2to3.fixes.fix_except fixer.

As an additional wrinkle, the extra variable e is deleted at the end of the block in Python 3, but not in Python 2. If you really need to refer to it after the block, just assign it to a different name.

(The reason for this is that captured exceptions contain a traceback in Python 3, and tracebacks contain the locals for the current frame, and those locals will contain the captured exception. The resulting cycle would keep all local variables alive until the cycle detector dealt with it, at least in CPython. Scrapping the exception as soon as it’s been dealt with was a simple way to keep this from accidentally happening all over the place. It usually doesn’t make sense to refer to a captured exception after the except block, anyway, since the variable may or may not even exist, and that’s generally weird and bad in Python.)
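So if some code genuinely needs the exception after the block, the workaround looks like this:

def do_something_risky():
    raise ValueError('oops')     # stand-in for real work that might fail

error = None
try:
    do_something_risky()
except ValueError as e:
    error = e                    # "e" itself is deleted at the end of the block on Python 3

if error is not None:
    print('it failed:', error)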

Octals

It’s not uncommon for a new programmer to try to zero-pad a set of numbers:

a = 07
b = 08
c = 09
d = 10

Of course, this will have the rather bizarre result that 08 is a SyntaxError, even though 07 works fine — because numbers starting with a 0 are parsed as octal.

This is a holdover from C, and it’s fairly surprising, since there’s virtually no reason to ever use octal. The only time I can ever remember using it is for passing file modes to chmod.

Python 3.0 requires octal literals to be prefixed with 0o, in line with 0x for hex and 0b for binary; literal integers starting with only a 0 are a syntax error. Python 2.6 supports both forms.
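So for the chmod-style case, the portable spelling is the 0o form:

import stat

# 0o755 works on Python 2.6+ and Python 3; bare 0755 is a syntax error on Python 3.
mode = 0o755
assert mode == stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH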

futurize --stage1 will fix this with the lib2to3.fixes.fix_numliterals fixer.

pickle

If you’re using the pickle module (which you shouldn’t be), and you intend to pass pickles back and forth between Python 2 and Python 3, there’s a small issue to be aware of. pickle has several different “protocol” versions, and the default version used in Python 3 is protocol 3, which Python 2 cannot read.

The fix is simple: just find where you’re calling pickle.dump() or pickle.dumps(), and pass a protocol argument of 2. Protocol 2 is the highest version supported by Python 2, and you probably want to be using it anyway, since it’s much more compact and faster to read/write than Python 2’s default, protocol 0.
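Concretely, that means something like this anywhere you produce pickles that Python 2 might need to read:

import pickle

data = {'key': 'value'}

# Protocol 2 is the newest format Python 2 can read, and it's far more compact
# than Python 2's default, protocol 0.
blob = pickle.dumps(data, protocol=2)
print(pickle.loads(blob))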

You may be already using HIGHEST_PROTOCOL, but you’ll have the same problem: the highest protocol supported in any version of Python 3 is unreadable by Python 2.


A somewhat bigger problem is that if you pickle an instance of a user-defined class on Python 2, the pickle will record all its attributes as bytestrings, because that’s what they are in Python 2. Python 3 will then dutifully load the pickle and populate your object’s __dict__ with keys like b'foo'. obj.foo will then not actually exist, because obj.foo looks for the string 'foo', and 'foo' != b'foo' in Python 3.

Don’t use pickle, kids.

It’s possible to fix this, but also a huge pain in the ass. If you don’t know how, you definitely shouldn’t be using pickle.

Things that have a __future__ import

Occasionally, the syntax changed in an incompatible way, but the new syntax was still backported and hidden behind a __future__ import — Python’s mechanism for opting into syntax changes. You have to put such an import at the top of the file, optionally after a docstring, like this:

"""My super important module."""
from __future__ import with_statement

print is now a function

Ugh! Parentheses! Why, Guido, why?

The reason is that the print statement has incredibly goofy syntax, unlike anything else in the language:

print >>a, b, c,

You might not even recognize the >> bit, but it lets you print to a file other than sys.stdout. It’s baked specifically into the print syntax. Python 3 replaces this with a straightforward built-in function with a couple extra bells and whistles. The above would be written:

print(b, c, end='', file=a)

It’s slightly more verbose, but it’s also easier to tell what’s going on, and that teeny little comma at the end is now a more obvious keyword argument.

from __future__ import print_function will forget about the print statement for the rest of the file, and make the builtin print function available instead. futurize --stage1 will fix all uses of print and add the __future__ import, with the libfuturize.fixes.fix_print_with_import fixer. (There’s also a 2to3 fixer, but it doesn’t add the __future__ import, since it’s unnecessary in Python 3.)

A word of warning: do not just use print with parentheses without adding the __future__ import. This may appear to work in stock Python 2:

print("See, what's the problem?  This works fine!")

However, that’s parsed as the print statement followed by an expression in parentheses. It becomes more obvious if you try to print two values:

print("The answer is:", 3)
# ("The answer is:", 3)

Now you have a comma inside parentheses, which is a tuple, so the old print statement prints its repr.

Division always produces a float

Quick, what’s the answer here?

5 / 2

If you’re a normal human being, you’ll say 2.5 or 2½. Unfortunately, if you’re like Python and have been afflicted by C, you might say the answer is 2, because this is “integer division” — a bizarre and alien concept probably invented because CPUs didn’t have FPUs when C was first invented.

Python 3.0 decided that maybe contorting fundamental arithmetic to match the inadequacies of 1970s hardware is not the best idea, and so it changed division to always produce a float.

Since Python 2.6, from __future__ import division will alter the division operator to always do true division. If you want to do floor division, there’s a separate // operator, which has existed for ages; you can use it in Python 2 with or without the __future__ import.

Note that true division always produces a float, even if the result is integral: 6 / 3 is 2.0. On the other hand, floor division uses the same typing rules as C-style division: 5 // 2 is 2, but 5 // 2.0 is 2.0.

futurize --stage2 will “fix” this with the libfuturize.fixes.fix_division fixer, but unfortunately that just adds the __future__ import. With the --conservative option, it uses the libfuturize.fixes.fix_division_safe fixer instead, which imports past.utils.old_div, a forward-port of Python 2’s division operator.

The trouble here is that the new / always produces a float, and the new // always floors, but the old / sometimes did one and sometimes did the other. futurize can’t just replace all uses of / with //, because 5/2.0 is 2.5 but 5//2.0 is 2.0, and it can’t generally know what types the operands are.

You might be best off fixing this one manually — perhaps using fix_division_safe to find all the places you do division, then changing them to use the right operator.

Of course, the __div__ magic method is gone in Python 3, replaced explicitly by __floordiv__ (//) and __truediv__ (/). Both of those methods already exist in Python 2, and __truediv__ is even called when you use / in the presence of the future import, so being compatible is a simple matter of implementing all three and deferring to one of the others from __div__.
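A sketch of that, using a made-up wrapper class purely for illustration:

from __future__ import division


class Meters(object):
    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # "/" on Python 3, and on Python 2 under the __future__ import.
        return Meters(self.value / other)

    def __floordiv__(self, other):
        # "//" on both versions.
        return Meters(self.value // other)

    __div__ = __truediv__   # plain "/" on Python 2 without the __future__ import


print((Meters(5) / 2).value)    # 2.5
print((Meters(5) // 2).value)   # 2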

Relative imports

In Python 2, if you’re in the module foo.bar and say import quux, Python will look for a foo.quux before it looks for a top-level quux. The former behavior is called a relative import, though it might be more clearly called a sibling import. It’s troublesome for several reasons.

  • If you have a sibling called quux, and there’s also a top-level or standard library module called quux, you can’t import the latter. (There used to be a py.std module for providing indirect access to the standard library, for this very reason!)

  • If you import the top-level quux module, and then later add a foo.quux module, you’ll suddenly be importing a different module.

  • When reading the source code, it’s not clear which imports are siblings and which are top-level. In fact, the modules you get depend on the module you’re in, so moving or renaming a file may change its imports in non-obvious ways.

Python 3 eliminates this behavior: import quux always means the top-level module. It also adds syntax for “explicit relative” or “absolute relative” (yikes) imports: from . import quux or from .quux import somefunc explicitly means to look for a sibling named quux. (You can also use ..quux to look in the parent package, three dots to look in the grandparent, etc.)

The explicit syntax is supported since Python 2.5. The old sibling behavior can be disabled since Python 2.5 with from __future__ import absolute_import.

futurize --stage1 has a libfuturize.fixes.fix_absolute_import fixer, which attempts to detect sibling imports and convert them to explicit relative imports. If it finds any sibling imports, it’ll also add the __future__ line, though honestly you should make an effort to put that line in all of your Python 2 code.

It’s possible for the futurize fixer to guess wrong about a sibling import, but in general it works pretty well.

(There is one case I’ve run across where simply replacing import sibling with from . import sibling didn’t work. Unfortunately, it was Yelp code that I no longer have access to, and I can’t remember the precise details. It involved having several sibling imports inside a __init__.py, where the siblings also imported from each other in complex ways. The sibling imports worked, but the explicit relative imports failed, for some really obscure timing reason. It’s even possible this was a 2.6 bug that’s been fixed in 2.7. If you see it, please let me know!)

Things that require some effort

These problems are a little more obscure, but many of them are also more difficult to fix automatically. If you have a massive codebase, these are where the problems start to appear.

The grand module shuffle

A whole bunch of modules were deleted, merged, or removed. A full list is in PEP 3108, but you’ll never have heard of most of them. Here are the ones that might affect you.

  • __builtin__ has been renamed to builtins. Note that this is a module, not the __builtins__ attribute of modules, which is exactly why it was renamed. Incidentally, you should be using the builtins module rather than __builtins__ anyway. Or, wait, no, just don’t use either, please don’t mess with the built-in scope.

  • ConfigParser has been renamed to configparser.

  • Queue has been renamed to queue.

  • SocketServer has been renamed to socketserver.

  • cStringIO and StringIO are gone; instead, use StringIO or BytesIO from the io module. Note that these also exist in Python 2, but are pure-Python rather than the C versions in current Python 3.

  • cPickle is gone. Importing pickle in Python 3 now gives you the C implementation automatically.

  • cProfile is gone. Importing profile in Python 3 gives you the C implementation automatically.

  • copy_reg has been renamed to copyreg.

  • anydbm, dbhash, dbm, dumbdbm, gdbm, and whichdb have all been merged into a dbm package.

  • dummy_thread has become _dummy_thread. It’s an implementation of the _thread module that doesn’t actually do any threading. You should be using dummy_threading instead, I guess?

  • httplib has become http.client. BaseHTTPServer, CGIHTTPServer, and SimpleHTTPServer have been merged into a single http.server module. Cookie has become http.cookies. cookielib has become http.cookiejar.

  • repr has been renamed to reprlib. (The module, not the built-in function.)

  • thread has been renamed to _thread, and you should really be using the threading module instead.

  • A whole mess of top-level Tk modules have been combined into a tkinter package.

  • The contents of urllib, urllib2, and urlparse have been consolidated and then split into urllib.error, urllib.parse, and urllib.request.

  • xmlrpclib has become xmlrpc.client. DocXMLRPCServer and SimpleXMLRPCServer have been merged into xmlrpc.server.

futurize --stage2 will fix this with the somewhat invasive libfuturize.fixes.fix_future_standard_library fixer, which uses a mechanism from future that adds aliases to Python 2 to make all the Python 3 standard library names work. It’s an interesting idea, but it didn’t actually work for all cases when I tried it (though now I can’t recall what was broken), so YMMV.

Alternatively, you could manually replace any affected imports with imports from six.moves, which provides aliases that work on either version.

Or as a last resort, you can just sprinkle try ... except ImportError around.
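That sprinkling usually looks like this; configparser is just one example of the renames listed above.

try:
    import configparser                   # Python 3 name
except ImportError:
    import ConfigParser as configparser   # Python 2 name

config = configparser.ConfigParser()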

Built-in iterators are now lazy

filter, map, range, and zip are all lazy in Python 3. You can still iterate over their return values (once), but if you have code that expects to be able to index them or traverse them more than once, it’ll break in Python 3. (Well, not range, that’s fine.) The lazy equivalents — xrange and the functions in itertools — are of course gone in Python 3.

In either case, the easiest thing to do is force eager evaluation by wrapping the call in list() or tuple(), which you’ll occasionally need to do in Python 3 regardless.

For the sake of consistency, you may want to import the lazy versions from the standard library future_builtins module. It only exists in Python 2, so be sure to wrap the import in a try.
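That try-wrapped import looks like this; future_builtins ships with Python 2, so there’s nothing extra to install.

try:
    # Python 2: swap in the lazy versions of these builtins.
    from future_builtins import filter, map, zip
except ImportError:
    # Python 3: the builtins are already lazy.
    pass

squares = map(lambda x: x * x, range(5))
print(list(squares))   # force evaluation only when you actually want a list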

futurize --stage2 tries to address this with several of lib2to3’s fixers, but the results aren’t particularly pleasing: calls to all four are unconditionally wrapped in list(), even in an obviously safe case like a for block. I’d just look through your uses of them manually.

A more subtle point: if you pass a string or tuple to Python 2’s filter, the return value will be the same type. Blindly wrapping the call in list() will of course change the behavior. Filtering a string is not a particularly common thing to do, but I’ve seen someone complain about it before, so take note.

Also, Python 3’s map stops at the shortest input sequence, whereas Python 2 extends shorter sequences with Nones. You can fix this with itertools.zip_longest (which in Python 2 is izip_longest!), but honestly, I’ve never even seen anyone pass multiple sequences to map.

Relatedly, dict.iteritems (plus its friends, iterkeys and itervalues) is gone in Python 3, as the plain items (plus keys and values) is already lazy. The dict.view* methods are also gone, as they were only backports of Python 3’s normal behavior.

Both six and future.utils contain functions called iteritems, etc., which provide a lazy iterator in both Python 2 and 3. They also offer view* functions, which are closer to the Python 3 behavior, though I can’t say I’ve ever seen anyone actually use dict.viewitems in real code.
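Using the six versions looks like this:

import six

d = {'a': 1, 'b': 2}

for key, value in six.iteritems(d):   # lazy on both Python 2 and Python 3
    print(key, value)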

Of course, if you explicitly want a list of dictionary keys (or items or values), list(d) and list(d.items()) do the same thing in both versions.

buffer is gone

The buffer type has been replaced by memoryview (also in Python 2.7), which is similar but not identical. If you’ve even heard of either of these types, you probably know more about the subtleties involved than I do. There’s a lib2to3.fixes.fix_buffer fixer that blindly replaces buffer with memoryview, but futurize doesn’t use it in either stage.

Several special methods were renamed

Where Python 2 has __str__ and __unicode__, Python 3 has __bytes__ and __str__. The trick is that __str__ should return the native str type for each version: a bytestring for Python 2, but a Unicode string for Python 3. Also, you almost certainly don’t want a __bytes__ method in Python 3, where bytes is no longer used for text.

Both six and python-future have a python_2_unicode_compatible class decorator that tries to do the right thing. You write only a single __str__ method that returns a Unicode string. In Python 3, that’s all you need, so the decorator does nothing; in Python 2, the decorator will rename your method to __unicode__ and add a __str__ that returns the same value encoded as UTF-8. If you need different behavior, you’ll have to roll it yourself with if PY2.
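With six, that looks like the following; python-future spells the decorator the same way.

from six import python_2_unicode_compatible


@python_2_unicode_compatible
class Greeting(object):
    def __str__(self):
        # Write one method that returns text. On Python 2 the decorator renames
        # it to __unicode__ and adds a __str__ that encodes the result as UTF-8.
        return u'hello'


print(Greeting())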


Python 2’s next method is more appropriately __next__ in Python 3. The easy way to address this is to call your method __next__, then alias it with next = __next__. Be sure you never call it directly as a method, only with the built-in next() function.
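The aliasing looks like this, with a made-up iterator for illustration:

class CountUp(object):
    """Counts 1, 2, 3, ... forever; exists purely to show the aliasing."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):    # the Python 3 name
        self.n += 1
        return self.n

    next = __next__        # the Python 2 name, pointing at the same function


counter = CountUp()
print(next(counter), next(counter))   # 1 2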

Alternatively, future.builtins contains an alternative next which always calls __next__, but on Python 2, it falls back to trying next if __next__ doesn’t exist.

futurize --stage1 changes all use of obj.next() to next(obj) via the libfuturize.fixes.fix_next_call fixer. futurize --stage2 renames next methods to __next__ via the lib2to3.fixes.fix_next fixer (which also fixes calls). Note that there’s a remote chance of false positives, if for some reason you happened to use next as a regular method name.


Python 2’s __nonzero__ is Python 3’s __bool__. Again, you can just alias it manually. Or futurize --stage2 will rename it with the lib2to3.fixes.fix_nonzero fixer.

Renaming it will of course break it in Python 2, but futurize --stage2 also has a libfuturize.fixes.fix_object fixer that imports python-future’s own builtins.object. The replacement object class has a few methods for making Python 3’s __str__, __next__, and __bool__ work on Python 2.

This is one of the mildly invasive things python-future does, and it may or may not sit well. Up to you.


__long__ is completely gone, as there is no long type in Python 3.

__getslice__, __setslice__, and __delslice__ are gone. Instead, slice objects are passed to __getitem__ and friends. On the off chance you use these, you’ll have to do something clever in the item methods to defer to your slice logic on Python 3.

__oct__ and __hex__ are gone; oct() and hex() now consult __index__. I seriously doubt this will impact anyone.

__div__ is gone, as mentioned previously.

Unbound methods are gone; function attributes renamed

Say you have this useless class.

class Foo(object):
    def bar(self):
        pass

In Python 2, Foo.bar is an “unbound method”, a type that’s generally unseen and unexposed other than as types.MethodType. In Python 3, Foo.bar is just a regular function.

Offhand, I can only think of one time this would matter: if you want to get at attributes on the function, perhaps for the sake of a method decorator. In Python 2, you have to go through the unbound method’s .im_func attribute to get the original function, but in Python 3, you already have the original function and can get the attributes directly.

If you’re doing this anywhere, an easy way to make it work in both versions is:

method = Foo.bar
method = getattr(method, 'im_func', method)

As for bound methods (the objects you get from accessing methods but not calling them, like [].append), the im_self and im_func attributes have been renamed to __self__ and __func__. Happily, these names also work in Python 2.6, so no compatibility hacks are necessary.

im_class is completely gone in Python 3. Methods have no interest in which class they’re attached to. They can’t, since the same function could easily be attached to more than one class. If you’re relying on im_class somehow, for some reason… well, don’t do that, maybe.

Relatedly, the func_* function attributes have been renamed to dunder names in Python 3, since assigning function attributes is a fairly common practice and Python doesn’t like to clog namespaces with its own builtin names. func_closure, func_code, func_defaults, func_dict, func_doc, func_globals, and func_name are now __closure__, __code__, etc. (Note that func_doc and func_name were already aliases for __doc__ and __name__, and func_defaults is much more easily inspected with the inspect module.) The dunder names were also added as aliases in Python 2.6, so on any reasonably modern Python 2 you can just use them; if you need to support something older, do a getattr dance, or use the get_function_* functions from six.
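A sketch of the six approach, reusing the toy Foo class from above:

import six

func = six.get_unbound_function(Foo.bar)    # plain function on both versions
code = six.get_function_code(func)          # func_code / __code__
defaults = six.get_function_defaults(func)  # func_defaults / __defaults__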

Metaclass syntax has changed

In Python 2, a metaclass is declared by assigning to a special name in the class body:

class Foo(object):
    __metaclass__ = FooMeta
    ...

Admittedly, this doesn’t make a lot of sense. The metaclass affects how a class is created, and the class body is evaluated as part of that creation, so this is sort of a goofy hack.

Python 3 changed this, opening the door to a few new neat tricks in the process, which you can find out about in the companion article.

class Foo(object, metaclass=FooMeta):
    ...

The catch is finding a way to express this idea in both Python 2 and Python 3 — the old syntax is ignored in Python 3, and the new syntax is a syntax error in Python 2.

It’s a bit of a pain, but the class statement is really just a lot of sugar for calling the type() constructor; after all, Python classes are just instances of type. All you have to do is manually create an instance of your metaclass, rather than of type.

Fortunately, other people have already made this work for you. futurize --stage2 will fix this using the libfuturize.fixes.fix_metaclass fixer, which imports future.utils.with_metaclass and produces the following:

from future.utils import with_metaclass

class Foo(with_metaclass(FooMeta, object)):
    ...

This creates an intermediate dummy class with the right metaclass, which you then inherit from. Classes use the same metaclass as their parents, so this works fine in any Python.

If you don’t want to depend on python-future, the same function exists in the six module.
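The six spelling looks almost identical; a quick sketch, assuming the same FooMeta:

import six

class Foo(six.with_metaclass(FooMeta, object)):
    ...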

Re-raising exceptions has different syntax

raise with no arguments does the same thing in Python 2 and Python 3: it re-raises the exception currently being handled, preserving the original traceback.

The problem comes in with the three-argument form of raise, which is for preserving the traceback while raising a different exception. It might look like this:

try:
    some_fragile_function()
except Exception as e:
    raise MyLibraryError, MyLibraryError("Failed to do a thing: " + str(e)), sys.exc_info()[2]

sys.exc_info()[2] is, of course, the only way to get the current traceback in Python 2. You may have noticed that the three arguments to raise are the same three things that sys.exc_info() returns: the type, the value, and the traceback.

Python 3 introduces exception chaining. If something raises an exception from within an except block, Python will remember the original exception, attach it to the new one, and show both exceptions when printing a traceback — including both exceptions’ types, messages, and where they happened. So to wrap and rethrow an exception, you don’t need to do anything special at all.

try:
    some_fragile_function()
except Exception:
    raise MyLibraryError("Failed to do a thing")

For more complicated handling, you can also explicitly say raise new_exception from old_exception. Exceptions contain their associated tracebacks as a __traceback__ attribute in Python 3, so there’s no need to muck around getting the traceback manually. If you really want to give an explicit traceback, you can use the .with_traceback() method, which just assigns to __traceback__ and then returns self.

raise MyLibraryError("Failed to do a thing").with_traceback(some_traceback)

It’s hard to say what it even means to write code that works “equivalently” in both versions, because Python 3 handles this problem largely automatically, and Python 2 code tends to have a variety of ad-hoc solutions. Note that you cannot simply do this:

if PY3:
    raise MyLibraryError("Beep boop") from exc
else:
    raise MyLibraryError, MyLibraryError("Beep boop"), sys.exc_info()[2]

The first raise is a syntax error in Python 2, and the second is a syntax error in Python 3. if won’t protect you from parse errors. (On the other hand, you can hide .with_traceback() behind an if, since that’s just a regular method call and will parse with no issues.)

six has a reraise function that will smooth out the differences for you (probably by using exec). The drawback is that it’s of course Python 2-oriented syntax, and on Python 3 the final traceback will include more context than expected.

Alternatively, there’s a six.raise_from, which is designed around the raise X from Y syntax of Python 3. The drawback is that Python 2 has no obvious equivalent, so you just get raise X, losing the old exception and its traceback.
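Here's a rough sketch of both six helpers, reusing the toy names from the examples above:

import sys
import six

try:
    some_fragile_function()
except Exception:
    exc_type, exc_value, exc_tb = sys.exc_info()
    # Python 2-style: re-raise a new exception with the old traceback attached.
    six.reraise(MyLibraryError, MyLibraryError("Failed to do a thing"), exc_tb)
    # Python 3-style chaining (on Python 2 the old traceback is simply lost):
    # six.raise_from(MyLibraryError("Failed to do a thing"), exc_value)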

There’s no clear right approach here; it depends on how you’re handling re-raising. Code that just blindly raises new exceptions doesn’t need any changes, and will get exception chaining for free on Python 3. Code that does more elaborate things, like implementing its own form of chaining or storing exc_info tuples to be re-raised later, may need a little more care.

Bytestrings are sequences of integers

In Python 2, bytes is a synonym for str, the default string type. Iterating or indexing a bytes/str produces 1-character strs.

list(b'hello')  # ['h', 'e', 'l', 'l', 'o']
b'hello'[0:4]  # 'hell'
b'hello'[0]  # 'h'
b'hello'[0][0][0][0][0]  # 'h' -- it's turtles all the way down

In Python 3, bytes is a specialized type for handling binary data, not text. As such, iterating or indexing a bytes produces integers.

list(b'hello')  # [104, 101, 108, 108, 111]
b'hello'[0:4]  # b'hell'
b'hello'[0]  # 104
b'hello'[0][0][0][0]  # TypeError, since you can't index 104

If you have explicitly binary data that you want to be bytes in Python 3, this may pose a bit of a problem. Aside from just checking the version explicitly and making heavy use of chr/ord, there are two approaches.

One is to use bytearray instead. This is like bytes, but mutable. More importantly, since it was introduced as a new type in Python 2.6 — after Python 3.0 came out — it has the same iterating and indexing behavior as Python 3’s bytes, even in Python 2.

bytearray(b'hello')[0]  # 104, on either Python 2 or 3

The other is to slice rather than index, since slicing always produces a new iterable of the same type. If you want to extract a single character from a bytes, just take a one-element slice.

b'hello'[0]  # 104
b'hello'[0:1]  # b'h'

Things that are just a royal pain in the ass

Unicode

Saving the best for last, almost!

Honestly, if your Python 2 code is already careful with Unicode — working with unicode internally, and encoding/decoding only at the “boundaries” of your code — then you shouldn’t have too many problems. If your code is not so careful, you should really try to make it a little more careful before you worry about Python 3, since Python 3’s whole jam is to force you to be careful.

See, in Python 2, you can combine bytestrings (str) and text strings (unicode) more or less freely. Python will automatically try to convert between the two using the “default encoding”, which is generally ascii. Python 3 makes text strings the default string type, demotes bytestrings, and forbids ever converting between them.

Most obviously, Python 2’s str and unicode have been renamed to bytes and str in Python 3. If you happen to be using the names anywhere, you’ll probably need to change them! six offers text_type and binary_type, though you can just use bytes to mean the same thing in either version. python-future also has backports for both Python 3’s bytes and str types, which seems like an extreme approach to me. Changing str to mean a text type even in Python 2 might be a good idea, though.

b'' and u'' work the same way in either Python 2 or 3, but unadorned strings like '' are always the str type, which has different behavior. There is a from __future__ import unicode_literals, which will cause unadorned strings to be unicode in Python 2, and this might work for you. However, this prevents you from writing literal “native” strings — strings of the same type Python uses for names, keyword arguments, etc. Usually this won’t matter, since Python 2 will silently convert between bytes and text, but it’s caused me the occasional problem.

The right thing to do is just explicitly mark every single string with either a b or u sigil as necessary. That just, you know, sucks. But you should be doing it even if you’re not porting to Python 3.

basestring is completely gone in Python 3. str and bytes have no common base type, and their semantics are different enough that it rarely makes sense to treat them the same way. If you’re using basestring in Python 2, it’s probably to allow code to work on either form of “text”, and you’ll only want to use str in Python 3 (where bytes are completely unsuitable for text). six.string_types provides exactly this. futurize --stage2 also runs the lib2to3.fixes.fix_basestring fixer, but this replaces basestring with str, which will almost certainly break your code in Python 2. If you intend to use stage 2, definitely audit your uses of basestring first.
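For example, a version-agnostic “is this text?” check might look like this (takes_text is a made-up function):

import six

def takes_text(value):
    if not isinstance(value, six.string_types):
        raise TypeError("expected a string, got %r" % (value,))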

As mentioned above, bytestrings are sequences of integers, which may affect code trying to work with explicitly binary data.

Python 2 has both .decode() and .encode() on both bytes and text; if you try to encode bytes or decode text, Python will try to implicitly convert to the right type first. In Python 3, only text has an .encode() and only bytes have a .decode().

Relatedly, Python 2 allows you to do some cute tricks with “encodings” that aren’t really encodings; for example, "hi".encode('hex') produces '6869'. In Python 3, encoding must produce bytes, and decoding must produce text, so these sorts of text-to-text or bytes-to-bytes translations aren’t allowed. You can still do them explicitly with the codecs module, e.g. codecs.encode(b'hi', 'hex'), which also works in Python 2, despite being undocumented. (Note that Python 3 specifically requires bytes for the hex codec, alas. If it’s any consolation, there’s a bytes.hex() method to do this directly, which you can’t use anyway if you’re targeting Python 2.)

Python 3’s open opens files in text mode and decodes them by default (using the locale’s preferred encoding, which is usually UTF-8), so if you’re manually decoding after reading, you’ll get an error in Python 3. You could explicitly open the file in binary mode (preserving the Python 2 behavior), or you could use codecs.open to decode transparently on read (preserving the Python 3 behavior). The same goes for writing.
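A sketch of both options: io.open (a close cousin of codecs.open) behaves like Python 3’s open on both versions, while binary mode preserves the Python 2 behavior. data.txt is a made-up filename.

import io

with io.open('data.txt', encoding='utf-8') as f:
    text = f.read()   # text (unicode) on both versions

with open('data.txt', 'rb') as f:
    raw = f.read()    # bytes on both versions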

sys.stdin, sys.stdout, and sys.stderr are all text streams in Python 3, so they have the same caveats as above, with the additional wrinkle that you didn’t actually open them yourself. Their .buffer attribute gives a handle opened in binary mode (Python 2 behavior), or you can adapt them to transcode transparently (Python 3 behavior):

import codecs
import sys
import six

if six.PY2:
    sys.stdin = codecs.getreader('utf-8')(sys.stdin)
    sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
    sys.stderr = codecs.getwriter('utf-8')(sys.stderr)

A text-mode file’s .tell() in Python 3 still returns a number that can be passed back to .seek(), but the number is not necessarily meaningful, and in particular can’t be used to estimate progress through a file. (Python uses a few very high bits as flags to indicate the state of the decoder; if you mask them off, what’s left is probably the byte position in the file as you’d expect, but this is pretty definitively a hack.)

Python 3 likes to treat filenames as text, but most of the functions in os and os.path will accept either text or bytes as their arguments (and return a value of the same type), so you should be okay there.

os.environ has text keys and values in Python 3. If you direly need bytes, you can use os.environb (and os.getenvb()).

I think that covers most of the obvious basics. This is a whole sprawling topic that I can’t hope to cover off the top of my head. I’ve seen it be both fairly painful and completely pain-free, depending entirely on the state of the Python 2 codebase.

Oh, one final note: there’s a module for Python 2 called unicode-nazi (sorry, I didn’t name it) that will produce a warning anytime a bytestring is implicitly converted to a text string, or vice versa. It might help you root out places you’re accidentally slopping types back and forth, which will certainly break in Python 3. I’ve only tried it on a comically large project where it found thousands of violations, including plenty in surprising places in the standard library, so it may or may not be of any practical help.

Things that are not actually gone

String formatting with %

There’s a widespread belief that str % ... is deprecated, since there’s a newer and shinier str.format() method.

Well, it’s not. It’s not gone; it’s not deprecated; it still works just fine. I don’t like to use it, myself, since it’s easy to make accidentally ambiguous — "%s" % foo can crash if foo is a tuple! — but it’s not going anywhere. In fact, as of Python 3.5, bytes and bytearray support % but not .format.
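To illustrate the tuple gotcha mentioned above — wrapping the value in a one-element tuple sidesteps it:

point = (3, 4)
"point: %s" % point      # TypeError: the tuple is unpacked into two arguments
"point: %s" % (point,)   # "point: (3, 4)"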

optparse

argparse is certainly better, but the optparse module still exists in Python 3. It has been deprecated since Python 3.2, though.

Things that are preposterously obscure but that I have seen cause problems nonetheless

Tuple unpacking

A little-used feature of Python 2 is tuple unpacking in function arguments:

def foo(a, (b, c)):
    print a, b, c

x = (2, 3)
foo(1, x)

This syntax is gone in Python 3. I’ve rarely seen anyone use it, except in two cases. One was a parsing library that relied pretty critically on using it in every parsing function you wrote; whoops.

The other is when sorting a dict’s items:

sorted(d.items(), key=lambda (k, v): k + v)

In Python 3, you have to write that as lambda kv: kv[0] + kv[1]. Boo.

long is gone

Python 3 merged its long type with int, so now there’s only one integral type, called int.

Python 2 promotes int to long pretty much transparently, and longs aren’t very common in the first place, so it’s fairly unlikely that this will make a difference. On the off chance you’re type-checking for integers with isinstance(x, (int, long)) (and really, why are you doing that), you can just use six.integer_types instead.
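A quick sketch of that check (is_integral is a made-up helper):

import six

def is_integral(x):
    # (int, long) on Python 2, just (int,) on Python 3
    return isinstance(x, six.integer_types)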

Note that futurize --stage2 applies the lib2to3.fixes.fix_long fixer, which blindly renames long to int, leaving you with inappropriate code like isinstance(x, (int, int)).

However…

I have seen some very obscure cases where a hand-rolled binary protocol would encode ints and longs differently. My advice would be to not do that.

Oh, and a little-known feature of Python 2’s syntax is that you can have long literals by suffixing them with an L:

123  # int
123L  # long

You can write 1267650600228229401496703205376 directly in Python 2 code, and it’ll automatically create a long, so the only reason to do this is if you explicitly need a long with a small value like 1. If that’s the case, something has gone catastrophically wrong.

repr changes

These should really only affect you if you’re using reprs as expected test output (or, god forbid, as cache keys or something). Some notable changes:

  • Unicode strings have a u prefix in Python 2. In Python 3, of course, Unicode strings are just strings, so there’s no prefix.

  • Conversely, bytestrings have a b prefix in Python 3, but not in Python 2 (though the b prefix is allowed in source code).

  • Python 2 escapes all non-ASCII characters, even in the repr of a Unicode string. Python 3 only escapes control characters and codepoints considered non-printing.

  • Large integers and explicit longs have an L suffix in Python 2, but not in Python 3, where there is no separate long type.

  • A set becomes set([1, 2, 3]) in Python 2, but {1, 2, 3} in Python 3. The set literal syntax is allowed in source code in Python 2.7, but the repr wasn’t changed until 3.0.

  • floats stringify to the shortest possible representation that has the same underlying value — e.g., str(1.1) is '1.1' rather than '1.1000000000000001'. This change was backported to Python 2.7 as well, but I have seen it break tests.

Hash randomization

Python has traditionally had a predictable hashing mechanism: repr(dict(a=1, b=2, c=3)) will always produce the same string. (On the same platform with the same Python version, at least.) Unfortunately this opens the door to an obscure DoS exploit that was known to Perl long ago: if you know a web application is written in Python, you can construct a query string that will become a dict whose keys all go in the same hash bucket. If your query string is long enough and you send enough requests, you can tie up all the Python processes in dealing with hash collisions.

The fix is hash randomization, which seeds the hashing algorithm in such a way that items are bucketed differently every time Python runs. It’s available in Python 2.7 via an environment variable or the -R argument, but it wasn’t turned on by default until Python 3.3.

The fear was that it might break things. Naturally, it has broken things. Mostly, reprs in tests. But it also changes the iteration order of dicts between Python runs. I have seen code using dicts whose keys happened to always be sorted in alphabetical or insertion order before, but with hash randomization, the keys were of course in a different order every time the code ran. The author assumed that Python had somehow broken dict sorting (which it has never had).

nonlocal

Python 3 introduces the nonlocal keyword, which is like global except it looks through all outer scopes in the expected order. It fixes this mild annoyance:

def make_function():
    counter = 0
    def function():
        nonlocal counter
        counter += 1  # without 'nonlocal', this declares a new local!
        print("I've been called", counter, "times!")
    return function

The problem is that any use of assignment within a function automatically creates a new local, and locals are known statically for the entire body of the function. (They actually affect how functions are compiled, in CPython.) So without nonlocal, the above code would see counter += 1, but counter is a new local that has never been assigned a value, so Python cannot possibly add 1 to it, and you get an UnboundLocalError.

nonlocal tells Python that when it sees an assignment of a name that exists in some outer scope, it should reuse that outer variable rather than shadowing it. Great, right? Purely a new feature. No problem.

Unfortunately, I’ve worked on a codebase that needed this feature in Python 2, and decided to fake it with a class… named nonlocal.

def make_function():
    class nonlocal:
        counter = 0
    def function():
        nonlocal.counter += 1  # this alters an outer value in-place, so it's fine
        print("I've been called", nonlocal.counter, "times!")
    return function

The class here is used purely as a dummy container. Assigning to an attribute doesn’t create any locals — only assigning to a bare name does — so the container just has to already exist. This is a slightly quirky approach, but it works fine.

Except that, of course, nonlocal is a keyword in Python 3, so this becomes complete gibberish. It’s such gibberish that (if I remember correctly) 2to3 actually cannot parse it, even though it’s perfectly valid Python 2 code.

I don’t have a magical fix for this one. Just, uh, don’t name things nonlocal.

List comprehensions no longer leak

Python 2 has the slightly inconsistent behavior that loop variables in a generator expression ((...)) are scoped to the generator expression, but loop variables in a list comprehension ([...]) belong to the enclosing scope.

The reason is purely an implementation detail: a list comprehension acts like a for loop, which has the same leaky behavior, whereas a generator expression actually creates a generator internally.

Python 3 brings these cases into line: loop variables in list comprehensions (or dict or set comprehensions) are also scoped to the comprehension itself.
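A minimal illustration of the difference:

squares = [n * n for n in range(5)]
print(n)   # Python 2: prints 4 (the loop variable leaked)
           # Python 3: NameError: name 'n' is not defined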

I cannot imagine any possible reason why this would affect you negatively, and yet, I can swear I’ve seen it happen. I wish I could remember where, because I’m sure it’s an exciting story.

cStringIO.h is gone

cStringIO.h is a private and undocumented C interface to Python 2’s cStringIO.StringIO type. It was removed in Python 3, or at least is somewhere I can’t find it.

This was one of the reasons Thrift’s Python 3 port took almost 3 years: Thrift has a “fast” C module that makes use of this private interface, and it’s not obvious how to replace it. I think they ended up just having the module not exist on Python 3, so Python 3 will just be mysteriously slower.

Some troublesome libraries

MySQLdb is some ancient, clunky, noncompliant, underdocumented trash, much like the database it connects to. It’s nigh abandoned, though it still promises Python 3 support in the MySQLdb 2.0 vaporware. I would suggest not using MySQL, but barring that, try mysqlclient, a fork of MySQLdb that continues development and adds Python 3 support. (The same people also maintain an earlier project, pymysql, which strives to be a pure-Python drop-in replacement for MySQLdb — it’s not quite perfect, but its existence is interesting and it’s sure easier to read than MySQLdb.)

At a glance, Thrift still hasn’t had a release since it merged Python 3 support, eight months ago. It’s some enterprise nightmare, anyway, and bizarrely does code generation for a bunch of dynamic languages. Might I suggest just using the pure-Python thriftpy, which parses Thrift definitions on the fly?

Twisted is, ah, large and complex. Parts of it now support Python 3; parts of it do not. If you need the parts that don’t, well, maybe you could give them a hand?

M2Crypto is working on it, though I’m pretty sure most Python crypto nerds would advise you to use cryptography instead.

And so on

You may find any number of other obscure compatibility problems, just as you might when upgrading from 2.6 to 2.7. The Python community has a lot of clever people willing to help you out, though, and they’ve probably even seen your super duper niche problem before.

Don’t let that, or this list of gotchas in general, dissuade you! Better to start now than later; even fixing an integer division gets you one step closer to having your code run on Python 3 as well.