Tag Archives: Position

AWS Resources Addressing Argentina’s Personal Data Protection Law and Disposition No. 11/2006

Post Syndicated from Leandro Bennaton original https://aws.amazon.com/blogs/security/aws-and-resources-addressing-argentinas-personal-data-protection-law-and-disposition-no-112006/

We have two new resources to help customers address their data protection requirements in Argentina. These resources specifically address the needs outlined under the Personal Data Protection Law No. 25.326, as supplemented by Regulatory Decree No. 1558/2001 (“PDPL”), including Disposition No. 11/2006. For context, the PDPL is an Argentine federal law that applies to the protection of personal data, including during transfer and processing.

A new webpage focused on data privacy in Argentina features FAQs, helpful links, and whitepapers that provide an overview of PDPL considerations, as well as our security assurance frameworks and international certifications, including ISO 27001, ISO 27017, and ISO 27018. You’ll also find details about our Information Request Report and the high bar of security at AWS data centers.

Additionally, we’ve released a new workbook that offers a detailed mapping as to how customers can operate securely under the Shared Responsibility Model while also aligning with Disposition No. 11/2006. The AWS Disposition 11/2006 Workbook can be downloaded from the Argentina Data Privacy page or directly from this link. Both resources are also available in Spanish from the Privacidad de los datos en Argentina page.

Want more AWS Security news? Follow us on Twitter.

 

Hiring a Director of Sales

Post Syndicated from Yev original https://www.backblaze.com/blog/hiring-a-director-of-sales/

Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience in scaling a sales team and who has an excellent track record for exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated, as well as able to create and develop a highly-motivated, success oriented sales team that has fun and enjoys what they do.

The History of Backblaze from our CEO
In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.

Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.

Fast forward 10 years later and our world looks quite different. You’ll have some fantastic assets to work with:

  • A brand millions recognize for openness, ease-of-use, and affordability.
  • A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as being the people that find and recommend technology products to their friends.
  • Our B2 service that provides the lowest cost cloud storage on the planet at 1/4th the price Amazon, Google or Microsoft charges. While being a newer product on the market, it already has over 100,000 IT and developers signed up as well as an ecosystem building up around it.
  • A growing, profitable and cash-flow positive company.
  • And last, but most definitely not least: a great sales team.

You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:

  • We have a great team, but we are in the process of expanding and we need to develop a structure that will easily scale and provide the most success to drive revenue.
  • We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
  • We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
  • We need someone who will work closely in developing the skills of our current sales team and build a path for career growth and advancement.
  • We want someone to manage our Customer Success program.

So that’s a bit about us. What are we looking for in you?

Experience: As a sales leader, you will strategically build and drive the territory’s sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact of selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition as well as build out our customer success program.

Leadership: We have an experienced team who’s brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader and compassionate about developing and supporting your team.

Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.

Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, working to build a good service but also a good place to work.

Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.

Additional Responsibilities needed for this role:

  • Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
  • Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
  • Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
  • Acquire and develop the appropriate sales tools to make the team efficient in their daily work flow.
  • Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
  • Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
  • This position will report directly to the VP of Sales and will be staffed in our headquarters in San Mateo, CA.

Requirements:

  • 7 – 10+ years of successful sales leadership experience as measured by sales performance against goals.
    Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
  • Background in selling SaaS technologies with a strong track record of success.
  • Strong presentation and communication skills.
  • Must be able to travel occasionally nationwide.
  • BA/BS degree required

Think you want to join us on this adventure?
Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:

  1. How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
  2. What are the goals you would set for yourself in the 3 month and 1-year timeframes?

Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.

Backblaze is an Equal Opportunity Employer.

The post Hiring a Director of Sales appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Директивата за авторско право: ход на ревизията: да се действа сега

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/26/copyright-5/

Ново развитие в ревизията на авторското право в ЕС – става ясно от  съобщенията на българското председателство, участници в ревизията и Юлия Реда – защото тя имаше много ясен възглед какво иска да се промени в правната рамка (общ режим на изключенията, актуализиране – за да имаме правна рамка, адекватна на технологичното развитие) – и сега следи ангажирано законодателния процес.

Правителствата на държавите от ЕС  са приели позиция  относно реформата на авторските права  без съществени промени по чл.11 (новото право за издателите)  и чл.13 (филтрите на входа), проектът е на сайта на Реда,  Politico дава измененията, засягащи правото на издателите, в цвят.

Сега Парламентът трябва да ги спре, пише Реда.

 Сега имате шанса да окажете влияние – шанс, който ще изчезне след две години, когато всички “изведнъж” ще се сблъскат с предизвикателството да се  внедряват филтри   и link tax.  Експертите почти единодушно се съгласяват, че проектът за реформата на авторското право е наистина лош.

Update: Member State governments have just adopted their position on #copyright, with no significant changes to the #CensorshipMachines and #LinkTax provisions. It is now up to Parliament to stop them and #FixCopyright. https://t.co/1JwNvQn24n pic.twitter.com/KAgqV3YYG1

https://platform.twitter.com/widgets.js

Две графики от сайта на Реда – за двата текста,  против които се събира подкрепа (вж и преподавателите) – за  отношението по държави и по партии в ЕП:

 

 

The Benefits of Side Projects

Post Syndicated from Bozho original https://techblog.bozho.net/the-benefits-of-side-projects/

Side projects are the things you do at home, after work, for your own “entertainment”, or to satisfy your desire to learn new stuff, in case your workplace doesn’t give you that opportunity (or at least not enough of it). Side projects are also a way to build stuff that you think is valuable but not necessarily “commercialisable”. Many side projects are open-sourced sooner or later and some of them contribute to the pool of tools at other people’s disposal.

I’ve outlined one recommendation about side projects before – do them with technologies that are new to you, so that you learn important things that will keep you better positioned in the software world.

But there are more benefits than that – serendipitous benefits, for example. And I’d like to tell some personal stories about that. I’ll focus on a few examples from my list of side projects to show how, through a sort-of butterfly effect, they helped shape my career.

The computoser project, no matter how cool algorithmic music composition, didn’t manage to have much of a long term impact. But it did teach me something apart from niche musical theory – how to read a bulk of scientific papers (mostly computer science) and understand them without being formally trained in the particular field. We’ll see how that was useful later.

Then there was the “State alerts” project – a website that scraped content from public institutions in my country (legislation, legislation proposals, decisions by regulators, new tenders, etc.), made them searchable, and “subscribable” – so that you get notified when a keyword of interest is mentioned in newly proposed legislation, for example. (I obviously subscribed for “information technologies” and “electronic”).

And that project turned out to have a significant impact on the following years. First, I chose a new technology to write it with – Scala. Which turned out to be of great use when I started working at TomTom, and on the 3rd day I was transferred to a Scala project, which was way cooler and much more complex than the original one I was hired for. It was a bit ironic, as my colleagues had just read that “I don’t like Scala” a few weeks earlier, but nevertheless, that was one of the most interesting projects I’ve worked on, and it went on for two years. Had I not known Scala, I’d probably be gone from TomTom much earlier (as the other project was restructured a few times), and I would not have learned many of the scalability, architecture and AWS lessons that I did learn there.

But the very same project had an even more important follow-up. Because if its “civic hacking” flavour, I was invited to join an informal group of developers (later officiated as an NGO) who create tools that are useful for society (something like MySociety.org). That group gathered regularly, discussed both tools and policies, and at some point we put up a list of policy priorities that we wanted to lobby policy makers. One of them was open source for the government, the other one was open data. As a result of our interaction with an interim government, we donated the official open data portal of my country, functioning to this day.

As a result of that, a few months later we got a proposal from the deputy prime minister’s office to “elect” one of the group for an advisor to the cabinet. And we decided that could be me. So I went for it and became advisor to the deputy prime minister. The job has nothing to do with anything one could imagine, and it was challenging and fascinating. We managed to pass legislation, including one that requires open source for custom projects, eID and open data. And all of that would not have been possible without my little side project.

As for my latest side project, LogSentinel – it became my current startup company. And not without help from the previous two mentioned above – the computer science paper reading was of great use when I was navigating the crypto papers landscape, and from the government job I not only gained invaluable legal knowledge, but I also “got” a co-founder.

Some other side projects died without much fanfare, and that’s fine. But the ones above shaped my “story” in a way that would not have been possible otherwise.

And I agree that such serendipitous chain of events could have happened without side projects – I could’ve gotten these opportunities by meeting someone at a bar (unlikely, but who knows). But we, as software engineers, are capable of tilting chance towards us by utilizing our skills. Side projects are our “extracurricular activities”, and they often lead to unpredictable, but rather positive chains of events. They would rarely be the only factor, but they are certainly great at unlocking potential.

The post The Benefits of Side Projects appeared first on Bozho's tech blog.

Connect, collaborate, and learn at AWS Global Summits in 2018

Post Syndicated from Tina Kelleher original https://aws.amazon.com/blogs/big-data/connect-collaborate-and-learn-at-aws-global-summits-in-2018/

Regardless of your career path, there’s no denying that attending industry events can provide helpful career development opportunities — not only for improving and expanding your skill sets, but for networking as well. According to this article from PayScale.com, experts estimate that somewhere between 70-85% of new positions are landed through networking.

Narrowing our focus to networking opportunities with cloud computing professionals who’re working on tackling some of today’s most innovative and exciting big data solutions, attending big data-focused sessions at an AWS Global Summit is a great place to start.

AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. As the name suggests, these summits are held in major cities around the world, and attract technologists from all industries and skill levels who’re interested in hearing from AWS leaders, experts, partners, and customers.

In addition to networking opportunities with top cloud technology providers, consultants and your peers in our Partner and Solutions Expo, you’ll also hone your AWS skills by attending and participating in a multitude of education and training opportunities.

Here’s a brief sampling of some of the upcoming sessions relevant to big data professionals:

May 31st : Big Data Architectural Patterns and Best Practices on AWS | AWS Summit – Mexico City

June 6th-7th: Various (click on the “Big Data & Analytics” header) | AWS Summit – Berlin

June 20-21st : [email protected] | Public Sector Summit – Washington DC

June 21st: Enabling Self Service for Data Scientists with AWS Service Catalog | AWS Summit – Sao Paulo

Be sure to check out the main page for AWS Global Summits, where you can see which cities have AWS Summits planned for 2018, register to attend an upcoming event, or provide your information to be notified when registration opens for a future event.

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!

Jeff;

Converting a Kodak Box Brownie into a digital camera

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/kodak-brownie-camera/

In this article from The MagPi issue 69, David Crookes explains how Daniel Berrangé took an old Kodak Brownie from the 1950s and turned it into a quirky digital camera. Get your copy of The MagPi magazine in stores now, or download it as a free PDF here.

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

The Kodak Box Brownie

When Kodak unveiled its Box Brownie in 1900, it did so with the slogan ‘You press the button, we do the rest.’ The words referred to the ease-of-use of what was the world’s first mass-produced camera. But it could equally apply to Daniel Berrangé’s philosophy when modifying it for the 21st century. “I wanted to use the Box Brownie’s shutter button to trigger image capture, and make it simple to use,” he tells us.

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

Daniel’s project grew from a previous effort in which he placed a pinhole webcam inside a ladies’ powder compact case. “The Box Brownie project is essentially a repeat of that design but with a normal lens instead of a pinhole, a real camera case, and improved software to enable a shutter button. Ideally, it would look unchanged from when it was shooting film.”

Webcam woes

At first, Daniel looked for a cheap webcam, intending to spend no more than the price of a Pi Zero. This didn’t work out too well. “The low-light performance of the webcam was not sufficient to make a pinhole camera so I just decided to make a ‘normal’ digital camera instead,” he reveals.
To that end, he began removing some internal components from the Box Brownie. “With the original lens removed, the task was to position the webcam’s electronic light sensor (the CCD) and lens as close to the front of the camera as possible,” Daniel explains. “In the end, the CCD was about 15 mm away from the front aperture of the camera, giving a field of view that was approximately the same as the unmodified camera would achieve.”

Daniel Berrangé Kodak Brownie Raspberry Pi Camera
Daniel Berrangé Kodak Brownie Raspberry Pi Camera
Daniel Berrangé Kodak Brownie Raspberry Pi Camera

It was then time for him to insert the Raspberry Pi, upon which was a custom ‘init’ binary that loads a couple of kernel modules to run the webcam, mount the microSD file system, and launch the application binary. Here, Daniel found he was in luck. “I’d noticed that the size of a 620 film spool (63 mm) was effectively the same as the width of a Raspberry Pi Zero (65 mm), so it could be held in place between the film spool grips,” he recalls. “It was almost as if it was designed with this in mind.”

Shutter success

In order to operate the camera, Daniel had to work on the shutter button. “The Box Brownie’s shutter button is entirely mechanical, driven by a handful of levers and springs,” Daniel explains. “First, the Pi Zero needs to know when the shutter button is pressed and second, the physical shutter has to be open while the webcam is capturing the image. Rather than try to synchronise image capture with the fraction of a second that the physical shutter is open, a bit of electrical tape was used on the shutter mechanism to keep it permanently open.”

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

Daniel made use of the Pi Zero’s GPIO pins to detect the pressing of the shutter button. It determines if each pin is at 0 or 5 volts. “My thought was that I could set a GPIO pin high to 5 V, and then use the action of the shutter button to short it to ground, and detect this change in level from software.”

This initially involved using a pair of bare wires and some conductive paint, although the paint was later replaced by a piece of tinfoil. But with the button pressed, the GPIO pin level goes to zero and the device constantly captures still images until the button is released. All that’s left to do is smile and take the perfect snap.

The post Converting a Kodak Box Brownie into a digital camera appeared first on Raspberry Pi.

Hard Drive Stats for Q1 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

Backblaze Drive Stats Q1 2018

As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Background

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.

Hard Drive Reliability Statistics for Q1 2018

At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.

Q1 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.

The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

Welcome Toshiba 8TB drives, almost…

We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.

In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.

Q1 2018 Hard Drive Failure Rate — Toshiba 8TB

So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.

Lifetime Hard Drive Failure Rates

Notes and Observations

The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.

The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.

Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.

We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.

New SMART Attributes

If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:

  • 177 – Wear Range Delta
  • 179 – Used Reserved Block Count Total
  • 181- Program Fail Count Total or Non-4K Aligned Access Count
  • 182 – Erase Fail Count
  • 235 – Good Block Count AND System(Free) Block Count

The 5 values are all related to SSD drives.

Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.

Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]

The post Hard Drive Stats for Q1 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Secure Build with AWS CodeBuild and LayeredInsight

Post Syndicated from Asif Khan original https://aws.amazon.com/blogs/devops/secure-build-with-aws-codebuild-and-layeredinsight/

This post is written by Asif Awan, Chief Technology Officer of Layered InsightSubin Mathew – Software Development Manager for AWS CodeBuild, and Asif Khan – Solutions Architect

Enterprises adopt containers because they recognize the benefits: speed, agility, portability, and high compute density. They understand how accelerating application delivery and deployment pipelines makes it possible to rapidly slipstream new features to customers. Although the benefits are indisputable, this acceleration raises concerns about security and corporate compliance with software governance. In this blog post, I provide a solution that shows how Layered Insight, the pioneer and global leader in container-native application protection, can be used with seamless application build and delivery pipelines like those available in AWS CodeBuild to address these concerns.

Layered Insight solutions

Layered Insight enables organizations to unify DevOps and SecOps by providing complete visibility and control of containerized applications. Using the industry’s first embedded security approach, Layered Insight solves the challenges of container performance and protection by providing accurate insight into container images, adaptive analysis of running containers, and automated enforcement of container behavior.

 

AWS CodeBuild

AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can get started quickly by using prepackaged build environments, or you can create custom build environments that use your own build tools.

 

Problem Definition

Security and compliance concerns span the lifecycle of application containers. Common concerns include:

Visibility into the container images. You need to verify the software composition information of the container image to determine whether known vulnerabilities associated with any of the software packages and libraries are included in the container image.

Governance of container images is critical because only certain open source packages/libraries, of specific versions, should be included in the container images. You need support for mechanisms for blacklisting all container images that include a certain version of a software package/library, or only allowing open source software that come with a specific type of license (such as Apache, MIT, GPL, and so on). You need to be able to address challenges such as:

·       Defining the process for image compliance policies at the enterprise, department, and group levels.

·       Preventing the images that fail the compliance checks from being deployed in critical environments, such as staging, pre-prod, and production.

Visibility into running container instances is critical, including:

·       CPU and memory utilization.

·       Security of the build environment.

·       All activities (system, network, storage, and application layer) of the application code running in each container instance.

Protection of running container instances that is:

·       Zero-touch to the developers (not an SDK-based approach).

·       Zero touch to the DevOps team and doesn’t limit the portability of the containerized application.

·       This protection must retain the option to switch to a different container stack or orchestration layer, or even to a different Container as a Service (CaaS ).

·       And it must be a fully automated solution to SecOps, so that the SecOps team doesn’t have to manually analyze and define detailed blacklist and whitelist policies.

 

Solution Details

In AWS CodeCommit, we have three projects:
●     “Democode” is a simple Java application, with one buildspec to build the app into a Docker container (run by build-demo-image CodeBuild project), and another to instrument said container (instrument-image CodeBuild project). The resulting container is stored in ECR repo javatestasjavatest:20180415-layered. This instrumented container is running in AWS Fargate cluster demo-java-appand can be seen in the Layered Insight runtime console as the javatestapplication in us-east-1.
●     aws-codebuild-docker-imagesis a clone of the official aws-codebuild-docker-images repo on GitHub . This CodeCommit project is used by the build-python-builder CodeBuild project to build the python 3.3.6 codebuild image and is stored at the codebuild-python ECR repo. We then manually instructed the Layered Insight console to instrument the image.
●     scan-java-imagecontains just a buildspec.yml file. This file is used by the scan-java-image CodeBuild project to instruct Layered Assessment to perform a vulnerability scan of the javatest container image built previously, and then run the scan results through a compliance policy that states there should be no medium vulnerabilities. This build fails — but in this case that is a success: the scan completes successfully, but compliance fails as there are medium-level issues found in the scan.

This build is performed using the instrumented version of the Python 3.3.6 CodeBuild image, so the activity of the processes running within the build are recorded each time within the LI console.

Build container image

Create or use a CodeCommit project with your application. To build this image and store it in Amazon Elastic Container Registry (Amazon ECR), add a buildspec file to the project and build a container image and create a CodeBuild project.

Scan container image

Once the image is built, create a new buildspec in the same project or a new one that looks similar to below (update ECR URL as necessary):

version: 0.2
phases:
  pre_build:
    commands:
      - echo Pulling down LI Scan API client scripts
      - git clone https://github.com/LayeredInsight/scan-api-example-python.git
      - echo Setting up LI Scan API client
      - cd scan-api-example-python
      - pip install layint_scan_api
      - pip install -r requirements.txt
  build:
    commands:
      - echo Scanning container started on `date`
      - IMAGEID=$(./li_add_image --name <aws-region>.amazonaws.com/javatest:20180415)
      - ./li_wait_for_scan -v --imageid $IMAGEID
      - ./li_run_image_compliance -v --imageid $IMAGEID --policyid PB15260f1acb6b2aa5b597e9d22feffb538256a01fbb4e5a95

Add the buildspec file to the git repo, push it, and then build a CodeBuild project using with the instrumented Python 3.3.6 CodeBuild image at <aws-region>.amazonaws.com/codebuild-python:3.3.6-layered. Set the following environment variables in the CodeBuild project:
●     LI_APPLICATIONNAME – name of the build to display
●     LI_LOCATION – location of the build project to display
●     LI_API_KEY – ApiKey:<key-name>:<api-key>
●     LI_API_HOST – location of the Layered Insight API service

Instrument container image

Next, to instrument the new container image:

  1. In the Layered Insight runtime console, ensure that the ECR registry and credentials are defined (click the Setup icon and the ‘+’ sign on the top right of the screen to add a new container registry). Note the name given to the registry in the console, as this needs to be referenced in the li_add_imagecommand in the script, below.
  2. Next, add a new buildspec (with a new name) to the CodeCommit project, such as the one shown below. This code will download the Layered Insight runtime client, and use it to instruct the Layered Insight service to instrument the image that was just built:
    version: 0.2
    phases:
    pre_build:
    commands:
    echo Pulling down LI API Runtime client scripts
    git clone https://github.com/LayeredInsight/runtime-api-example-python
    echo Setting up LI API client
    cd runtime-api-example-python
    pip install layint-runtime-api
    pip install -r requirements.txt
    build:
    commands:
    echo Instrumentation started on `date`
    ./li_add_image --registry "Javatest ECR" --name IMAGE_NAME:TAG --description "IMAGE DESCRIPTION" --policy "Default Policy" --instrument --wait --verbose
  3. Commit and push the new buildspec file.
  4. Going back to CodeBuild, create a new project, with the same CodeCommit repo, but this time select the new buildspec file. Use a Python 3.3.6 builder – either the AWS or LI Instrumented version.
  5. Click Continue
  6. Click Save
  7. Run the build, again on the master branch.
  8. If everything runs successfully, a new image should appear in the ECR registry with a -layered suffix. This is the instrumented image.

Run instrumented container image

When the instrumented container is now run — in ECS, Fargate, or elsewhere — it will log data back to the Layered Insight runtime console. It’s appearance in the console can be modified by setting the LI_APPLICATIONNAME and LI_LOCATION environment variables when running the container.

Conclusion

In the above blog we have provided you steps needed to embed governance and runtime security in your build pipelines running on AWS CodeBuild using Layered Insight.

 

 

 

A Day in the Life of Michele, Human Resources Coordinator at Backblaze

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/day-in-life-human-resources-coordinator/

Michele, HR Coordinator at Backblaze

Most of the time this blog is dedicated to cloud storage and computer backup topics, but we also want our readers to understand the culture and people at Backblaze who all contribute to keeping our company running and making it an enjoyable place to work. We invited our HR Coordinator, Michele, to talk about how she spends her day searching for great candidates to fill employment positions at Backblaze.

What’s a Typical Day for Michele at Backblaze?

After I’ve had a yummy cup of coffee — maybe with a honey and splash of half and half, I’ll generally start my day reviewing resumes and contacting potential candidates to set up an initial phone screen.

When I start the process of filling a position, I’ll spend a lot of time on the phone speaking with potential candidates. During a phone screen call we’ll chat about their experience, background and what they are ideally looking for in their next position. I also ask about what they like to do outside of work, and most importantly, how they feel about office dogs. A candidate may not always look great on paper, but could turn out to be a great cultural fit after speaking with them about their previous experience and what they’re passionate about.

Next, I push strong candidates to the subsequent steps with the hiring managers, which range from setting up a second phone screen, to setting up a Google hangout for completing coding tasks, to scheduling in-person interviews with the team.

At the end of the day after an in-person interview, I’ll check in with all the interviewers to debrief and decide how to proceed with the candidate. Everyone that interviewed the candidate will get together to give feedback. Is there a good cultural fit? Are they someone we’d like to work with? Keeping in contact with the candidates throughout the process and making sure they are organized and informed is a big part of my job. No one likes to wait around and wonder where they are in the process.

In between all the madness, I’ll put together offer letters, send out onboarding paperwork and links, and get all the necessary signatures to move forward.

On the candidate’s first day, I’ll go over benefits and the handbook and make sure everything is going smoothly in their overall orientation as they transition into their new role here at Backblaze!

What Makes Your Job Exciting?

  • I get to speak with many different types of people and see what makes them tick and if they’d be a good fit at Backblaze
  • The fast pace of the job
  • Being constantly kept busy with different tasks including supporting the FUN committee by researching venues and ideas for family day and the holiday party
  • I work on enjoyable projects like creating a people wall for new hires so we are able to put a face to the name
  • Getting to take a mini road trip up to Sacramento each month to check in with the data center employees
  • Constantly learning more and more about the job, the people, and the company

We’re growing rapidly and always looking for great people to join our team at Backblaze. Our team places a premium on open communications, being cleverly unconventional, and helping each other out.

Oh! We also offer competitive salaries, stock options, and amazing benefits.

Which Job Openings are You Currently Trying to Fill?

We are currently looking for the following positions. If you’re interested, please review the job description on our jobs page and then contact me at jobscontact@backblaze.com.

  • Engineering Director
  • Senior Java Engineer
  • Senior Software Engineer
  • Desktop and Laptop Windows Client Programmer
  • Senior Systems Administrator
  • Sales Development Representative

Thanks Michele!

The post A Day in the Life of Michele, Human Resources Coordinator at Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

OMG The Stupid It Burns

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/omg-stupid-it-burns.html

This article, pointed out by @TheGrugq, is stupid enough that it’s worth rebutting.

The article starts with the question “Why did the lessons of Stuxnet, Wannacry, Heartbleed and Shamoon go unheeded?“. It then proceeds to ignore the lessons of those things.
Some of the actual lessons should be things like how Stuxnet crossed air gaps, how Wannacry spread through flat Windows networking, how Heartbleed comes from technical debt, and how Shamoon furthers state aims by causing damage.
But this article doesn’t cover the technical lessons. Instead, it thinks the lesson should be the moral lesson, that we should take these things more seriously. But that’s stupid. It’s the sort of lesson people teach you that know nothing about the topic. When you have nothing of value to contribute to a topic you can always take the moral high road and criticize everyone for being morally weak for not taking it more seriously. Obviously, since doctors haven’t cured cancer yet, it’s because they don’t take the problem seriously.
The article continues to ignore the lesson of these cyber attacks and instead regales us with a list of military lessons from WW I and WW II. This makes the same flaw that many in the military make, trying to understand cyber through analogies with the real world. It’s not that such lessons could have no value, it’s that this article contains a poor list of them. It seems to consist of a random list of events that appeal to the author rather than events that have bearing on cybersecurity.
Then, in case we don’t get the point, the article bullies us with hyperbole, cliches, buzzwords, bombastic language, famous quotes, and citations. It’s hard to see how most of them actually apply to the text. Rather, it seems like they are included simply because he really really likes them.
The article invests much effort in discussing the buzzword “OODA loop”. Most attacks in cyberspace don’t have one. Instead, attackers flail around, trying lots of random things, overcoming defense with brute-force rather than an understanding of what’s going on. That’s obviously the case with Wannacry: it was an accident, with the perpetrator experimenting with what would happen if they added the ETERNALBLUE exploit to their existing ransomware code. The consequence was beyond anybody’s ability to predict.
You might claim that this is just the first stage, that they’ll loop around, observe Wannacry’s effects, orient themselves, decide, then act upon what they learned. Nope. Wannacry burned the exploit. It’s essentially removed any vulnerable systems from the public Internet, thereby making it impossible to use what they learned. It’s still active a year later, with infected systems behind firewalls busily scanning the Internet so that if you put a new system online that’s vulnerable, it’ll be taken offline within a few hours, before any other evildoer can take advantage of it.
See what I’m doing here? Learning the actual lessons of things like Wannacry? The thing the above article fails to do??
The article has a humorous paragraph on “defense in depth”, misunderstanding the term. To be fair, it’s the cybersecurity industry’s fault: they adopted then redefined the term. That’s why there’s two separate articles on Wikipedia: one for the old military term (as used in this article) and one for the new cybersecurity term.
As used in the cybersecurity industry, “defense in depth” means having multiple layers of security. Many organizations put all their defensive efforts on the perimeter, and none inside a network. The idea of “defense in depth” is to put more defenses inside the network. For example, instead of just one firewall at the edge of the network, put firewalls inside the network to segment different subnetworks from each other, so that a ransomware infection in the customer support computers doesn’t spread to sales and marketing computers.
The article talks about exploiting WiFi chips to bypass the defense in depth measures like browser sandboxes. This is conflating different types of attacks. A WiFi attack is usually considered a local attack, from somebody next to you in bar, rather than a remote attack from a server in Russia. Moreover, far from disproving “defense in depth” such WiFi attacks highlight the need for it. Namely, phones need to be designed so that successful exploitation of other microprocessors (namely, the WiFi, Bluetooth, and cellular baseband chips) can’t directly compromise the host system. In other words, once exploited with “Broadpwn”, a hacker would need to extend the exploit chain with another vulnerability in the hosts Broadcom WiFi driver rather than immediately exploiting a DMA attack across PCIe. This suggests that if PCIe is used to interface to peripherals in the phone that an IOMMU be used, for “defense in depth”.
Cybersecurity is a young field. There are lots of useful things that outsider non-techies can teach us. Lessons from military history would be well-received.
But that’s not this story. Instead, this story is by an outsider telling us we don’t know what we are doing, that they do, and then proceeds to prove they don’t know what they are doing. Their argument is based on a moral suasion and bullying us with what appears on the surface to be intellectual rigor, but which is in fact devoid of anything smart.
My fear, here, is that I’m going to be in a meeting where somebody has read this pretentious garbage, explaining to me why “defense in depth” is wrong and how we need to OODA faster. I’d rather nip this in the bud, pointing out if you found anything interesting from that article, you are wrong.

Build a house in Minecraft using Python

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/build-minecraft-house-using-python/

In this tutorial from The MagPi issue 68, Steve Martin takes us through the process of house-building in Minecraft Pi. Get your copy of The MagPi in stores now, or download it as a free PDF here.

Minecraft Pi is provided for free as part of the Raspbian operating system. To start your Minecraft: Pi Edition adventures, try our free tutorial Getting started with Minecraft.

Minecraft Raspberry Pi

Writing programs that create things in Minecraft is not only a great way to learn how to code, but it also means that you have a program that you can run again and again to make as many copies of your Minecraft design as you want. You never need to worry about your creation being destroyed by your brother or sister ever again — simply rerun your program and get it back! Whilst it might take a little longer to write the program than to build one house, once it’s finished you can build as many houses as you want.

Co-ordinates in Minecraft

Let’s start with a review of the coordinate system that Minecraft uses to know where to place blocks. If you are already familiar with this, you can skip to the next section. Otherwise, read on.

Minecraft Raspberry Pi Edition

Plan view of our house design

Minecraft shows us a three-dimensional (3D) view of the world. Imagine that the room you are in is the Minecraft world and you want to describe your location within that room. You can do so with three numbers, as follows:

  • How far across the room are you? As you move from side to side, you change this number. We can consider this value to be our X coordinate.
  • How high off the ground are you? If you are upstairs, or if you jump, this value increases. We can consider this value to be our Y coordinate.
  • How far into the room are you? As you walk forwards or backwards, you change this number. We can consider this value to be our Z coordinate.

You might have done graphs in school with X going across the page and Y going up the page. Coordinates in Minecraft are very similar, except that we have an extra value, Z, for our third dimension. Don’t worry if this still seems a little confusing: once we start to build our house, you will see how these three dimensions work in Minecraft.

Designing our house

It is a good idea to start with a rough design for our house. This will help us to work out the values for the coordinates when we are adding doors and windows to our house. You don’t have to plan every detail of your house right away. It is always fun to enhance it once you have got the basic design written. The image above shows the plan view of the house design that we will be creating in this tutorial. Note that because this is a plan view, it only shows the X and Z co-ordinates; we can’t see how high anything is. Hopefully, you can imagine the house extending up from the screen.

We will build our house close to where the Minecraft player is standing. This a good idea when creating something in Minecraft with Python, as it saves us from having to walk around the Minecraft world to try to find our creation.

Starting our program

Type in the code as you work through this tutorial. You can use any editor you like; we would suggest either Python 3 (IDLE) or Thonny Python IDE, both of which you can find on the Raspberry Pi menu under Programming. Start by selecting the File menu and creating a new file. Save the file with a name of your choice; it must end with .py so that the Raspberry Pi knows that it is a Python program.

It is important to enter the code exactly as it is shown in the listing. Pay particular attention to both the spelling and capitalisation (upper- or lower-case letters) used. You may find that when you run your program the first time, it doesn’t work. This is very common and just means there’s a small error somewhere. The error message will give you a clue about where the error is.

It is good practice to start all of your Python programs with the first line shown in our listing. All other lines that start with a # are comments. These are ignored by Python, but they are a good way to remind us what the program is doing.

The two lines starting with from tell Python about the Minecraft API; this is a code library that our program will be using to talk to Minecraft. The line starting mc = creates a connection between our Python program and the game. Then we get the player’s location broken down into three variables: x, y, and z.

Building the shell of our house

To help us build our house, we define three variables that specify its width, height, and depth. Defining these variables makes it easy for us to change the size of our house later; it also makes the code easier to understand when we are setting the co-ordinates of the Minecraft bricks. For now, we suggest that you use the same values that we have; you can go back and change them once the house is complete and you want to alter its design.

It’s now time to start placing some bricks. We create the shell of our house with just two lines of code! These lines of code each use the setBlocks command to create a complete block of bricks. This function takes the following arguments:

setBlocks(x1, y1, z1, x2, y2, z2, block-id, data)

x1, y1, and z1 are the coordinates of one corner of the block of bricks that we want to create; x1, y1, and z1 are the coordinates of the other corner. The block-id is the type of block that we want to use. Some blocks require another value called data; we will see this being used later, but you can ignore it for now.

We have to work out the values that we need to use in place of x1, y1, z1, x1, y1, z1 for our walls. Note that what we want is a larger outer block made of bricks and that is filled with a slightly smaller block of air blocks. Yes, in Minecraft even air is actually just another type of block.

Once you have typed in the two lines that create the shell of your house, you almost ready to run your program. Before doing so, you must have Minecraft running and displaying the contents of your world. Do not have a world loaded with things that you have created, as they may get destroyed by the house that we are building. Go to a clear area in the Minecraft world before running the program. When you run your program, check for any errors in the ‘console’ window and fix them, repeatedly running the code again until you’ve corrected all the errors.

You should see a block of bricks now, as shown above. You may have to turn the player around in the Minecraft world before you can see your house.

Adding the floor and door

Now, let’s make our house a bit more interesting! Add the lines for the floor and door. Note that the floor extends beyond the boundary of the wall of the house; can you see how we achieve this?

Hint: look closely at how we calculate the x and z attributes as compared to when we created the house shell above. Also note that we use a value of y-1 to create the floor below our feet.

Minecraft doors are two blocks high, so we have to create them in two parts. This is where we have to use the data argument. A value of 0 is used for the lower half of the door, and a value of 8 is used for the upper half (the part with the windows in it). These values will create an open door. If we add 4 to each of these values, a closed door will be created.

Before you run your program again, move to a new location in Minecraft to build the house away from the previous one. Then run it to check that the floor and door are created; you will need to fix any errors again. Even if your program runs without errors, check that the floor and door are positioned correctly. If they aren’t, then you will need to check the arguments so setBlock and setBlocks are exactly as shown in the listing.

Adding windows

Hopefully you will agree that your house is beginning to take shape! Now let’s add some windows. Looking at the plan for our house, we can see that there is a window on each side; see if you can follow along. Add the four lines of code, one for each window.

Now you can move to yet another location and run the program again; you should have a window on each side of the house. Our house is starting to look pretty good!

Adding a roof

The final stage is to add a roof to the house. To do this we are going to use wooden stairs. We will do this inside a loop so that if you change the width of your house, more layers are added to the roof. Enter the rest of the code. Be careful with the indentation: I recommend using spaces and avoiding the use of tabs. After the if statement, you need to indent the code even further. Each indentation level needs four spaces, so below the line with if on it, you will need eight spaces.

Since some of these code lines are lengthy and indented a lot, you may well find that the text wraps around as you reach the right-hand side of your editor window — don’t worry about this. You will have to be careful to get those indents right, however.

Now move somewhere new in your world and run the complete program. Iron out any last bugs, then admire your house! Does it look how you expect? Can you make it better?

Customising your house

Now you can start to customise your house. It is a good idea to use Save As in the menu to save a new version of your program. Then you can keep different designs, or refer back to your previous program if you get to a point where you don’t understand why your new one doesn’t work.

Consider these changes:

  • Change the size of your house. Are you able also to move the door and windows so they stay in proportion?
  • Change the materials used for the house. An ice house placed in an area of snow would look really cool!
  • Add a back door to your house. Or make the front door a double-width door!

We hope that you have enjoyed writing this program to build a house. Now you can easily add a house to your Minecraft world whenever you want to by simply running this program.

Get the complete code for this project here.

Continue your Minecraft journey

Minecraft Pi’s programmable interface is an ideal platform for learning Python. If you’d like to try more of our free tutorials, check out:

You may also enjoy Martin O’Hanlon’s and David Whale’s Adventures in Minecraft, and the Hacking and Making in Minecraft MagPi Essentials guide, which you can download for free or buy in print here.

The post Build a house in Minecraft using Python appeared first on Raspberry Pi.

Engineering deep dive: Encoding of SCTs in certificates

Post Syndicated from Let's Encrypt - Free SSL/TLS Certificates original https://letsencrypt.org/2018/04/04/sct-encoding.html

<p>Let&rsquo;s Encrypt recently <a href="https://community.letsencrypt.org/t/signed-certificate-timestamps-embedded-in-certificates/57187">launched SCT embedding in
certificates</a>.
This feature allows browsers to check that a certificate was submitted to a
<a href="https://en.wikipedia.org/wiki/Certificate_Transparency">Certificate Transparency</a>
log. As part of the launch, we did a thorough review
that the encoding of Signed Certificate Timestamps (SCTs) in our certificates
matches the relevant specifications. In this post, I&rsquo;ll dive into the details.
You&rsquo;ll learn more about X.509, ASN.1, DER, and TLS encoding, with references to
the relevant RFCs.</p>

<p>Certificate Transparency offers three ways to deliver SCTs to a browser: In a
TLS extension, in stapled OCSP, or embedded in a certificate. We chose to
implement the embedding method because it would just work for Let&rsquo;s Encrypt
subscribers without additional work. In the SCT embedding method, we submit
a &ldquo;precertificate&rdquo; with a <a href="#poison">poison extension</a> to a set of
CT logs, and get back SCTs. We then issue a real certificate based on the
precertificate, with two changes: The poison extension is removed, and the SCTs
obtained earlier are added in another extension.</p>

<p>Given a certificate, let&rsquo;s first look for the SCT list extension. According to CT (<a href="https://tools.ietf.org/html/rfc6962#section-3.3">RFC 6962
section 3.3</a>),
the extension OID for a list of SCTs is <code>1.3.6.1.4.1.11129.2.4.2</code>. An <a href="http://www.hl7.org/Oid/information.cfm">OID (object
ID)</a> is a series of integers, hierarchically
assigned and globally unique. They are used extensively in X.509, for instance
to uniquely identify extensions.</p>

<p>We can <a href="https://acme-v01.api.letsencrypt.org/acme/cert/031f2484307c9bc511b3123cb236a480d451">download an example certificate</a>,
and view it using OpenSSL (if your OpenSSL is old, it may not display the
detailed information):</p>

<pre><code>$ openssl x509 -noout -text -inform der -in Downloads/031f2484307c9bc511b3123cb236a480d451

CT Precertificate SCTs:
Signed Certificate Timestamp:
Version : v1(0)
Log ID : DB:74:AF:EE:CB:29:EC:B1:FE:CA:3E:71:6D:2C:E5:B9:
AA:BB:36:F7:84:71:83:C7:5D:9D:4F:37:B6:1F:BF:64
Timestamp : Mar 29 18:45:07.993 2018 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:44:02:20:7E:1F:CD:1E:9A:2B:D2:A5:0A:0C:81:E7:
13:03:3A:07:62:34:0D:A8:F9:1E:F2:7A:48:B3:81:76:
40:15:9C:D3:02:20:65:9F:E9:F1:D8:80:E2:E8:F6:B3:
25:BE:9F:18:95:6D:17:C6:CA:8A:6F:2B:12:CB:0F:55:
FB:70:F7:59:A4:19
Signed Certificate Timestamp:
Version : v1(0)
Log ID : 29:3C:51:96:54:C8:39:65:BA:AA:50:FC:58:07:D4:B7:
6F:BF:58:7A:29:72:DC:A4:C3:0C:F4:E5:45:47:F4:78
Timestamp : Mar 29 18:45:08.010 2018 GMT
Extensions: none
Signature : ecdsa-with-SHA256
30:46:02:21:00:AB:72:F1:E4:D6:22:3E:F8:7F:C6:84:
91:C2:08:D2:9D:4D:57:EB:F4:75:88:BB:75:44:D3:2F:
95:37:E2:CE:C1:02:21:00:8A:FF:C4:0C:C6:C4:E3:B2:
45:78:DA:DE:4F:81:5E:CB:CE:2D:57:A5:79:34:21:19:
A1:E6:5B:C7:E5:E6:9C:E2
</code></pre>

<p>Now let&rsquo;s go a little deeper. How is that extension represented in
the certificate? Certificates are expressed in
<a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One">ASN.1</a>,
which generally refers to both a language for expressing data structures
and a set of formats for encoding them. The most common format,
<a href="https://en.wikipedia.org/wiki/X.690#DER_encoding">DER</a>,
is a tag-length-value format. That is, to encode an object, first you write
down a tag representing its type (usually one byte), then you write
down a number expressing how long the object is, then you write down
the object contents. This is recursive: An object can contain multiple
objects within it, each of which has its own tag, length, and value.</p>

<p>One of the cool things about DER and other tag-length-value formats is that you
can decode them to some degree without knowing what they mean. For instance, I
can tell you that 0x30 means the data type &ldquo;SEQUENCE&rdquo; (a struct, in ASN.1
terms), and 0x02 means &ldquo;INTEGER&rdquo;, then give you this hex byte sequence to
decode:</p>

<pre><code>30 06 02 01 03 02 01 0A
</code></pre>

<p>You could tell me right away that decodes to:</p>

<pre><code>SEQUENCE
INTEGER 3
INTEGER 10
</code></pre>

<p>Try it yourself with this great <a href="https://lapo.it/asn1js/#300602010302010A">JavaScript ASN.1
decoder</a>. However, you wouldn&rsquo;t know
what those integers represent without the corresponding ASN.1 schema (or
&ldquo;module&rdquo;). For instance, if you knew that this was a piece of DogData, and the
schema was:</p>

<pre><code>DogData ::= SEQUENCE {
legs INTEGER,
cutenessLevel INTEGER
}
</code></pre>

<p>You&rsquo;d know this referred to a three-legged dog with a cuteness level of 10.</p>

<p>We can take some of this knowledge and apply it to our certificates. As a first
step, convert the above certificate to hex with
<code>xxd -ps &lt; Downloads/031f2484307c9bc511b3123cb236a480d451</code>. You can then copy
and paste the result into
<a href="https://lapo.it/asn1js">lapo.it/asn1js</a> (or use <a href="https://lapo.it/asn1js/#3082062F30820517A0030201020212031F2484307C9BC511B3123CB236A480D451300D06092A864886F70D01010B0500304A310B300906035504061302555331163014060355040A130D4C6574277320456E6372797074312330210603550403131A4C6574277320456E637279707420417574686F72697479205833301E170D3138303332393137343530375A170D3138303632373137343530375A302D312B3029060355040313223563396137662E6C652D746573742E686F66666D616E2D616E64726577732E636F6D30820122300D06092A864886F70D01010105000382010F003082010A0282010100BCEAE8F504D9D91FCFC69DB943254A7FED7C6A3C04E2D5C7DDD010CBBC555887274489CA4F432DCE6D7AB83D0D7BDB49C466FBCA93102DC63E0EB1FB2A0C50654FD90B81A6CB357F58E26E50F752BF7BFE9B56190126A47409814F59583BDD337DFB89283BE22E81E6DCE13B4E21FA6009FC8A7F903A17AB05C8BED85A715356837E849E571960A8999701EAE9CE0544EAAB936B790C3C35C375DB18E9AA627D5FA3579A0FB5F8079E4A5C9BE31C2B91A7F3A63AFDFEDB9BD4EA6668902417D286BE4BBE5E43CD9FE1B8954C06F21F5C5594FD3AB7D7A9CBD6ABF19774D652FD35C5718C25A3BA1967846CED70CDBA95831CF1E09FF7B8014E63030CE7A776750203010001A382032A30820326300E0603551D0F0101FF0404030205A0301D0603551D250416301406082B0601050507030106082B06010505070302300C0603551D130101FF04023000301D0603551D0E041604148B3A21ABADF50C4B30DCCD822724D2C4B9BA29E3301F0603551D23041830168014A84A6A63047DDDBAE6D139B7A64565EFF3A8ECA1306F06082B0601050507010104633061302E06082B060105050730018622687474703A2F2F6F6373702E696E742D78332E6C657473656E63727970742E6F7267302F06082B060105050730028623687474703A2F2F636572742E696E742D78332E6C657473656E63727970742E6F72672F302D0603551D110426302482223563396137662E6C652D746573742E686F66666D616E2D616E64726577732E636F6D3081FE0603551D200481F63081F33008060667810C0102013081E6060B2B0601040182DF130101013081D6302606082B06010505070201161A687474703A2F2F6370732E6C657473656E63727970742E6F72673081AB06082B0601050507020230819E0C819B54686973204365727469666963617465206D6179206F6E6C792062652072656C6965642075706F6E2062792052656C79696E67205061727469657320616E64206F6E6C7920696E206163636F7264616E636520776974682074686520436572746966696361746520506F6C69637920666F756E642061742068747470733A2F2F6C657473656E63727970742E6F72672F7265706F7369746F72792F30820104060A2B06010401D6790204020481F50481F200F0007500DB74AFEECB29ECB1FECA3E716D2CE5B9AABB36F7847183C75D9D4F37B61FBF64000001627313EB19000004030046304402207E1FCD1E9A2BD2A50A0C81E713033A0762340DA8F91EF27A48B3817640159CD30220659FE9F1D880E2E8F6B325BE9F18956D17C6CA8A6F2B12CB0F55FB70F759A419007700293C519654C83965BAAA50FC5807D4B76FBF587A2972DCA4C30CF4E54547F478000001627313EB2A0000040300483046022100AB72F1E4D6223EF87FC68491C208D29D4D57EBF47588BB7544D32F9537E2CEC10221008AFFC40CC6C4E3B24578DADE4F815ECBCE2D57A579342119A1E65BC7E5E69CE2300D06092A864886F70D01010B0500038201010095F87B663176776502F792DDD232C216943C7803876FCBEB46393A36354958134482E0AFEED39011618327C2F0203351758FEB420B73CE6C797B98F88076F409F3903F343D1F5D9540F41EF47EB39BD61B62873A44F00B7C8B593C6A416458CF4B5318F35235BC88EABBAA34F3E3F81BD3B047E982EE1363885E84F76F2F079F2B6EEB4ECB58EFE74C8DE7D54DE5C89C4FB5BB0694B837BD6F02BAFD5A6C007D1B93D25007BDA9B2BDBF82201FE1B76B628CE34E2D974E8E623EC57A5CB53B435DD4B9993ADF6BA3972F2B29D259594A94E17BBE06F34AAE5CF0F50297548C4DFFC5566136F78A3D3B324EAE931A14EB6BE6DA1D538E48CF077583C67B52E7E8">this handy link</a>). You can also run <code>openssl asn1parse -i -inform der -in Downloads/031f2484307c9bc511b3123cb236a480d451</code> to use OpenSSL&rsquo;s parser, which is less easy to use in some ways, but easier to copy and paste.</p>

<p>In the decoded data, we can find the OID <code>1.3.6.1.4.1.11129.2.4.2</code>, indicating
the SCT list extension. Per <a href="https://tools.ietf.org/html/rfc5280#page-17">RFC 5280, section
4.1</a>, an extension is defined:</p>

<pre><code>Extension ::= SEQUENCE {
extnID OBJECT IDENTIFIER,
critical BOOLEAN DEFAULT FALSE,
extnValue OCTET STRING
— contains the DER encoding of an ASN.1 value
— corresponding to the extension type identified
— by extnID
}
</code></pre>

<p>We&rsquo;ve found the <code>extnID</code>. The &ldquo;critical&rdquo; field is omitted because it has the
default value (false). Next up is the <code>extnValue</code>. This has the type
<code>OCTET STRING</code>, which has the tag &ldquo;0x04&rdquo;. <code>OCTET STRING</code> means &ldquo;here&rsquo;s
a bunch of bytes!&rdquo; In this case, as described by the spec, those bytes
happen to contain more DER. This is a fairly common pattern in X.509
to deal with parameterized data. For instance, this allows defining a
structure for extensions without knowing ahead of time all the structures
that a future extension might want to carry in its value. If you&rsquo;re a C
programmer, think of it as a <code>void*</code> for data structures. If you prefer Go,
think of it as an <code>interface{}</code>.</p>

<p>Here&rsquo;s that <code>extnValue</code>:</p>

<pre><code>04 81 F5 0481F200F0007500DB74AFEECB29ECB1FECA3E716D2CE5B9AABB36F7847183C75D9D4F37B61FBF64000001627313EB19000004030046304402207E1FCD1E9A2BD2A50A0C81E713033A0762340DA8F91EF27A48B3817640159CD30220659FE9F1D880E2E8F6B325BE9F18956D17C6CA8A6F2B12CB0F55FB70F759A419007700293C519654C83965BAAA50FC5807D4B76FBF587A2972DCA4C30CF4E54547F478000001627313EB2A0000040300483046022100AB72F1E4D6223EF87FC68491C208D29D4D57EBF47588BB7544D32F9537E2CEC10221008AFFC40CC6C4E3B24578DADE4F815ECBCE2D57A579342119A1E65BC7E5E69CE2
</code></pre>

<p>That&rsquo;s tag &ldquo;0x04&rdquo;, meaning <code>OCTET STRING</code>, followed by &ldquo;0x81 0xF5&rdquo;, meaning
&ldquo;this string is 245 bytes long&rdquo; (the 0x81 prefix is part of <a href="#variable-length">variable length
number encoding</a>).</p>

<p>According to <a href="https://tools.ietf.org/html/rfc6962#section-3.3">RFC 6962, section
3.3</a>, &ldquo;obtained SCTs
can be directly embedded in the final certificate, by encoding the
SignedCertificateTimestampList structure as an ASN.1 <code>OCTET STRING</code>
and inserting the resulting data in the TBSCertificate as an X.509v3
certificate extension&rdquo;</p>

<p>So, we have an <code>OCTET STRING</code>, all&rsquo;s good, right? Except if you remove the
tag and length from extnValue to get its value, you&rsquo;re left with:</p>

<pre><code>04 81 F2 00F0007500DB74AFEEC…
</code></pre>

<p>There&rsquo;s that &ldquo;0x04&rdquo; tag again, but with a shorter length. Why
do we nest one <code>OCTET STRING</code> inside another? It&rsquo;s because the
contents of extnValue are required by RFC 5280 to be valid DER, but a
SignedCertificateTimestampList is not encoded using DER (more on that
in a minute). So, by RFC 6962, a SignedCertificateTimestampList is wrapped in an
<code>OCTET STRING</code>, which is wrapped in another <code>OCTET STRING</code> (the extnValue).</p>

<p>Once we decode that second <code>OCTET STRING</code>, we&rsquo;re left with the contents:</p>

<pre><code>00F0007500DB74AFEEC…
</code></pre>

<p>&ldquo;0x00&rdquo; isn&rsquo;t a valid tag in DER. What is this? It&rsquo;s TLS encoding. This is
defined in <a href="https://tools.ietf.org/html/rfc5246#section-4">RFC 5246, section 4</a>
(the TLS 1.2 RFC). TLS encoding, like ASN.1, has both a way to define data
structures and a way to encode those structures. TLS encoding differs
from DER in that there are no tags, and lengths are only encoded when necessary for
variable-length arrays. Within an encoded structure, the type of a field is determined by
its position, rather than by a tag. This means that TLS-encoded structures are
more compact than DER structures, but also that they can&rsquo;t be processed without
knowing the corresponding schema. For instance, here&rsquo;s the top-level schema from
<a href="https://tools.ietf.org/html/rfc6962#section-3.3">RFC 6962, section 3.3</a>:</p>

<pre><code> The contents of the ASN.1 OCTET STRING embedded in an OCSP extension
or X509v3 certificate extension are as follows:

opaque SerializedSCT&lt;1..2^16-1&gt;;

struct {
SerializedSCT sct_list &lt;1..2^16-1&gt;;
} SignedCertificateTimestampList;

Here, &quot;SerializedSCT&quot; is an opaque byte string that contains the
serialized TLS structure.
</code></pre>

<p>Right away, we&rsquo;ve found one of those variable-length arrays. The length of such
an array (in bytes) is always represented by a length field just big enough to
hold the max array size. The max size of an <code>sct_list</code> is 65535 bytes, so the
length field is two bytes wide. Sure enough, those first two bytes are &ldquo;0x00
0xF0&rdquo;, or 240 in decimal. In other words, this <code>sct_list</code> will have 240 bytes. We
don&rsquo;t yet know how many SCTs will be in it. That will become clear only by
continuing to parse the encoded data and seeing where each struct ends (spoiler
alert: there are two SCTs!).</p>

<p>Now we know the first SerializedSCT starts with <code>0075…</code>. SerializedSCT
is itself a variable-length field, this time containing <code>opaque</code> bytes (much like <code>OCTET STRING</code>
back in the ASN.1 world). Like SignedCertificateTimestampList, it has a max size
of 65535 bytes, so we pull off the first two bytes and discover that the first
SerializedSCT is 0x0075 (117 decimal) bytes long. Here&rsquo;s the whole thing, in
hex:</p>

<pre><code>00DB74AFEECB29ECB1FECA3E716D2CE5B9AABB36F7847183C75D9D4F37B61FBF64000001627313EB19000004030046304402207E1FCD1E9A2BD2A50A0C81E713033A0762340DA8F91EF27A48B3817640159CD30220659FE9F1D880E2E8F6B325BE9F18956D17C6CA8A6F2B12CB0F55FB70F759A419
</code></pre>

<p>This can be decoded using the TLS encoding struct defined in <a href="https://tools.ietf.org/html/rfc6962#page-13">RFC 6962, section
3.2</a>:</p>

<pre><code>enum { v1(0), (255) }
Version;

struct {
opaque key_id[32];
} LogID;

opaque CtExtensions&lt;0..2^16-1&gt;;

struct {
Version sct_version;
LogID id;
uint64 timestamp;
CtExtensions extensions;
digitally-signed struct {
Version sct_version;
SignatureType signature_type = certificate_timestamp;
uint64 timestamp;
LogEntryType entry_type;
select(entry_type) {
case x509_entry: ASN.1Cert;
case precert_entry: PreCert;
} signed_entry;
CtExtensions extensions;
};
} SignedCertificateTimestamp;
</code></pre>

<p>Breaking that down:</p>

<pre><code># Version sct_version v1(0)
00
# LogID id (aka opaque key_id[32])
DB74AFEECB29ECB1FECA3E716D2CE5B9AABB36F7847183C75D9D4F37B61FBF64
# uint64 timestamp (milliseconds since the epoch)
000001627313EB19
# CtExtensions extensions (zero-length array)
0000
# digitally-signed struct
04030046304402207E1FCD1E9A2BD2A50A0C81E713033A0762340DA8F91EF27A48B3817640159CD30220659FE9F1D880E2E8F6B325BE9F18956D17C6CA8A6F2B12CB0F55FB70F759A419
</code></pre>

<p>To understand the &ldquo;digitally-signed struct,&rdquo; we need to turn back to <a href="https://tools.ietf.org/html/rfc5246#section-4.7">RFC 5246,
section 4.7</a>. It says:</p>

<pre><code>A digitally-signed element is encoded as a struct DigitallySigned:

struct {
SignatureAndHashAlgorithm algorithm;
opaque signature&lt;0..2^16-1&gt;;
} DigitallySigned;
</code></pre>

<p>And in <a href="https://tools.ietf.org/html/rfc5246#section-7.4.1.4.1">section
7.4.1.4.1</a>:</p>

<pre><code>enum {
none(0), md5(1), sha1(2), sha224(3), sha256(4), sha384(5),
sha512(6), (255)
} HashAlgorithm;

enum { anonymous(0), rsa(1), dsa(2), ecdsa(3), (255) }
SignatureAlgorithm;

struct {
HashAlgorithm hash;
SignatureAlgorithm signature;
} SignatureAndHashAlgorithm;
</code></pre>

<p>We have &ldquo;0x0403&rdquo;, which corresponds to sha256(4) and ecdsa(3). The next two
bytes, &ldquo;0x0046&rdquo;, tell us the length of the &ldquo;opaque signature&rdquo; field, 70 bytes in
decimal. To decode the signature, we reference <a href="https://tools.ietf.org/html/rfc4492#page-20">RFC 4492 section
5.4</a>, which says:</p>

<pre><code>The digitally-signed element is encoded as an opaque vector &lt;0..2^16-1&gt;, the
contents of which are the DER encoding corresponding to the
following ASN.1 notation.

Ecdsa-Sig-Value ::= SEQUENCE {
r INTEGER,
s INTEGER
}
</code></pre>

<p>Having dived through two layers of TLS encoding, we are now back in ASN.1 land!
We
<a href="https://lapo.it/asn1js/#304402207E1FCD1E9A2BD2A50A0C81E713033A0762340DA8F91EF27A48B3817640159CD30220659FE9F1D880E2E8F6B325BE9F18956D17C6CA8A6F2B12CB0F55FB70F759A419">decode</a>
the remaining bytes into a SEQUENCE containing two INTEGERS. And we&rsquo;re done! Here&rsquo;s the whole
extension decoded:</p>

<pre><code># Extension SEQUENCE – RFC 5280
30
# length 0x0104 bytes (260 decimal)
820104
# OBJECT IDENTIFIER
06
# length 0x0A bytes (10 decimal)
0A
# value (1.3.6.1.4.1.11129.2.4.2)
2B06010401D679020402
# OCTET STRING
04
# length 0xF5 bytes (245 decimal)
81F5
# OCTET STRING (embedded) – RFC 6962
04
# length 0xF2 bytes (242 decimal)
81F2
# Beginning of TLS encoded SignedCertificateTimestampList – RFC 5246 / 6962
# length 0xF0 bytes
00F0
# opaque SerializedSCT&lt;1..2^16-1&gt;
# length 0x75 bytes
0075
# Version sct_version v1(0)
00
# LogID id (aka opaque key_id[32])
DB74AFEECB29ECB1FECA3E716D2CE5B9AABB36F7847183C75D9D4F37B61FBF64
# uint64 timestamp (milliseconds since the epoch)
000001627313EB19
# CtExtensions extensions (zero-length array)
0000
# digitally-signed struct – RFC 5426
# SignatureAndHashAlgorithm (ecdsa-sha256)
0403
# opaque signature&lt;0..2^16-1&gt;;
# length 0x0046
0046
# DER-encoded Ecdsa-Sig-Value – RFC 4492
30 # SEQUENCE
44 # length 0x44 bytes
02 # r INTEGER
20 # length 0x20 bytes
# value
7E1FCD1E9A2BD2A50A0C81E713033A0762340DA8F91EF27A48B3817640159CD3
02 # s INTEGER
20 # length 0x20 bytes
# value
659FE9F1D880E2E8F6B325BE9F18956D17C6CA8A6F2B12CB0F55FB70F759A419
# opaque SerializedSCT&lt;1..2^16-1&gt;
# length 0x77 bytes
0077
# Version sct_version v1(0)
00
# LogID id (aka opaque key_id[32])
293C519654C83965BAAA50FC5807D4B76FBF587A2972DCA4C30CF4E54547F478
# uint64 timestamp (milliseconds since the epoch)
000001627313EB2A
# CtExtensions extensions (zero-length array)
0000
# digitally-signed struct – RFC 5426
# SignatureAndHashAlgorithm (ecdsa-sha256)
0403
# opaque signature&lt;0..2^16-1&gt;;
# length 0x0048
0048
# DER-encoded Ecdsa-Sig-Value – RFC 4492
30 # SEQUENCE
46 # length 0x46 bytes
02 # r INTEGER
21 # length 0x21 bytes
# value
00AB72F1E4D6223EF87FC68491C208D29D4D57EBF47588BB7544D32F9537E2CEC1
02 # s INTEGER
21 # length 0x21 bytes
# value
008AFFC40CC6C4E3B24578DADE4F815ECBCE2D57A579342119A1E65BC7E5E69CE2
</code></pre>

<p>One surprising thing you might notice: In the first SCT, <code>r</code> and <code>s</code> are twenty
bytes long. In the second SCT, they are both twenty-one bytes long, and have a
leading zero. Integers in DER are two&rsquo;s complement, so if the leftmost bit is
set, they are interpreted as negative. Since <code>r</code> and <code>s</code> are positive, if the
leftmost bit would be a 1, an extra byte has to be added so that the leftmost
bit can be 0.</p>

<p>This is a little taste of what goes into encoding a certificate. I hope it was
informative! If you&rsquo;d like to learn more, I recommend &ldquo;<a href="http://luca.ntop.org/Teaching/Appunti/asn1.html">A Layman&rsquo;s Guide to a
Subset of ASN.1, BER, and DER</a>.&rdquo;</p>

<p><a name="poison"></a>Footnote 1: A &ldquo;poison extension&rdquo; is defined by <a href="https://tools.ietf.org/html/rfc6962#section-3.1">RFC 6962
section 3.1</a>:</p>

<pre><code>The Precertificate is constructed from the certificate to be issued by adding a special
critical poison extension (OID `1.3.6.1.4.1.11129.2.4.3`, whose
extnValue OCTET STRING contains ASN.1 NULL data (0x05 0x00))
</code></pre>

<p>In other words, it&rsquo;s an empty extension whose only purpose is to ensure that
certificate processors will not accept precertificates as valid certificates. The
specification ensures this by setting the &ldquo;critical&rdquo; bit on the extension, which
ensures that code that doesn&rsquo;t recognize the extension will reject the whole
certificate. Code that does recognize the extension specifically as poison
will also reject the certificate.</p>

<p><a name="variable-length"></a>Footnote 2: Lengths from 0-127 are represented by
a single byte (short form). To express longer lengths, more bytes are used (long form).
The high bit (0x80) on the first byte is set to distinguish long form from short
form. The remaining bits are used to express how many more bytes to read for the
length. For instance, 0x81F5 means &ldquo;this is long form because the length is
greater than 127, but there&rsquo;s still only one byte of length (0xF5) to decode.&rdquo;</p>

Subverting Backdoored Encryption

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/subverting_back.html

This is a really interesting research result. This paper proves that two parties can create a secure communications channel using a communications system with a backdoor. It’s a theoretical result, so it doesn’t talk about how easy that channel is to create. And the assumptions on the adversary are pretty reasonable: that each party can create his own randomness, and that the government isn’t literally eavesdropping on every single part of the network at all times.

This result reminds me a lot of the work about subliminal channels from the 1980s and 1990s, and the notions of how to build an anonymous communications system on top of an identified system. Basically, it’s always possible to overlay a system around and outside any closed system.

How to Subvert Backdoored Encryption: Security Against Adversaries that Decrypt All Ciphertexts,” by Thibaut Horel and Sunoo Park and Silas Richelson and Vinod Vaikuntanathan.

Abstract: In this work, we examine the feasibility of secure and undetectable point-to-point communication in a world where governments can read all the encrypted communications of their citizens. We consider a world where the only permitted method of communication is via a government-mandated encryption scheme, instantiated with government-mandated keys. Parties cannot simply encrypt ciphertexts of some other encryption scheme, because citizens caught trying to communicate outside the government’s knowledge (e.g., by encrypting strings which do not appear to be natural language plaintexts) will be arrested. The one guarantee we suppose is that the government mandates an encryption scheme which is semantically secure against outsiders: a perhaps reasonable supposition when a government might consider it advantageous to secure its people’s communication against foreign entities. But then, what good is semantic security against an adversary that holds all the keys and has the power to decrypt?

We show that even in the pessimistic scenario described, citizens can communicate securely and undetectably. In our terminology, this translates to a positive statement: all semantically secure encryption schemes support subliminal communication. Informally, this means that there is a two-party protocol between Alice and Bob where the parties exchange ciphertexts of what appears to be a normal conversation even to someone who knows the secret keys and thus can read the corresponding plaintexts. And yet, at the end of the protocol, Alice will have transmitted her secret message to Bob. Our security definition requires that the adversary not be able to tell whether Alice and Bob are just having a normal conversation using the mandated encryption scheme, or they are using the mandated encryption scheme for subliminal communication.

Our topics may be thought to fall broadly within the realm of steganography: the science of hiding secret communication within innocent-looking messages, or cover objects. However, we deal with the non-standard setting of an adversarially chosen distribution of cover objects (i.e., a stronger-than-usual adversary), and we take advantage of the fact that our cover objects are ciphertexts of a semantically secure encryption scheme to bypass impossibility results which we show for broader classes of steganographic schemes. We give several constructions of subliminal communication schemes under the assumption that key exchange protocols with pseudorandom messages exist (such as Diffie-Hellman, which in fact has truly random messages). Each construction leverages the assumed semantic security of the adversarially chosen encryption scheme, in order to achieve subliminal communication.

Why the crypto-backdoor side is morally corrupt

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/why-crypto-backdoor-side-is-morally.html

Crypto-backdoors for law enforcement is a reasonable position, but the side that argues for it adds things that are either outright lies or morally corrupt. Every year, the amount of digital evidence law enforcement has to solve crimes increases, yet they outrageously lie, claiming they are “going dark”, losing access to evidence. A weirder claim is that  those who oppose crypto-backdoors are nonetheless ethically required to make them work. This is morally corrupt.

That’s the point of this Lawfare post, which claims:

What I am saying is that those arguing that we should reject third-party access out of hand haven’t carried their research burden. … There are two reasons why I think there hasn’t been enough research to establish the no-third-party access position. First, research in this area is “taboo” among security researchers. … the second reason why I believe more research needs to be done: the fact that prominent non-government experts are publicly willing to try to build secure third-party-access solutions should make the information-security community question the consensus view. 

This is nonsense. It’s like claiming we haven’t cured the common cold because researchers haven’t spent enough effort at it. When researchers claim they’ve tried 10,000 ways to make something work, it’s like insisting they haven’t done enough because they haven’t tried 10,001 times.
Certainly, half the community doesn’t want to make such things work. Any solution for the “legitimate” law enforcement of the United States means a solution for illegitimate states like China and Russia which would use the feature to oppress their own people. Even if I believe it’s a net benefit to the United States, I would never attempt such research because of China and Russia.
But computer scientists notoriously ignore ethics in pursuit of developing technology. That describes the other half of the crypto community who would gladly work on the problem. The reason they haven’t come up with solutions is because the problem is hard, really hard.
The second reason the above argument is wrong: it says we should believe a solution is possible because some outsiders are willing to try. But as Yoda says, do or do not, there is no try. Our opinions on the difficulty of the problem don’t change simply because people are trying. Our opinions change when people are succeeding. People are always trying the impossible, that’s not evidence it’s possible.
The paper cherry picks things, like Intel CPU features, to make it seem like they are making forward progress. No. Intel’s SGX extensions are there for other reasons. Sure, it’s a new development, and new developments may change our opinion on the feasibility of law enforcement backdoors. But nowhere in talking about this new development have they actually proposes a solution to the backdoor problem. New developments happen all the time, and the pro-backdoor side is going to seize upon each and every one to claim that this, finally, solves the backdoor problem, without showing exactly how it solves the problem.

The Lawfare post does make one good argument, that there is no such thing as “absolute security”, and thus the argument is stupid that “crypto-backdoors would be less than absolute security”. Too often in the cybersecurity community we reject solutions that don’t provide “absolute security” while failing to acknowledge that “absolute security” is impossible.
But that’s not really what’s going on here. Cryptographers aren’t certain we’ve achieved even “adequate security” with current crypto regimes like SSL/TLS/HTTPS. Every few years we find horrible flaws in the old versions and have to develop new versions. If you steal somebody’s iPhone today, it’s so secure you can’t decrypt anything on it. But then if you hold it for 5 years, somebody will eventually figure out a hole and then you’ll be able to decrypt it — a hole that won’t affect Apple’s newer phones.
The reason we think we can’t get crypto-backdoors correct is simply because we can’t get crypto completely correct. It’s implausible that we can get the backdoors working securely when we still have so much trouble getting encryption working correctly in the first place.
Thus, we aren’t talking about “insignificantly less security”, we are talking about going from “barely adequate security” to “inadequate security”. Negotiating keys between you and a website is hard enough without simultaneously having to juggle keys with law enforcement organizations.

And finally, even if cryptographers do everything correctly law enforcement themselves haven’t proven themselves reliable. The NSA exposed its exploits (like the infamous ETERNALBLUE), and OPM lost all its security clearance records. If they can’t keep those secrets, it’s unreasonable to believe they can hold onto backdoor secrets. One of the problems cryptographers are expected to solve is partly this, to make it work in a such way that makes it unlikely law enforcement will lose its secrets.

Summary

This argument by the pro-backdoor side, that we in the crypto-community should do more to solve backdoors, it simply wrong. We’ve spent a lot of effort at this already. Many continue to work on this problem — the reason you haven’t heard much from them is because they haven’t had much success. It’s like blaming doctors for not doing more to work on interrogation drugs (truth serums). Sure, a lot of doctors won’t work on this because it’s distasteful, but at the same time, there are many drug companies who would love to profit by them. The reason they don’t exist is not because they aren’t spending enough money researching them, it’s because there is no plausible solution in sight.
Crypto-backdoors designed for law-enforcement will significantly harm your security. This may change in the future, but that’s the state of crypto today. You should trust the crypto experts on this, not lawyers.

A geometric Rust adventure

Post Syndicated from Eevee original https://eev.ee/blog/2018/03/30/a-geometric-rust-adventure/

Hi. Yes. Sorry. I’ve been trying to write this post for ages, but I’ve also been working on a huge writing project, and apparently I have a very limited amount of writing mana at my disposal. I think this is supposed to be a Patreon reward from January. My bad. I hope it’s super great to make up for the wait!

I recently ported some math code from C++ to Rust in an attempt to do a cool thing with Doom. Here is my story.

The problem

I presented it recently as a conundrum (spoilers: I solved it!), but most of those details are unimportant.

The short version is: I have some shapes. I want to find their intersection.

Really, I want more than that: I want to drop them all on a canvas, intersect everything with everything, and pluck out all the resulting polygons. The input is a set of cookie cutters, and I want to press them all down on the same sheet of dough and figure out what all the resulting contiguous pieces are. And I want to know which cookie cutter(s) each piece came from.

But intersection is a good start.

Example of the goal.  Given two squares that overlap at their corners, I want to find the small overlap piece, plus the two L-shaped pieces left over from each square

I’m carefully referring to the input as shapes rather than polygons, because each one could be a completely arbitrary collection of lines. Obviously there’s not much you can do with shapes that aren’t even closed, but at the very least, I need to handle concavity and multiple disconnected polygons that together are considered a single input.

This is a non-trivial problem with a lot of edge cases, and offhand I don’t know how to solve it robustly. I’m not too eager to go figure it out from scratch, so I went hunting for something I could build from.

(Infuriatingly enough, I can just dump all the shapes out in an SVG file and any SVG viewer can immediately solve the problem, but that doesn’t quite help me. Though I have had a few people suggest I just rasterize the whole damn problem, and after all this, I’m starting to think they may have a point.)

Alas, I couldn’t find a Rust library for doing this. I had a hard time finding any library for doing this that wasn’t a massive fully-featured geometry engine. (I could’ve used that, but I wanted to avoid non-Rust dependencies if possible, since distributing software is already enough of a nightmare.)

A Twitter follower directed me towards a paper that described how to do very nearly what I wanted and nothing else: “A simple algorithm for Boolean operations on polygons” by F. Martínez (2013). Being an academic paper, it’s trapped in paywall hell; sorry about that. (And as I understand it, none of the money you’d pay to get the paper would even go to the authors? Is that right? What a horrible and predatory system for discovering and disseminating knowledge.)

The paper isn’t especially long, but it does describe an awful lot of subtle details and is mostly written in terms of its own reference implementation. Rather than write my own implementation based solely on the paper, I decided to try porting the reference implementation from C++ to Rust.

And so I fell down the rabbit hole.

The basic algorithm

Thankfully, the author has published the sample code on his own website, if you want to follow along. (It’s the bottom link; the same author has, confusingly, published two papers on the same topic with similar titles, four years apart.)

If not, let me describe the algorithm and how the code is generally laid out. The algorithm itself is based on a sweep line, where a vertical line passes across the plane and ✨ does stuff ✨ as it encounters various objects. This implementation has no physical line; instead, it keeps track of which segments from the original polygon would be intersecting the sweep line, which is all we really care about.

A vertical line is passing rightwards over a couple intersecting shapes.  The line current intersects two of the shapes' sides, and these two sides are the "sweep list"

The code is all bundled inside a class with only a single public method, run, because… that’s… more object-oriented, I guess. There are several helper methods, and state is stored in some attributes. A rough outline of run is:

  1. Run through all the line segments in both input polygons. For each one, generate two SweepEvents (one for each endpoint) and add them to a std::deque for storage.

    Add pointers to the two SweepEvents to a std::priority_queue, the event queue. This queue uses a custom comparator to order the events from left to right, so the top element is always the leftmost endpoint.

  2. Loop over the event queue (where an “event” means the sweep line passed over the left or right end of a segment). Encountering a left endpoint means the sweep line is newly touching that segment, so add it to a std::set called the sweep list. An important point is that std::set is ordered, and the sweep list uses a comparator that keeps segments in order vertically.

    Encountering a right endpoint means the sweep line is leaving a segment, so that segment is removed from the sweep list.

  3. When a segment is added to the sweep list, it may have up to two neighbors: the segment above it and the segment below it. Call possibleIntersection to check whether it intersects either of those neighbors. (This is nearly sufficient to find all intersections, which is neat.)

  4. If possibleIntersection detects an intersection, it will split each segment into two pieces then and there. The old segment is shortened in-place to become the left part, and a new segment is created for the right part. The new endpoints at the point of intersection are added to the event queue.

  5. Some bookkeeping is done along the way to track which original polygons each segment is inside, and eventually the segments are reconstructed into new polygons.

Hopefully that’s enough to follow along. It took me an inordinately long time to tease this out. The comments aren’t especially helpful.

1
    std::deque<SweepEvent> eventHolder;    // It holds the events generated during the computation of the boolean operation

Syntax and basic semantics

The first step was to get something that rustc could at least parse, which meant translating C++ syntax to Rust syntax.

This was surprisingly straightforward! C++ classes become Rust structs. (There was no inheritance here, thankfully.) All the method declarations go away. Method implementations only need to be indented and wrapped in impl.

I did encounter some unnecessarily obtuse uses of the ternary operator:

1
(prevprev != sl.begin()) ? --prevprev : prevprev = sl.end();

Rust doesn’t have a ternary — you can use a regular if block as an expression — so I expanded these out.

C++ switch blocks become Rust match blocks, but otherwise function basically the same. Rust’s enums are scoped (hallelujah), so I had to explicitly spell out where enum values came from.

The only really annoying part was changing function signatures; C++ types don’t look much at all like Rust types, save for the use of angle brackets. Rust also doesn’t pass by implicit reference, so I needed to sprinkle a few &s around.

I would’ve had a much harder time here if this code had relied on any remotely esoteric C++ functionality, but thankfully it stuck to pretty vanilla features.

Language conventions

This is a geometry problem, so the sample code unsurprisingly has its own home-grown point type. Rather than port that type to Rust, I opted to use the popular euclid crate. Not only is it code I didn’t have to write, but it already does several things that the C++ code was doing by hand inline, like dot products and cross products. And all I had to do was add one line to Cargo.toml to use it! I have no idea how anyone writes C or C++ without a package manager.

The C++ code used getters, i.e. point.x (). I’m not a huge fan of getters, though I do still appreciate the need for them in lowish-level systems languages where you want to future-proof your API and the language wants to keep a clear distinction between attribute access and method calls. But this is a point, which is nothing more than two of the same numeric type glued together; what possible future logic might you add to an accessor? The euclid authors appear to side with me and leave the coordinates as public fields, so I took great joy in removing all the superfluous parentheses.

Polygons are represented with a Polygon class, which has some number of Contours. A contour is a single contiguous loop. Something you’d usually think of as a polygon would only have one, but a shape with a hole would have two: one for the outside, one for the inside. The weird part of this arrangement was that Polygon implemented nearly the entire STL container interface, then waffled between using it and not using it throughout the rest of the code. Rust lets anything in the same module access non-public fields, so I just skipped all that and used polygon.contours directly. Hell, I think I made contours public.

Finally, the SweepEvent type has a pol field that’s declared as an enum PolygonType (either SUBJECT or CLIPPING, to indicate which of the two inputs it is), but then some other code uses the same field as a numeric index into a polygon’s contours. Boy I sure do love static typing where everything’s a goddamn integer. I wanted to extend the algorithm to work on arbitrarily many input polygons anyway, so I scrapped the enum and this became a usize.


Then I got to all the uses of STL. I have only a passing familiarity with the C++ standard library, and this code actually made modest use of it, which caused some fun days-long misunderstandings.

As mentioned, the SweepEvents are stored in a std::deque, which is never read from. It took me a little thinking to realize that the deque was being used as an arena: it’s the canonical home for the structs so pointers to them can be tossed around freely. (It can’t be a std::vector, because that could reallocate and invalidate all the pointers; std::deque is probably a doubly-linked list, and guarantees no reallocation.)

Rust’s standard library does have a doubly-linked list type, but I knew I’d run into ownership hell here later anyway, so I think I replaced it with a Rust Vec to start with. It won’t compile either way, so whatever. We’ll get back to this in a moment.

The list of segments currently intersecting the sweep line is stored in a std::set. That type is explicitly ordered, which I’m very glad I knew already. Rust has two set types, HashSet and BTreeSet; unsurprisingly, the former is unordered and the latter is ordered. Dropping in BTreeSet and fixing some method names got me 90% of the way there.

Which brought me to the other 90%. See, the C++ code also relies on finding nodes adjacent to the node that was just inserted, via STL iterators.

1
2
3
next = prev = se->posSL = it = sl.insert(se).first;
(prev != sl.begin()) ? --prev : prev = sl.end();
++next;

I freely admit I’m bad at C++, but this seems like something that could’ve used… I don’t know, 1 comment. Or variable names more than two letters long. What it actually does is:

  1. Add the current sweep event (se) to the sweep list (sl), which returns a pair whose first element is an iterator pointing at the just-inserted event.

  2. Copies that iterator to several other variables, including prev and next.

  3. If the event was inserted at the beginning of the sweep list, set prev to the sweep list’s end iterator, which in C++ is a legal-but-invalid iterator meaning “the space after the end” or something. This is checked for in later code, to see if there is a previous event to look at. Otherwise, decrement prev, so it’s now pointing at the event immediately before the inserted one.

  4. Increment next normally. If the inserted event is last, then this will bump next to the end iterator anyway.

In other words, I need to get the previous and next elements from a BTreeSet. Rust does have bidirectional iterators, which BTreeSet supports… but BTreeSet::insert only returns a bool telling me whether or not anything was inserted, not the position. I came up with this:

1
2
3
let mut maybe_below = active_segments.range(..segment).last().map(|v| *v);
let mut maybe_above = active_segments.range(segment..).next().map(|v| *v);
active_segments.insert(segment);

The range method returns an iterator over a subset of the tree. The .. syntax makes a range (where the right endpoint is exclusive), so ..segment finds the part of the tree before the new segment, and segment.. finds the part of the tree after it. (The latter would start with the segment itself, except I haven’t inserted it yet, so it’s not actually there.)

Then the standard next() and last() methods on bidirectional iterators find me the element I actually want. But the iterator might be empty, so they both return an Option. Also, iterators tend to return references to their contents, but in this case the contents are already references, and I don’t want a double reference, so the map call dereferences one layer — but only if the Option contains a value. Phew!

This is slightly less efficient than the C++ code, since it has to look up where segment goes three times rather than just one. I might be able to get it down to two with some more clever finagling of the iterator, but microsopic performance considerations were a low priority here.

Finally, the event queue uses a std::priority_queue to keep events in a desired order and efficiently pop the next one off the top.

Except priority queues act like heaps, where the greatest (i.e., last) item is made accessible.

Sorting out sorting

C++ comparison functions return true to indicate that the first argument is less than the second argument. Sweep events occur from left to right. You generally implement sorts so that the first thing comes, erm, first.

But sweep events go in a priority queue, and priority queues surface the last item, not the first. This C++ code handled this minor wrinkle by implementing its comparison backwards.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
struct SweepEventComp : public std::binary_function<SweepEvent, SweepEvent, bool> { // for sorting sweep events
// Compare two sweep events
// Return true means that e1 is placed at the event queue after e2, i.e,, e1 is processed by the algorithm after e2
bool operator() (const SweepEvent* e1, const SweepEvent* e2)
{
    if (e1->point.x () > e2->point.x ()) // Different x-coordinate
        return true;
    if (e2->point.x () > e1->point.x ()) // Different x-coordinate
        return false;
    if (e1->point.y () != e2->point.y ()) // Different points, but same x-coordinate. The event with lower y-coordinate is processed first
        return e1->point.y () > e2->point.y ();
    if (e1->left != e2->left) // Same point, but one is a left endpoint and the other a right endpoint. The right endpoint is processed first
        return e1->left;
    // Same point, both events are left endpoints or both are right endpoints.
    if (signedArea (e1->point, e1->otherEvent->point, e2->otherEvent->point) != 0) // not collinear
        return e1->above (e2->otherEvent->point); // the event associate to the bottom segment is processed first
    return e1->pol > e2->pol;
}
};

Maybe it’s just me, but I had a hell of a time just figuring out what problem this was even trying to solve. I still have to reread it several times whenever I look at it, to make sure I’m getting the right things backwards.

Making this even more ridiculous is that there’s a second implementation of this same sort, with the same name, in another file — and that one’s implemented forwards. And doesn’t use a tiebreaker. I don’t entirely understand how this even compiles, but it does!

I painstakingly translated this forwards to Rust. Unlike the STL, Rust doesn’t take custom comparators for its containers, so I had to implement ordering on the types themselves (which makes sense, anyway). I wrapped everything in the priority queue in a Reverse, which does what it sounds like.

I’m fairly pleased with Rust’s ordering model. Most of the work is done in Ord, a trait with a cmp() method returning an Ordering (one of Less, Equal, and Greater). No magic numbers, no need to implement all six ordering methods! It’s incredible. Ordering even has some handy methods on it, so the usual case of “order by this, then by this” can be written as:

1
2
return self.point().x.cmp(&other.point().x)
    .then(self.point().y.cmp(&other.point().y));

Well. Just kidding! It’s not quite that easy. You see, the points here are composed of floats, and floats have the fun property that not all of them are comparable. Specifically, NaN is not less than, greater than, or equal to anything else, including itself. So IEEE 754 float ordering cannot be expressed with Ord. Unless you want to just make up an answer for NaN, but Rust doesn’t tend to do that.

Rust’s float types thus implement the weaker PartialOrd, whose method returns an Option<Ordering> instead. That makes the above example slightly uglier:

1
2
return self.point().x.partial_cmp(&other.point().x).unwrap()
    .then(self.point().y.partial_cmp(&other.point().y).unwrap())

Also, since I use unwrap() here, this code will panic and take the whole program down if the points are infinite or NaN. Don’t do that.

This caused some minor inconveniences in other places; for example, the general-purpose cmp::min() doesn’t work on floats, because it requires an Ord-erable type. Thankfully there’s a f64::min(), which handles a NaN by returning the other argument.

(Cool story: for the longest time I had this code using f32s. I’m used to translating int to “32 bits”, and apparently that instinct kicked in for floats as well, even floats spelled double.)

The only other sorting adventure was this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Due to overlapping edges the resultEvents array can be not wholly sorted
bool sorted = false;
while (!sorted) {
    sorted = true;
    for (unsigned int i = 0; i < resultEvents.size (); ++i) {
        if (i + 1 < resultEvents.size () && sec (resultEvents[i], resultEvents[i+1])) {
            std::swap (resultEvents[i], resultEvents[i+1]);
            sorted = false;
        }
    }
}

(I originally misread this comment as saying “the array cannot be wholly sorted” and had no idea why that would be the case, or why the author would then immediately attempt to bubble sort it.)

I’m still not sure why this uses an ad-hoc sort instead of std::sort. But I’m used to taking for granted that general-purpose sorting implementations are tuned to work well for almost-sorted data, like Python’s. Maybe C++ is untrustworthy here, for some reason. I replaced it with a call to .sort() and all seemed fine.

Phew! We’re getting there. Finally, my code appears to type-check.

But now I see storm clouds gathering on the horizon.

Ownership hell

I have a problem. I somehow run into this problem every single time I use Rust. The solutions are never especially satisfying, and all the hacks I might use if forced to write C++ turn out to be unsound, which is even more annoying because rustc is just sitting there with this smug “I told you so expression” and—

The problem is ownership, which Rust is fundamentally built on. Any given value must have exactly one owner, and Rust must be able to statically convince itself that:

  1. No reference to a value outlives that value.
  2. If a mutable reference to a value exists, no other references to that value exist at the same time.

This is the core of Rust. It guarantees at compile time that you cannot lose pointers to allocated memory, you cannot double-free, you cannot have dangling pointers.

It also completely thwarts a lot of approaches you might be inclined to take if you come from managed languages (where who cares, the GC will take care of it) or C++ (where you just throw pointers everywhere and hope for the best apparently).

For example, pointer loops are impossible. Rust’s understanding of ownership and lifetimes is hierarchical, and it simply cannot express loops. (Rust’s own doubly-linked list type uses raw pointers and unsafe code under the hood, where “unsafe” is an escape hatch for the usual ownership rules. Since I only recently realized that pointers to the inside of a mutable Vec are a bad idea, I figure I should probably not be writing unsafe code myself.)

This throws a few wrenches in the works.

Problem the first: pointer loops

I immediately ran into trouble with the SweepEvent struct itself. A SweepEvent pulls double duty: it represents one endpoint of a segment, but each left endpoint also handles bookkeeping for the segment itself — which means that most of the fields on a right endpoint are unused. Also, and more importantly, each SweepEvent has a pointer to the corresponding SweepEvent at the other end of the same segment. So a pair of SweepEvents point to each other.

Rust frowns upon this. In retrospect, I think I could’ve kept it working, but I also think I’m wrong about that.

My first step was to wrench SweepEvent apart. I moved all of the segment-stuff (which is virtually all of it) into a single SweepSegment type, and then populated the event queue with a SweepEndpoint tuple struct, similar to:

1
2
3
4
5
6
enum SegmentEnd {
    Left,
    Right,
}

struct SweepEndpoint<'a>(&'a SweepSegment, SegmentEnd);

This makes SweepEndpoint essentially a tuple with a name. The 'a is a lifetime and says, more or less, that a SweepEndpoint cannot outlive the SweepSegment it references. Makes sense.

Problem solved! I no longer have mutually referential pointers. But I do still have pointers (well, references), and they have to point to something.

Problem the second: where’s all the data

Which brings me to the problem I always run into with Rust. I have a bucket of things, and I need to refer to some of them multiple times.

I tried half a dozen different approaches here and don’t clearly remember all of them, but I think my core problem went as follows. I translated the C++ class to a Rust struct with some methods hanging off of it. A simplified version might look like this.

1
2
3
4
struct Algorithm {
    arena: LinkedList<SweepSegment>,
    event_queue: BinaryHeap<SweepEndpoint>,
}

Ah, hang on — SweepEndpoint needs to be annotated with a lifetime, so Rust can enforce that those endpoints don’t live longer than the segments they refer to. No problem?

1
2
3
4
struct Algorithm<'a> {
    arena: LinkedList<SweepSegment>,
    event_queue: BinaryHeap<SweepEndpoint<'a>>,
}

Okay! Now for some methods.

1
2
3
4
5
6
7
8
fn run(&mut self) {
    self.arena.push_back(SweepSegment{ data: 5 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    for event in &self.event_queue {
        println!("{:?}", event)
    }
}

Aaand… this doesn’t work. Rust “cannot infer an appropriate lifetime for autoref due to conflicting requirements”. The trouble is that self.arena.back() takes a reference to self.arena, and then I put that reference in the event queue. But I promised that everything in the event queue has lifetime 'a, and I don’t actually know how long self lives here; I only know that it can’t outlive 'a, because that would invalidate the references it holds.

A little random guessing let me to change &mut self to &'a mut self — which is fine because the entire impl block this lives in is already parameterized by 'a — and that makes this compile! Hooray! I think that’s because I’m saying self itself has exactly the same lifetime as the references it holds onto, which is true, since it’s referring to itself.

Let’s get a little more ambitious and try having two segments.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
fn run(&'a mut self) {
    self.arena.push_back(SweepSegment{ data: 5 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    self.arena.push_back(SweepSegment{ data: 17 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    for event in &self.event_queue {
        println!("{:?}", event)
    }
}

Whoops! Rust complains that I’m trying to mutate self.arena while other stuff is referring to it. And, yes, that’s true — I have references to it in the event queue, and Rust is preventing me from potentially deleting everything from the queue when references to it still exist. I’m not actually deleting anything here, of course (though I could be if this were a Vec!), but Rust’s type system can’t encode that (and I dread the thought of a type system that can).

I struggled with this for a while, and rapidly encountered another complete showstopper:

1
2
3
4
5
6
fn run(&'a mut self) {
    self.mutate_something();
    self.mutate_something();
}

fn mutate_something(&'a mut self) {}

Rust objects that I’m trying to borrow self mutably, twice — once for the first call, once for the second.

But why? A borrow is supposed to end automatically once it’s no longer used, right? Maybe if I throw some braces around it for scope… nope, that doesn’t help either.

It’s true that borrows usually end automatically, but here I have explicitly told Rust that mutate_something() should borrow with the lifetime 'a, which is the same as the lifetime in run(). So the first call explicitly borrows self for at least the rest of the method. Removing the lifetime from mutate_something() does fix this error, but if that method tries to add new segments, I’m back to the original problem.

Oh no. The mutation in the C++ code is several calls deep. Porting it directly seems nearly impossible.

The typical solution here — at least, the first thing people suggest to me on Twitter — is to wrap basically everything everywhere in Rc<RefCell<T>>, which gives you something that’s reference-counted (avoiding questions of ownership) and defers borrow checks until runtime (avoiding questions of mutable borrows). But that seems pretty heavy-handed here — not only does RefCell add .borrow() noise anywhere you actually want to interact with the underlying value, but do I really need to refcount these tiny structs that only hold a handful of floats each?

I set out to find a middle ground.

Solution, kind of

I really, really didn’t want to perform serious surgery on this code just to get it to build. I still didn’t know if it worked at all, and now I had to rearrange it without being able to check if I was breaking it further. (This isn’t Rust’s fault; it’s a natural problem with porting between fairly different paradigms.)

So I kind of hacked it into working with minimal changes, producing a grotesque abomination which I’m ashamed to link to. Here’s how!

First, I got rid of the class. It turns out this makes lifetime juggling much easier right off the bat. I’m pretty sure Rust considers everything in a struct to be destroyed simultaneously (though in practice it guarantees it’ll destroy fields in order), which doesn’t leave much wiggle room. Locals within a function, on the other hand, can each have their own distinct lifetimes, which solves the problem of expressing that the borrows won’t outlive the arena.

Speaking of the arena, I solved the mutability problem there by switching to… an arena! The typed-arena crate (a port of a type used within Rust itself, I think) is an allocator — you give it a value, and it gives you back a reference, and the reference is guaranteed to be valid for as long as the arena exists. The method that does this is sneaky and takes &self rather than &mut self, so Rust doesn’t know you’re mutating the arena and won’t complain. (One drawback is that the arena will never free anything you give to it, but that’s not a big problem here.)


My next problem was with mutation. The main loop repeatedly calls possibleIntersection with pairs of segments, which can split either or both segment. Rust definitely doesn’t like that — I’d have to pass in two &muts, both of which are mutable references into the same arena, and I’d have a bunch of immutable references into that arena in the sweep list and elsewhere. This isn’t going to fly.

This is kind of a shame, and is one place where Rust seems a little overzealous. Something like this seems like it ought to be perfectly valid:

1
2
3
4
let mut v = vec![1u32, 2u32];
let a = &mut v[0];
let b = &mut v[1];
// do stuff with a, b

The trouble is, Rust only knows the type signature, which here is something like index_mut(&'a mut self, index: usize) -> &'a T. Nothing about that says that you’re borrowing distinct elements rather than some core part of the type — and, in fact, the above code is only safe because you’re borrowing distinct elements. In the general case, Rust can’t possibly know that. It seems obvious enough from the different indexes, but nothing about the type system even says that different indexes have to return different values. And what if one were borrowed as &mut v[1] and the other were borrowed with v.iter_mut().next().unwrap()?

Anyway, this is exactly where people start to turn to RefCell — if you’re very sure you know better than Rust, then a RefCell will skirt the borrow checker while still enforcing at runtime that you don’t have more than one mutable borrow at a time.

But half the lines in this algorithm examine the endpoints of a segment! I don’t want to wrap the whole thing in a RefCell, or I’ll have to say this everywhere:

1
if segment1.borrow().point.x < segment2.borrow().point.x { ... }

Gross.

But wait — this code only mutates the points themselves in one place. When a segment is split, the original segment becomes the left half, and a new segment is created to be the right half. There’s no compelling need for this; it saves an allocation for the left half, but it’s not critical to the algorithm.

Thus, I settled on a compromise. My segment type now looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
struct SegmentPacket {
    // a bunch of flags and whatnot used in the algorithm
}
struct SweepSegment {
    left_point: MapPoint,
    right_point: MapPoint,
    faces_outwards: bool,
    index: usize,
    order: usize,
    packet: RefCell<SegmentPacket>,
}

I do still need to call .borrow() or .borrow_mut() to get at the stuff in the “packet”, but that’s far less common, so there’s less noise overall. And I don’t need to wrap it in Rc because it’s part of a type that’s allocated in the arena and passed around only via references.


This still leaves me with the problem of how to actually perform the splits.

I’m not especially happy with what I came up with, I don’t know if I can defend it, and I suspect I could do much better. I changed possibleIntersection so that rather than performing splits, it returns the points at which each segment needs splitting, in the form (usize, Option<MapPoint>, Option<MapPoint>). (The usize is used as a flag for calling code and oughta be an enum, but, isn’t yet.)

Now the top-level function is responsible for all arena management, and all is well.

Except, er. possibleIntersection is called multiple times, and I don’t want to copy-paste a dozen lines of split code after each call. I tried putting just that code in its own function, which had the world’s most godawful signature, and that didn’t work because… uh… hm. I can’t remember why, exactly! Should’ve written that down.

I tried a local closure next, but closures capture their environment by reference, so now I had references to a bunch of locals for as long as the closure existed, which meant I couldn’t mutate those locals. Argh. (This seems a little silly to me, since the closure’s references cannot possibly be used for anything if the closure isn’t being called, but maybe I’m missing something. Or maybe this is just a limitation of lifetimes.)

Increasingly desperate, I tried using a macro. But… macros are hygienic, which means that any new name you use inside a macro is different from any name outside that macro. The macro thus could not see any of my locals. Usually that’s good, but here I explicitly wanted the macro to mess with my locals.

I was just about to give up and go live as a hermit in a cabin in the woods, when I discovered something quite incredible. You can define local macros! If you define a macro inside a function, then it can see any locals defined earlier in that function. Perfect!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
macro_rules! _split_segment (
    ($seg:expr, $pt:expr) => (
        {
            let pt = $pt;
            let seg = $seg;
            // ... waaay too much code ...
        }
    );
);

loop {
    // ...
    // This is possibleIntersection, renamed because Rust rightfully complains about camelCase
    let cross = handle_intersections(Some(segment), maybe_above);
    if let Some(pt) = cross.1 {
        segment = _split_segment!(segment, pt);
    }
    if let Some(pt) = cross.2 {
        maybe_above = Some(_split_segment!(maybe_above.unwrap(), pt));
    }
    // ...
}

(This doesn’t actually quite match the original algorithm, which has one case where a segment can be split twice. I realized that I could just do the left-most split, and a later iteration would perform the other split. I sure hope that’s right, anyway.)

It’s a bit ugly, and I ran into a whole lot of implicit behavior from the C++ code that I had to fix — for example, the segment is sometimes mutated just before it’s split, purely as a shortcut for mutating the left part of the split. But it finally compiles! And runs! And kinda worked, a bit!

Aftermath

I still had a lot of work to do.

For one, this code was designed for intersecting two shapes, not mass-intersecting a big pile of shapes. The basic algorithm doesn’t care about how many polygons you start with — all it sees is segments — but the code for constructing the return value needed some heavy modification.

The biggest change by far? The original code traced each segment once, expecting the result to be only a single shape. I had to change that to trace each side of each segment once, since the vast bulk of the output consists of shapes which share a side. This violated a few assumptions, which I had to hack around.

I also ran into a couple very bad edge cases, spent ages debugging them, then found out that the original algorithm had a subtle workaround that I’d commented out because it was awkward to port but didn’t seem to do anything. Whoops!

The worst was a precision error, where a vertical line could be split on a point not quite actually on the line, which wreaked all kinds of havoc. I worked around that with some tasteful rounding, which is highly dubious but makes the output more appealing to my squishy human brain. (I might switch to the original workaround, but I really dislike that even simple cases can spit out points at 1500.0000000000003. The whole thing is parameterized over the coordinate type, so maybe I could throw a rational type in there and cross my fingers?)

All that done, I finally, finally, after a couple months of intermittent progress, got what I wanted!

This is Doom 2’s MAP01. The black area to the left of center is where the player starts. Gray areas indicate where the player can walk from there, with lighter shades indicating more distant areas, where “distance” is measured by the minimum number of line crossings. Red areas can’t be reached at all.

(Note: large playable chunks of the map, including the exit room, are red. That’s because those areas are behind doors, and this code doesn’t understand doors yet.)

(Also note: The big crescent in the lower-right is also black because I was lazy and looked for the player’s starting sector by checking the bbox, and that sector’s bbox happens to match.)

The code that generated this had to go out of its way to delete all the unreachable zones around solid walls. I think I could modify the algorithm to do that on the fly pretty easily, which would probably speed it up a bit too. Downside is that the algorithm would then be pretty specifically tied to this problem, and not usable for any other kind of polygon intersection, which I would think could come up elsewhere? The modifications would be pretty minor, though, so maybe I could confine them to a closure or something.

Some final observations

It runs surprisingly slowly. Like, multiple seconds. Unless I add --release, which speeds it up by a factor of… some number with multiple digits. Wahoo. Debug mode has a high price, especially with a lot of calls in play.

The current state of this code is on GitHub. Please don’t look at it. I’m very sorry.

Honestly, most of my anguish came not from Rust, but from the original code relying on lots of fairly subtle behavior without bothering to explain what it was doing or even hint that anything unusual was going on. God, I hate C++.

I don’t know if the Rust community can learn from this. I don’t know if I even learned from this. Let’s all just quietly forget about it.

Now I just need to figure this one out…