Gizmodo is reporting that schools in the US are buying equipment to unlock cell phones from companies like Cellebrite:
Gizmodo has reviewed similar accounting documents from eight school districts, seven of which are in Texas, showing that administrators paid as much $11,582 for the controversial surveillance technology. Known as mobile device forensic tools (MDFTs), this type of tech is able to siphon text messages, photos, and application data from student’s devices. Together, the districts encompass hundreds of schools, potentially exposing hundreds of thousands of students to invasive cell phone searches.
One of the things we learned from the Snowden documents is that the NSA conducts “about” searches. That is, searches based on activities and not identifiers. A normal search would be on a name, or IP address, or phone number. An about search would something like “show me anyone that has used this particular name in a communications,” or “show me anyone who was at this particular location within this time frame.” These searches are legal when conducted for the purpose of foreign surveillance, but the worry about using them domestically is that they are unconstitutionally broad. After all, the only way to know who said a particular name is to know what everyone said, and the only way to know who was at a particular location is to know where everyone was. The very nature of these searches requires mass surveillance.
The FBI does not conduct mass surveillance. But many US corporations do, as a normal part of their business model. And the FBI uses that surveillance infrastructure to conduct its own about searches. Here’s an arson case where the FBI asked Google who searched for a particular street address:
Homeland Security special agent Sylvette Reynoso testified that her team began by asking Google to produce a list of public IP addresses used to google the home of the victim in the run-up to the arson. The Chocolate Factory [Google] complied with the warrant, and gave the investigators the list. As Reynoso put it:
On June 15, 2020, the Honorable Ramon E. Reyes, Jr., United States Magistrate Judge for the Eastern District of New York, authorized a search warrant to Google for users who had searched the address of the Residence close in time to the arson.
The records indicated two IPv6 addresses had been used to search for the address three times: one the day before the SUV was set on fire, and the other two about an hour before the attack. The IPv6 addresses were traced to Verizon Wireless, which told the investigators that the addresses were in use by an account belonging to Williams.
Google’s response is that this is rare:
While word of these sort of requests for the identities of people making specific searches will raise the eyebrows of privacy-conscious users, Google told The Register the warrants are a very rare occurrence, and its team fights overly broad or vague requests.
“We vigorously protect the privacy of our users while supporting the important work of law enforcement,” Google’s director of law enforcement and information security Richard Salgado told us. “We require a warrant and push to narrow the scope of these particular demands when overly broad, including by objecting in court when appropriate.
“These data demands represent less than one per cent of total warrants and a small fraction of the overall legal demands for user data that we currently receive.”
Here’s another example of what seems to be about data leading to a false arrest.
According to the lawsuit, police investigating the murder knew months before they arrested Molina that the location data obtained from Google often showed him in two places at once, and that he was not the only person who drove the Honda registered under his name.
Avondale police knew almost two months before they arrested Molina that another man his stepfather sometimes drove Molina’s white Honda. On October 25, 2018, police obtained records showing that Molina’s Honda had been impounded earlier that year after Molina’s stepfather was caught driving the car without a license.
Data obtained by Avondale police from Google did show that a device logged into Molina’s Google account was in the area at the time of Knight’s murder. Yet on a different date, the location data from Google also showed that Molina was at a retirement community in Scottsdale (where his mother worked) while debit card records showed that Molina had made a purchase at a Walmart across town at the exact same time.
Molina’s attorneys argue that this and other instances like it should have made it clear to Avondale police that Google’s account-location data is not always reliable in determining the actual location of a person.
“About” searches might be rare, but that doesn’t make them a good idea. We have knowingly and willingly built the architecture of a police state, just so companies can show us ads. (And it is increasingly apparent that the advertising-supported Internet is heading for a crash.)
Pretty horrible story of a US journalist who had his computer and phone searched at the border when returning to the US from Mexico.
After I gave him the password to my iPhone, Moncivias spent three hours reviewing hundreds of photos and videos and emails and calls and texts, including encrypted messages on WhatsApp, Signal, and Telegram. It was the digital equivalent of tossing someone’s house: opening cabinets, pulling out drawers, and overturning furniture in hopes of finding something — anything — illegal. He read my communications with friends, family, and loved ones. He went through my correspondence with colleagues, editors, and sources. He asked about the identities of people who have worked with me in war zones. He also went through my personal photos, which I resented. Consider everything on your phone right now. Nothing on mine was spared.
Pomeroy, meanwhile, searched my laptop. He browsed my emails and my internet history. He looked through financial spreadsheets and property records and business correspondence. He was able to see all the same photos and videos as Moncivias and then some, including photos I thought I had deleted.
If you are a U.S. citizen, border agents cannot stop you from entering the country, even if you refuse to unlock your device, provide your device password, or disclose your social media information. However, agents may escalate the encounter if you refuse. For example, agents may seize your devices, ask you intrusive questions, search your bags more intensively, or increase by many hours the length of detention. If you are a lawful permanent resident, agents may raise complicated questions about your continued status as a resident. If you are a foreign visitor, agents may deny you entry.
The most important piece of advice is to think about this all beforehand, and plan accordingly.
This is a pretty awful story of how Andreas Gal, former Mozilla CTO and US citizen, was detained and threatened at the US border. CBP agents demanded that he unlock his phone and computer.
Know your rights when you enter the US. The EFF publishes a handy guide. And if you want to encrypt your computer so that you are unable to unlock it on demand, here’s my guide. Remember not to lie to a customs officer; that’s a crime all by itself.
The police are increasingly getting search warrants for information about all cell phones in a certain location at a certain time:
Police departments across the country have been knocking at Google’s door for at least the last two years with warrants to tap into the company’s extensive stores of cellphone location data. Known as “reverse location search warrants,” these legal mandates allow law enforcement to sweep up the coordinates and movements of every cellphone in a broad area. The police can then check to see if any of the phones came close to the crime scene. In doing so, however, the police can end up not only fishing for a suspect, but also gathering the location data of potentially hundreds (or thousands) of innocent people. There have only been anecdotal reports of reverse location searches, so it’s unclear how widespread the practice is, but privacy advocates worry that Google’s data will eventually allow more and more departments to conduct indiscriminate searches.
Of course, it’s not just Google who can provide this information.
I spend a lot of time talking about this sort of thing in Data and Goliath. Once you have everyone under surveillance all the time, many things are possible.
The New York Times is reporting about a company called Securus Technologies that gives police the ability to track cell phone locations without a warrant:
The service can find the whereabouts of almost any cellphone in the country within seconds. It does this by going through a system typically used by marketers and other companies to get location data from major cellphone carriers, including AT&T, Sprint, T-Mobile and Verizon, documents show.
Spencer Ackerman has this interesting story about a guy assigned to crack down on unauthorized White House leaks. It’s necessarily light on technical details, so I thought I’d write up some guesses, either as a guide for future reporters asking questions, or for people who want to better know the risks when leak information.
It should come as no surprise that your work email and phone are already monitored. They can get every email you’ve sent or received, even if you’ve deleted it. They can get every text message you’ve sent or received, the metadata of every phone call sent or received, and so forth.
To a lesser extent, this also applies to your well-known personal phone and email accounts. Law enforcement can get the metadata (which includes text messages) for these things without a warrant. In the above story, the person doing the investigation wasn’t law enforcement, but I’m not sure that’s a significant barrier if they can pass things onto the Secret Service or something.
The danger here isn’t that you used these things to leak, it’s that you’ve used these things to converse with the reporter before you made the decision to leak. That’s what happened in the Reality Winner case: she communicated with The Intercept before she allegedly leaked a printed document to them via postal mail. While it wasn’t conclusive enough to convict her, the innocent emails certainly put the investigators on her trail.
The path to leaking often starts this way: innocent actions before the decision to leak was made that will come back to haunt the person afterwards. That includes emails. That also includes Google searches. That includes websites you visit (like this one). I’m not sure how to solve this, except that if you’ve been in contact with The Intercept, and then you decide to leak, send it to anybody but The Intercept.
By the way, the other thing that caught Reality Winner is the records they had of her accessing files and printing them on a printer. Depending where you work, they may have a record of every file you’ve accessed, every intranet page you visited. Because of the way printers put secret dots on documents, investigators know precisely which printer and time the document leaked to The Intercept was printed.
Photographs suffer the same problem: your camera and phone tag the photographs with GPS coordinates and time the photograph was taken, as well as information about the camera. This accidentally exposed John McAfee’s hiding location when Vice took pictures of him a few years ago. Some people leak by taking pictures of the screen — use a camera without GPS for this (meaning, a really old camera you bought from a pawnshop).
These examples should impress upon you the dangers of not understanding technology. As soon as you do something to evade surveillance you know about, you may get caught by surveillance you don’t know about.
If you nonetheless want to continue forward, the next step may be to get a “burner phone”. You can get an adequate Android “prepaid” phone for cash at the local Walmart, electronics store, or phone store.
There’s some problems with such phones, though. They can often be tracked back to the store that sold them, and the store will have security cameras that record you making the purchase. License plate readers and GPS tracking on your existing phone may also place you at that Walmart.
I don’t know how to resolve these problems. Perhaps the best is grow a beard and on the last day of your vacation, color your hair, take a long bike/metro ride (without your existing phone) to a store many miles away and pick up a phone, then shave and change your color back again. I don’t know — there’s a good chance any lame attempt you or I might think of has already been experienced by law enforcement, so they are likely ahead of you. Maybe ask your local drug dealer where they get their burner phones, and if they can sell you one. Of course, that just means when they get caught for drug dealing, they can reduce their sentence by giving up the middle class person who bought a phone from them.
Lastly, they may age out old security videos, so simply waiting six months before using the phone might work. That means prepaying for an entire year.
Note that I’m not going to link to examples of cheap burner phones on this page. Web browsers will sometimes prefetch some information from links in a webpage, so simply including links in this page can condemn you as having interest in burner phones. You are already in enough trouble for having visited this web page.
Burner phones have GPS. Newer the technology, like the latest Android LTE phones, have pretty accurate GPS that the police can query (without a warrant). If you take the phone home and turn it on, they’ll then be able to trace back the phone to your home. Carrying the phone around with you has the same problem, with the phone’s location correlating with your existing phone (which presumably you also carry) or credit card receipts. Rumors are that Petraeus was partly brought down by tracking locations where he used his credit card, namely, matching the hotel he was in with Internet address information.
Older phones that support 3G or even 2G have poorer GPS capabilities. They’ll still located you to the nearest cell tower, but not as accurately to your exact location.
A better strategy than a burner phone would be a burner laptop computer used with WiFi. You can get a cheap one for $200 at Amazon.com. My favorite are the 11 inch ones with a full sized keyboard and Windows 10. Better yet, get an older laptop for cash from a pawn shop.
You can install chat apps on this like “Signal Desktop”, “Wire Desktop”, or “WhatsApp” that will allow you to securely communicate. Or use “Discord”, which isn’t really encrypted, but it’s popular among gamers so therefore less likely to stand out. You can sit in a bar with free WiFi and a USB headset and talk to reporters without having a phone. If the reporter you want to leak to doesn’t have those apps (either on their own laptop or phone) then you don’t want to talk to them.
Needless to say, don’t cross the streams. Don’t log onto your normal accounts like Facebook. If you create fake Facebook accounts, don’t follow the same things. Better yet, configure your browser to discard all information (especially “cookies”) every time you log off, so you can’t be tracked. Install ad blockers, or use the “Brave” web browser, to remove even more trackers. A common trick among hackers is to change the “theme” to a red background, as a constant subliminal reminder that you using your dangerous computer, and never to do anything that identifies the real you.
Put tape over the camera. I’m not sure it’s a really big danger, but put tape over the camera. If they infect you enough to get your picture, they’ve also infected you enough to record any audio on your computer. Remember that proper encryption is end-to-end (they can’t eavesdrop in transit), but if they hack the ends (your laptop, or the reporter’s) they can still record the audio.
Note that when your burner laptop is in “sleep” mode, it can still be talking to the local wifi. Before taking it home, make sure it’s off. Go into the settings and configure it so that when the lid is closed, the computer is turned completely off.
It goes without saying: don’t use that burner laptop from home. Luckily, free wifi is everyone, so the local cafe, bar, or library can be used.
The next step is to also use a VPN or Tor to mask your Internet address. If there’s an active investigation into the reporter, they’ll get the metadata, the Internet address of the bar/cafe you are coming from. A good VPN provider or especially Tor will stop this. Remember that these providers increase latency, making phone calls a bit harder, but they are a lot safer.
Remember that Ross Ulbricht (owner of dark website market Silk Road) was caught in a library. They’d traced back his Internet address and grabbed his laptop out of his hands. Having it turn off (off off, not sleep off) when the lid is closed is one way to reduce this risk. Configuring your web browser to flush all cookies and passwords on restart is another. If they catch you in mid conversation with your secret contact, though, they’ll at least be able to hear your side of the conversation, and know who you are talking to.
The best measure, though it takes some learning, is “Tails live”. It’s a Linux distribution preconfigured with Tor and various secure chat apps that’ll boot from the USB or SD card. When you turn off the computer, nothing will be saved, so there will be no evidence saved to the disk for investigators to retrieve later.
While we are talking about Tor, it should be noted that many news organizations (NYTimes, Washington Post, The Intercept, etc.) support “SecureDrop” accessed only through Tor for receiving anonymous tips. Burner laptops you use from bars from Tails is the likely your most secure way of doing things.
Summary
The point of this post was not to provide a howto guide, but to discuss many of the technological issues involved. In a story about White House people investigating leaks, I’d like to see something in this technological direction. I’d like to know exactly how they were investigating leaks. Certainly, they were investigating all work computers, accounts, and phones. Where they also able to get to non-work computers, accounts, phones? Did they have law enforcement powers? What could they do about burner phones and laptops?
In any case, if you do want a howto guide, the discussion above should put some fear into you how easily you can inadvertently make a mistake.
In this post, I’ll show you how to create a sample dataset for Amazon Macie, and how you can use Amazon Macie to implement data-centric compliance and security analytics in your Amazon S3 environment. I’ll also dive into the different kinds of credentials, document types, and PII detections supported by Macie. First, I’ll walk through creating a “getting started” sample set of artificial, generated data that you can use to test Macie capabilities and start building your own policies and alerts.
Create a realistic data sample set in S3
I’ll use amazon-macie-activity-generator, which we call “AMG” for short, a sample application developed by AWS that generates realistic content and accesses your test account to create the data. AMG uses AWS CloudFormation, AWS Lambda, and Python’s excellent Faker library to create a data set with artificial—but realistic—data classifications and access patterns to help test some of the features and capabilities of Macie. AMG is released under Amazon Software License 1.0, and we’ll accept pull requests on our GitHub repository and monitor any issues that are opened so we can try to fix bugs and consider new feature requests.
The following diagram shows a high level architecture overview of the components that will be created in your AWS account for AMG. For additional detail about these components and their relationships, review the CloudFormation setup script.
Depending on the data types specified in your JSON configuration template (details below), AMG will periodically generate artificial documents for the specified S3 target with a PutObject action. By default, the CloudFormation stack uses a configuration file that instructs AMG to create a new, private S3 bucket that can only be accessed by authorized AWS users/roles in the same account as the bucket. All the S3 objects with fake data in this bucket have a private ACL and inherit the bucket’s access control configuration. All generated objects feature the header in the example below, and AMG supports all fake data providers offered by https://faker.readthedocs.io/en/latest/index.html, as well as a few of AMG‘s own custom fake data providers requested by our customers: aws_creds, slack_creds, github_creds, facebook_creds, linux_shadow, rsa, linux_passwd, dsa, ec, pgp, cert, itin, swift_code, and cve.
# Sample Report - No identification of actual persons or places is # intended or should be inferred
74323 Julie Field Lake Joshuamouth, OR 30055-3905 1-196-191-4438x974 53001 Paul Union New John, HI 94740 Mastercard Amanda Wells 5135725008183484 09/26 CVV: 550
354-70-6172 242 George Plaza East Lawrencefurt, VA 37287-7620 GB73WAUS0628038988364 587 Silva Village Pearsonburgh, NM 11616-7231 LDNM1948227117807 American Express Brett Garza 347965534580275 05/20 CID: 4758
599.335.2742 JCB 15 digit Michael Arias 210069190253121 03/27 CVC: 861
Create your amazon-macie-activity-generator CloudFormation stack
You can deploy AMG in your AWS account by using either these methods:
Log in to the AWS Console in a region supported by Amazon Macie, which currently includes US East (N. Virginia), US West (Oregon).
Select the One-click CloudFormation launch stack, or launch CloudFormation using the template above.
Read our terms, select the Acknowledgement box, and then select Create.
Creating the data takes a few minutes, and you can periodically refresh CloudWatch to track progress.
Add the new sample data to Macie
Now, I’ll log into the Macie console and add the newly created sample data buckets for analysis by Macie.
Note: If you don’t explicitly specify a bucket for S3 targets in CloudFormation, AMG will use the S3 bucket that’s created by default for the stack, which will be printed out in the CloudFormation stack’s output.
To add buckets for data classification, follow these steps:
Log in to Amazon Macie.
Select Integrations, and then select Services.
Select your account, and then select Details from the Amazon S3 card.
Select your newly created buckets for Full classification, including existing data.
For additional details on configuring Macie, refer to our getting started documentation.
Macie classifies all historical and newly created data in the buckets created by AMG, and the data will be available in the Macie console as it’s classified. Typically, you can expect the data in the sample set to be classified within 60 minutes of the time it was selected for analysis.
Classifying objects with Macie
To see the objects in your test sample set, in Macie, open the Research tab, and then select the S3 Objects index. We’ll use the regular expression search capability in Macie to find any objects written to buckets that start with “amazon-macie-activity-generator-defaults3bucket”. To search for this, type the following text into the Macie search box and select the magnifying glass icon.
From here, you can see a nice breakdown of the kinds of objects that have been classified by Macie, as well as the object-specific details. Create an advanced search using Lucene Query Syntax, and save it as an alert to be matched against any newly created data.
Analyzing accesses to your test data
In addition to classifying data, Macie tracks all control plane and data plane accesses to your content using CloudTrail. To see accesses to your generated environment (created periodically by AMG to mimic user activity), on the Macie navigation bar, select Research, select the CloudTrail data index, and then use the following search to identify our generated role activity:
From this search, you can dive into the user activity (IAM users, assumed roles, federated users, and so on), which is summarized in 5-minute aggregations (user sessions). For example, in the screen shot you can see that one of our AMG-generated users listed objects one time (ListObjects) and wrote 56 objects to S3 (PutObject) during a 5-minute period.
Macie alerts
Macie features both predictive (machine learning-based) and basic (rule-based) alerts, including alerts on unencrypted credentials being uploaded to S3 (because this activity might not follow compliance best practices), risky activity such as data exfiltration, and user-defined alerts that are based on saved searches. To see alerts that have been generated based on AMG‘s activity, on the Macie navigation bar, select Alerts.
AMG will continue to run, periodically uploading content to the specified S3 buckets. To stop AMG, delete the AMG CloudFormation stack and associated resources here.
What are the costs?
Macie has a free tier enabling up to 1GB of content to be analyzed per month at no cost to you. By default, AMG will write approximately 10MB of objects to Amazon S3 per day, and you will incur charges for data classification after crossing the 1GB monthly free tier. Running continuously, AMG will generate about 310MB of content per month (10MB/day x 31 days), which will stay below the free tier. Any data use above 1GB will be billed at the Macie public price of $5/GB. For more detail, see the Macie pricing documentation.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon Macie forum or contact AWS Support.
In the wake of the Cambridge Analytica scandal, news articles and commentators have focused on what Facebook knows about us. A lot, it turns out. It collects data from our posts, our likes, our photos, things we type and delete without posting, and things we do while not on Facebook and even when we’re offline. It buys data about us from others. And it can infer even more: our sexual orientation, political beliefs, relationship status, drug use, and other personality traits — even if we didn’t take the personality test that Cambridge Analytica developed.
But for every article about Facebook’s creepy stalker behavior, thousands of other companies are breathing a collective sigh of relief that it’s Facebook and not them in the spotlight. Because while Facebook is one of the biggest players in this space, there are thousands of other companies that spy on and manipulate us for profit.
Harvard Business School professor Shoshana Zuboff calls it “surveillance capitalism.” And as creepy as Facebook is turning out to be, the entire industry is far creepier. It has existed in secret far too long, and it’s up to lawmakers to force these companies into the public spotlight, where we can all decide if this is how we want society to operate and — if not — what to do about it.
There are 2,500 to 4,000 data brokers in the United States whose business is buying and selling our personal data. Last year, Equifax was in thenews when hackers stole personal information on 150 million people, including Social Security numbers, birth dates, addresses, and driver’s license numbers.
You certainly didn’t give it permission to collect any of that information. Equifax is one of those thousands of data brokers, most of them you’ve never heard of, selling your personal information without your knowledge or consent to pretty much anyone who will pay for it.
Surveillance capitalism takes this one step further. Companies like Facebook and Google offer you free services in exchange for your data. Google’s surveillance isn’t in the news, but it’s startlingly intimate. We never lie to our search engines. Our interests and curiosities, hopes and fears, desires and sexual proclivities, are all collected and saved. Add to that the websites we visit that Google tracks through its advertising network, our Gmail accounts, our movements via Google Maps, and what it can collect from our smartphones.
That phone is probably the most intimate surveillance device ever invented. It tracks our location continuously, so it knows where we live, where we work, and where we spend our time. It’s the first and last thing we check in a day, so it knows when we wake up and when we go to sleep. We all have one, so it knows who we sleep with. Uber used just some of that information to detect one-night stands; your smartphone provider and any app you allow to collect location data knows a lot more.
Surveillance capitalism drives much of the internet. It’s behind most of the “free” services, and many of the paid ones as well. Its goal is psychological manipulation, in the form of personalized advertising to persuade you to buy something or do something, like vote for a candidate. And while the individualized profile-driven manipulation exposed by Cambridge Analytica feels abhorrent, it’s really no different from what every company wants in the end. This is why all your personal information is collected, and this is why it is so valuable. Companies that can understand it can use it against you.
None of this is new. The media has been reporting on surveillance capitalism for years. In 2015, I wrote a book about it. Back in 2010, the Wall Street Journal publishedan award-winning two-year series about how people are tracked both online and offline, titled “What They Know.”
Surveillance capitalism is deeply embedded in our increasingly computerized society, and if the extent of it came to light there would be broad demands for limits and regulation. But because this industry can largely operate in secret, only occasionally exposed after a data breach or investigative report, we remain mostly ignorant of its reach.
This might change soon. In 2016, the European Union passed the comprehensive General Data Protection Regulation, or GDPR. The details of the law are far too complex to explain here, but some of the things it mandates are that personal data of EU citizens can only be collected and saved for “specific, explicit, and legitimate purposes,” and only with explicit consent of the user. Consent can’t be buried in the terms and conditions, nor can it be assumed unless the user opts in. This law will take effect in May, and companies worldwide are bracing for its enforcement.
Because pretty much all surveillance capitalism companies collect data on Europeans, this will expose the industry like nothing else. Here’s just one example. In preparation for this law, PayPal quietlypublished a list of over 600 companies it might share your personal data with. What will it be like when every company has to publish this sort of information, and explicitly explain how it’s using your personal data? We’re about to find out.
In the wake of this scandal, even Mark Zuckerberg saidthat his industry probably should be regulated, although he’s certainly not wishing for the sorts of comprehensive regulation the GDPR is bringing to Europe.
He’s right. Surveillance capitalism has operated without constraints for far too long. And advances in both big data analysis and artificial intelligence will make tomorrow’s applications far creepier than today’s. Regulation is the only answer.
The first step to any regulation is transparency. Who has our data? Is it accurate? What are they doing with it? Who are they selling it to? How are they securing it? Can we delete it? I don’t see any hope of Congress passing a GDPR-like data protection law anytime soon, but it’s not too far-fetched to demand laws requiring these companies to be more transparent in what they’re doing.
One of the responses to the Cambridge Analytica scandal is that people are deleting their Facebook accounts. It’s hard to do right, and doesn’t do anything about the data that Facebook collectsaboutpeople who don’t use Facebook. But it’s a start. The market can put pressure on these companies to reduce their spying on us, but it can only do that if we force the industry out of its secret shadows.
The cell phones we carry with us constantly are the most perfect surveillance device ever invented, and our laws haven’t caught up to that reality. That might change soon.
This week, the Supreme Court will hear a case with profound implications on your security and privacy in the coming years. The Fourth Amendment’s prohibition of unlawful search and seizure is a vital right that protects us all from police overreach, and the way the courts interpret it is increasingly nonsensical in our computerized and networked world. The Supreme Court can either update current law to reflect the world, or it can further solidify an unnecessary and dangerous police power.
The case centers on cell phone location data and whether the police need a warrant to get it, or if they can use a simple subpoena, which is easier to obtain. Current Fourth Amendment doctrine holds that you lose all privacy protections over any data you willingly share with a third party. Your cellular provider, under this interpretation, is a third party with whom you’ve willingly shared your movements, 24 hours a day, going back months — even though you don’t really have any choice about whether to share with them. So police can request records of where you’ve been from cell carriers without any judicial oversight. The case before the court, Carpenter v. United States, could change that.
Traditionally, information that was most precious to us was physically close to us. It was on our bodies, in our homes and offices, in our cars. Because of that, the courts gave that information extra protections. Information that we stored far away from us, or gave to other people, afforded fewer protections. Police searches have been governed by the “third-party doctrine,” which explicitly says that information we share with others is not considered private.
The Internet has turned that thinking upside-down. Our cell phones know who we talk to and, if we’re talking via text or e-mail, what we say. They track our location constantly, so they know where we live and work. Because they’re the first and last thing we check every day, they know when we go to sleep and when we wake up. Because everyone has one, they know whom we sleep with. And because of how those phones work, all that information is naturally shared with third parties.
More generally, all our data is literally stored on computers belonging to other people. It’s our e-mail, text messages, photos, Google docs, and more all in the cloud. We store it there not because it’s unimportant, but precisely because it is important. And as the Internet of Things computerizes the rest our lives, even more data will be collected by other people: data from our health trackers and medical devices, data from our home sensors and appliances, data from Internet-connected “listeners” like Alexa, Siri, and your voice-activated television.
All this data will be collected and saved by third parties, sometimes for years. The result is a detailed dossier of your activities more complete than any private investigator – or police officer – could possibly collect by following you around.
The issue here is not whether the police should be allowed to use that data to help solve crimes. Of course they should. The issue is whether that information should be protected by the warrant process that requires the police to have probable cause to investigate you and get approval by a court.
Warrants are a security mechanism. They prevent the police from abusing their authority to investigate someone they have no reason to suspect of a crime. They prevent the police from going on “fishing expeditions.” They protect our rights and liberties, even as we willingly give up our privacy to the legitimate needs of law enforcement.
The third-party doctrine never made a lot of sense. Just because I share an intimate secret with my spouse, friend, or doctor doesn’t mean that I no longer consider it private. It makes even less sense in today’s hyper-connected world. It’s long past time the Supreme Court recognized that a months’-long history of my movements is private, and my e-mails and other personal data deserve the same protections, whether they’re on my laptop or on Google’s servers.
TL;DR pip install numpy used to take ages, and now it’s super fast thanks to piwheels.
The Python Package Index (PyPI) is a package repository for Python modules. Members of the Python community publish software and libraries in it as an easy method of distribution. If you’ve ever used pip install, PyPI is the service that hosts the software you installed. You may have noticed that some installations can take a long time on the Raspberry Pi. That usually happens when modules have been implemented in C and require compilation.
No more slacking off! pip install numpy takes just a few seconds now \o/
Wheels for Python packages
A general solution to this problem exists: Python wheels are a standard for distributing pre-built versions of packages, saving users from having to build from source. However, when C code is compiled, it’s compiled for a particular architecture, so package maintainers usually publish wheels for 32-bit and 64-bit Windows, macOS, and Linux. Although Raspberry Pi runs Linux, its architecture is ARM, so Linux wheels are not compatible.
What Python wheels are not
Pip works by browsing PyPI for a wheel matching the user’s architecture — and if it doesn’t find one, it falls back to the source distribution (usually a tarball or zip of the source code). Then the user has to build it themselves, which can take a long time, or may require certain dependencies. And if pip can’t find a source distribution, the installation fails.
Developing piwheels
In order to solve this problem, I decided to build wheels of every package on PyPI. I wrote some tooling for automating the process and used a postgres database to monitor the status of builds and log the output. Using a Pi 3 in my living room, I attempted to build wheels of the latest version of all 100 000 packages on PyPI and to host them on a web server on the Pi. This took a total of ten days, and my proof-of-concept seemed to show that it generally worked and was likely to be useful! You could install packages directly from the server, and installations were really fast.
This Pi 3 was the piwheels beta server, sitting atop my SSH gateway Pi 2 at home
I proceeded to plan for version 2, which would attempt to build every version of every package — about 750 000 versions in total. I estimated this would take 75 days for one Pi, but I intended to scale it up to use multiple Pis. Our web hosts Mythic Beasts provide dedicated Pi 3 hosting, so I fired up 20 of them to share the load. With some help from Dave Jones, who created an efficient queuing system for the builders, we were able make this run like clockwork. In under two weeks, it was complete! Read ALL about the first build run drama on my blog.
ALL the cloud Pis
Improving piwheels
We analysed the failures, made some tweaks, installed some key dependencies, and ran the build again to raise our success rate from 76% to 83%. We also rebuilt packages for Python 3.5 (the new default in Raspbian Stretch). The wheels we build are tagged ‘armv7l’, but because our Raspbian image is compatible with all Pi models, they’re really ARMv6, so they’re compatible with Pi 3, Pi 2, Pi 1 and Pi Zero. This means the ‘armv6l’-tagged wheels we provide are really just the ARMv7 wheels renamed.
The wonderful piwheels monitor interface created by Dave
Now, you might be thinking “Why didn’t you just cross-compile?” I really wanted to have full compatibility, and building natively on Pis seemed to be the best way to achieve that. I had easy access to the Pis, and it really didn’t take all that long. Plus, you know, I wanted to eat my own dog food.
You might also be thinking “Why don’t you just apt install python3-numpy?” It’s true that some Python packages are distributed via the Raspbian/Debian archives too. However, if you’re in a virtual environment, or you need a more recent version than the one packaged for Debian, you need pip.
How it works
Now that the piwheels package repository is running as a service, hosted on a Pi 3 in the Mythic Beasts data centre in London. The pip package in Raspbian Stretch is configured to use piwheels as an additional index, so it falls back to PyPI if we’re missing a package. Just run sudo apt upgrade to get the configuration change. You’ll find that pip installs are much faster now! If you want to use piwheels on Raspbian Jessie, that’s possible too — find the instructions in our FAQs. And now, every time you pip install something, your files come from a web server running on a Raspberry Pi (that capable little machine)!
This takes about 20 minutes on a Pi 3, 2.5 hours on a Pi 1, or just a few seconds on either if you use piwheels.
If you’re interested to see the details, try pip install numpy -v for verbose output. You’ll see that pip discovers two indexes to search:
2 location(s) to search for versions of numpy:
* https://pypi.python.org/simple/numpy/
* https://www.piwheels.hostedpi.com/simple/numpy/
Then it searches both indexes for available files. From this list of files, it determines the latest version available. Next it looks for a Python version and architecture match, and then opts for a wheel over a source distribution. If a new package or version is released, piwheels will automatically pick it up and add it to the build queue.
How piwheels works
For the users unfamiliar with virtual environments I should mention that doing this isn’t a requirement — just an easy way of testing installations in a sandbox. Most pip usage will require sudo pip3 install {package}, which installs at a system level.
If you come across any issues with any packages from piwheels, please let us know in a GitHub issue.
Taking piwheels further
We currently provide over 670 000 wheels for more than 96 000 packages, all compiled natively on Raspberry Pi hardware. Moreover, we’ll keep building new packages as they are released.
Note that, at present, we have built wheels for Python 3.4 and 3.5 — we’re planning to add support for Python 3.6 and 2.7. The fact that piwheels is currently missing Python 2 wheels does not affect users: until we rebuild for Python 2, PyPI will be used as normal, it’ll just take longer than installing a Python 3 package for which we have a wheel. But remember, Python 2 end-of-life is less than three years away!
Search has become the most powerful method to find content on the Web, both for finding websites themselves and for discovering information within websites. Our blog readers find content in both ways — using Google, Bing, Yahoo, Ask, DuckDuckGo, and other search engines to follow search results directly to our blog, and using the site search function once on our blog to find content in the blog posts themselves.
There’s a Lot of Great Content on the Backblaze Blog
Backblaze’s CEO Gleb Budman wrote the first post for this blog in March of 2008. Since that post there have been 612 more. There’s a lot of great content on this blog, as evidenced by the more than two million page views we’ve had since the beginning of this year. We typically publish two blog posts per week on a variety of topics, but we focus primarily on cloud storage technology and data backup, company news, and how-to articles on how to use cloud storage and various hardware and software solutions.
Earlier this year we initiated a series of posts on entrepreneurship by our CEO and co-founder, Gleb Budman, which has proven tremendously popular. We also occasionally publish something a little lighter, such as our current Halloween video contest — there’s still time to enter!
The Site Search Box — Your gateway to Backblaze blog content
We Could do a Better Job of Helping You Find It
I joined Backblaze as Content Director in July of this year. During the application process, I spent quite a bit of time reading through the blog to understand the company, the market, and its customers. That’s a lot of reading. I used the site search many times to uncover topics and posts, and discovered that site search had a number of weaknesses that made it less-than-easy to find what I was looking for.
These site search weaknesses included:
Searches were case sensitive
Visitor could easily miss content capitalized differently than the search terms
Results showed no date or author information
Visitor couldn’t tell how recent the post was or who wrote it
Search terms were not highlighted in context
Visitor had to scrutinize the results to find the terms in the post
No indication of the number of results or number of pages of results
Visitor didn’t know how fruitful the search was
No record of search terms used by visitors
We couldn’t tell what our visitors were searching for!
I wanted to make it easier for blog visitors to find all the great content on the Backblaze blog and help me understand what our visitors are searching for. To do that, we needed to upgrade our site search.
I started with a list of goals I wanted for site search.
Make it easier to find content on the blog
Provide a summary of what was found
Search the comments as well as the posts
Highlight the search terms in the results to help find them in context
Provide a record of searches to help me understand what interests our readers
I had the goals, now how could I find a solution to achieve them?
Our blog is built on WordPress, which has a built-in site search function that could be described as simply adequate. The most obvious of its limitations is that search results are listed chronologically, not based on “most popular,” most occurring,” or any other metric that might make the result more relevant to your interests.
The Search for Improved (Site) Search
An obvious choice to improve site search would be to adopt Google Site Search, as many websites and blogs have done. Unfortunately, I quickly discovered that Google is sunsetting Site Search by April of 2018. That left the choice among a number of third-party services or WordPress-specific solutions. My immediate inclination was to see what is available specifically for WordPress.
There are a handful of search plugins for WordPress. One stood out to me for the number of installations (100,000+) and overwhelmingly high reviews: Relevanssi. Still, I had a number of questions. The first question was whether the plugin retained any search data from our site — I wanted to make sure that the privacy of our visitors is maintained, and even harvesting anonymous search data would not be acceptable to Backblaze. I wrote to the developer and was pleased by the responsiveness from Relevanssi’s creator, Mikko Saari. He explained to me that Relevanssi doesn’t have access to any of the search data from the sites using his plugin. Receiving a quick response from a developer is always a good sign. Other signs of a good WordPress plugin are recent updates and an active support forum.
Our solution: Relevanssi for Site Search
The WordPress plugin Relevanssi met all of our criteria, so we installed the plugin and switched to using it for site search in September.
In addition to solving the problems listed above, our search results are now displayed based on relevance instead of date, which is the default behavior of WordPress search. That capability is very useful on our blog where a lot of the content from years ago is still valuable — often called evergreen content. The new site search also enables visitors to search using the boolean expressions AND and OR. For example, a visitor can search for “seagate AND drive,” and see results that only include both words. Alternatively, a visitor can search for “seagate OR drive” and see results that include either word.
Search results showing total number of results, hits and their location, and highlighted search terms in context
Visitors can put search terms in quotation marks to search for an entire phrase. For example, a visitor can search for “2016 drive stats” and see results that include only that exact phrase. In addition, the site search results come with a summary, showing where the results were found (title, post, or comments). Search terms are highlighted in yellow in the content, showing exactly where the search result was found.
Here’s an example of a popular post that shows up in searches. Hard Drive Stats for Q1 2017 was published on May 9, 2017. Since September 4, it has shown up over 150 times in site searches and in the last 90 days in has been viewed over 53,000 times on our blog.
Since initiating the new search on our blog on September 4, there have been almost 23,000 site searches conducted, so we know you are using it. We’ve implemented pagination for the blog feed and search results so you know how many pages of results there are and made it easier to navigate to them.
Now that we have this site search data, you likely are wondering which are the most popular search terms on our blog. Here are some of the top searches:
Please tell us how you use site search and whether there are any other capabilities you’d like to see that would make it easier to find content on our blog.
There’s a newly discovered bug in Internet Explorer that allows any currently visited website to learn the contents of the address bar when the user hits enter. This feels important; the site I am at now has no business knowing where I go next.
I recently wrote about the new ability to disable the Touch ID login on iPhones. This is important because of a weirdness in current US law that protects people’s passcodes from forced disclosure in ways it does not protect actions: being forced to place a thumb on a fingerprint reader.
There’s another, more significant, change: iOS now requires a passcode before the phone will establish trust with another device.
In the current system, when you connect your phone to a computer, you’re prompted with the question “Trust this computer?” and you can click yes or no. Now you have to enter in your passcode again. That means if the police have an unlocked phone, they can scroll through the phone looking for things but they can’t download all of the contents onto a another computer without also knowing the passcode.
This might be particularly consequential during border searches. The “border search” exception, which allows Customs and Border Protection to search anything going into the country, is a contentious issue when applied electronics. It is somewhat (but not completely) settled law, but that the U.S. government can, without any cause at all (not even “reasonable articulable suspicion”, let alone “probable cause”), copy all the contents of my devices when I reenter the country sows deep discomfort in myself and many others. The only legal limitation appears to be a promise not to use this information to connect to remote services. The new iOS feature means that a Customs office can browse through a device — a time limited exercise — but not download the full contents.
For teaching hacking/cybersecurity, I thought I’d create of the most obvious hacks of all time. Not the best hacks, the most sophisticated hacks, or the hacks with the biggest impact, but the most obvious hacks — ones that even the least knowledgeable among us should be able to understand. Below I propose some hacks that fit this bill, though in no particular order.
The reason I’m writing this is that my niece wants me to teach her some hacking. I thought I’d start with the obvious stuff first.
Shared Passwords
If you use the same password for every website, and one of those websites gets hacked, then the hacker has your password for all your websites. The reason your Facebook account got hacked wasn’t because of anything Facebook did, but because you used the same email-address and password when creating an account on “beagleforums.com”, which got hacked last year.
I’ve heard people say “I’m sure, because I choose a complex password and use it everywhere”. No, this is the very worst thing you can do. Sure, you can the use the same password on all sites you don’t care much about, but for Facebook, your email account, and your bank, you should have a unique password, so that when other sites get hacked, your important sites are secure.
And yes, it’s okay to write down your passwords on paper.
My accountant emails PDF statements encrypted with the last 4 digits of my Social Security Number. This is not encryption — a 4 digit number has only 10,000 combinations, and a hacker can guess all of them in seconds.
PIN numbers for ATM cards work because ATM machines are online, and the machine can reject your card after four guesses. PIN numbers don’t work for documents, because they are offline — the hacker has a copy of the document on their own machine, disconnected from the Internet, and can continue making bad guesses with no restrictions.
Passwords protecting documents must be long enough that even trillion upon trillion guesses are insufficient to guess.
The lazy way of combining websites with databases is to combine user input with an SQL statement. This combines code with data, so the obvious consequence is that hackers can craft data to mess with the code.
No, this isn’t obvious to the general public, but it should be obvious to programmers. The moment you write code that adds unfiltered user-input to an SQL statement, the consequence should be obvious. Yet, “SQL injection” has remained one of the most effective hacks for the last 15 years because somehow programmers don’t understand the consequence.
CGI shell injection is a similar issue. Back in early days, when “CGI scripts” were a thing, it was really important, but these days, not so much, so I just included it with SQL. The consequence of executing shell code should’ve been obvious, but weirdly, it wasn’t. The IT guy at the company I worked for back in the late 1990s came to me and asked “this guy says we have a vulnerability, is he full of shit?”, and I had to answer “no, he’s right — obviously so”.
XSS (“Cross Site Scripting”) [*] is another injection issue, but this time at somebody’s web browser rather than a server. It works because websites will echo back what is sent to them. For example, if you search for Cross Site Scripting with the URL https://www.google.com/search?q=cross+site+scripting, then you’ll get a page back from the server that contains that string. If the string is JavaScript code rather than text, then some servers (thought not Google) send back the code in the page in a way that it’ll be executed. This is most often used to hack somebody’s account: you send them an email or tweet a link, and when they click on it, the JavaScript gives control of the account to the hacker.
Cross site injection issues like this should probably be their own category, but I’m including it here for now.
In the C programming language, programmers first create a buffer, then read input into it. If input is long than the buffer, then it overflows. The extra bytes overwrite other parts of the program, letting the hacker run code.
Again, it’s not a thing the general public is expected to know about, but is instead something C programmers should be expected to understand. They should know that it’s up to them to check the length and stop reading input before it overflows the buffer, that there’s no language feature that takes care of this for them.
We are three decades after the first major buffer overflow exploits, so there is no excuse for C programmers not to understand this issue.
What makes particular obvious is the way they are wrapped in exploits, like in Metasploit. While the bug itself is obvious that it’s a bug, actually exploiting it can take some very non-obvious skill. However, once that exploit is written, any trained monkey can press a button and run the exploit. That’s where we get the insult “script kiddie” from — referring to wannabe-hackers who never learn enough to write their own exploits, but who spend a lot of time running the exploit scripts written by better hackers than they.
The first popular email server in the 1980s was called “SendMail”. It had a feature whereby if you send a “DEBUG” command to it, it would execute any code following the command. The consequence of this was obvious — hackers could (and did) upload code to take control of the server. This was used in the Morris Worm of 1988. Most Internet machines of the day ran SendMail, so the worm spread fast infecting most machines.
This bug was mostly ignored at the time. It was thought of as a theoretical problem, that might only rarely be used to hack a system. Part of the motivation of the Morris Worm was to demonstrate that such problems was to demonstrate the consequences — consequences that should’ve been obvious but somehow were rejected by everyone.
I’m conflicted whether I should add this or not, because here’s the deal: you are supposed to click on attachments and links within emails. That’s what they are there for. The difference between good and bad attachments/links is not obvious. Indeed, easy-to-use email systems makes detecting the difference harder.
On the other hand, the consequences of bad attachments/links is obvious. That worms like ILOVEYOU spread so easily is because people trusted attachments coming from their friends, and ran them.
We have no solution to the problem of bad email attachments and links. Viruses and phishing are pervasive problems. Yet, we know why they exist.
Default and backdoor passwords
The Mirai botnet was caused by surveillance-cameras having default and backdoor passwords, and being exposed to the Internet without a firewall. The consequence should be obvious: people will discover the passwords and use them to take control of the bots.
Surveillance-cameras have the problem that they are usually exposed to the public, and can’t be reached without a ladder — often a really tall ladder. Therefore, you don’t want a button consumers can press to reset to factory defaults. You want a remote way to reset them. Therefore, they put backdoor passwords to do the reset. Such passwords are easy for hackers to reverse-engineer, and hence, take control of millions of cameras across the Internet.
The same reasoning applies to “default” passwords. Many users will not change the defaults, leaving a ton of devices hackers can hack.
Masscan and background radiation of the Internet
I’ve written a tool that can easily scan the entire Internet in a short period of time. It surprises people that this possible, but it obvious from the numbers. Internet addresses are only 32-bits long, or roughly 4 billion combinations. A fast Internet link can easily handle 1 million packets-per-second, so the entire Internet can be scanned in 4000 seconds, little more than an hour. It’s basic math.
Because it’s so easy, many people do it. If you monitor your Internet link, you’ll see a steady trickle of packets coming in from all over the Internet, especially Russia and China, from hackers scanning the Internet for things they can hack.
People’s reaction to this scanning is weirdly emotional, taking is personally, such as:
Why are they hacking me? What did I do to them?
Great! They are hacking me! That must mean I’m important!
Grrr! How dare they?! How can I hack them back for some retribution!?
I find this odd, because obviously such scanning isn’t personal, the hackers have no idea who you are.
If you connect to the Starbucks WiFi, a hacker nearby can easily eavesdrop on your network traffic, because it’s not encrypted. Windows even warns you about this, in case you weren’t sure.
At DefCon, they have a “Wall of Sheep”, where they show passwords from people who logged onto stuff using the insecure “DefCon-Open” network. Calling them “sheep” for not grasping this basic fact that unencrypted traffic is unencrypted.
To be fair, it’s actually non-obvious to many people. Even if the WiFi itself is not encrypted, SSL traffic is. They expect their services to be encrypted, without them having to worry about it. And in fact, most are, especially Google, Facebook, Twitter, Apple, and other major services that won’t allow you to log in anymore without encryption.
But many services (especially old ones) may not be encrypted. Unless users check and verify them carefully, they’ll happily expose passwords.
What’s interesting about this was 10 years ago, when most services which only used SSL to encrypt the passwords, but then used unencrypted connections after that, using “cookies”. This allowed the cookies to be sniffed and stolen, allowing other people to share the login session. I used this on stage at BlackHat to connect to somebody’s GMail session. Google, and other major websites, fixed this soon after. But it should never have been a problem — because the sidejacking of cookies should have been obvious.
Tools: Wireshark, dsniff
Stuxnet LNK vulnerability
Again, this issue isn’t obvious to the public, but it should’ve been obvious to anybody who knew how Windows works.
When Windows loads a .dll, it first calls the function DllMain(). A Windows link file (.lnk) can load icons/graphics from the resources in a .dll file. It does this by loading the .dll file, thus calling DllMain. Thus, a hacker could put on a USB drive a .lnk file pointing to a .dll file, and thus, cause arbitrary code execution as soon as a user inserted a drive.
I say this is obvious because I did this, created .lnks that pointed to .dlls, but without hostile DllMain code. The consequence should’ve been obvious to me, but I totally missed the connection. We all missed the connection, for decades.
Social Engineering and Tech Support [* * *]
After posting this, many people have pointed out “social engineering”, especially of “tech support”. This probably should be up near #1 in terms of obviousness.
The classic example of social engineering is when you call tech support and tell them you’ve lost your password, and they reset it for you with minimum of questions proving who you are. For example, you set the volume on your computer really loud and play the sound of a crying baby in the background and appear to be a bit frazzled and incoherent, which explains why you aren’t answering the questions they are asking. They, understanding your predicament as a new parent, will go the extra mile in helping you, resetting “your” password.
One of the interesting consequences is how it affects domain names (DNS). It’s quite easy in many cases to call up the registrar and convince them to transfer a domain name. This has been used in lots of hacks. It’s really hard to defend against. If a registrar charges only $9/year for a domain name, then it really can’t afford to provide very good tech support — or very secure tech support — to prevent this sort of hack.
Social engineering is such a huge problem, and obvious problem, that it’s outside the scope of this document. Just google it to find example after example.
A related issue that perhaps deserves it’s own section is OSINT [*], or “open-source intelligence”, where you gather public information about a target. For example, on the day the bank manager is out on vacation (which you got from their Facebook post) you show up and claim to be a bank auditor, and are shown into their office where you grab their backup tapes. (We’ve actually done this).
Telephones historically used what we call “in-band signaling”. That’s why when you dial on an old phone, it makes sounds — those sounds are sent no differently than the way your voice is sent. Thus, it was possible to make tone generators to do things other than simply dial calls. Early hackers (in the 1970s) would make tone-generators called “blue-boxes” and “black-boxes” to make free long distance calls, for example.
These days, “signaling” and “voice” are digitized, then sent as separate channels or “bands”. This is call “out-of-band signaling”. You can’t trick the phone system by generating tones. When your iPhone makes sounds when you dial, it’s entirely for you benefit and has nothing to do with how it signals the cell tower to make a call.
Early hackers, like the founders of Apple, are famous for having started their careers making such “boxes” for tricking the phone system. The problem was obvious back in the day, which is why as the phone system moves from analog to digital, the problem was fixed.
A simple trick is to put a virus on a USB flash drive, and drop it in a parking lot. Somebody is bound to notice it, stick it in their computer, and open the file.
This can be extended with tricks. For example, you can put a file labeled “third-quarter-salaries.xlsx” on the drive that required macros to be run in order to open. It’s irresistible to other employees who want to know what their peers are being paid, so they’ll bypass any warning prompts in order to see the data.
Another example is to go online and get custom USB sticks made printed with the logo of the target company, making them seem more trustworthy.
We also did a trick of taking an Adobe Flash game “Punch the Monkey” and replaced the monkey with a logo of a competitor of our target. They now only played the game (infecting themselves with our virus), but gave to others inside the company to play, infecting others, including the CEO.
Search engines like Google will index your website — your entire website. Frequently companies put things on their website without much protection because they are nearly impossible for users to find. But Google finds them, then indexes them, causing them to pop up with innocent searches.
There are books written on “Google hacking” explaining what search terms to look for, like “not for public release”, in order to find such documents.
At the top of every browser is what’s called the “URL”. You can change it. Thus, if you see a URL that looks like this:
http://www.example.com/documents?id=138493
Then you can edit it to see the next document on the server:
http://www.example.com/documents?id=138494
The owner of the website may think they are secure, because nothing points to this document, so the Google search won’t find it. But that doesn’t stop a user from manually editing the URL.
An example of this is a big Fortune 500 company that posts the quarterly results to the website an hour before the official announcement. Simply editing the URL from previous financial announcements allows hackers to find the document, then buy/sell the stock as appropriate in order to make a lot of money.
Another example is the classic case of Andrew “Weev” Auernheimer who did this trick in order to download the account email addresses of early owners of the iPad, including movie stars and members of the Obama administration. It’s an interesting legal case because on one hand, techies consider this so obvious as to not be “hacking”. On the other hand, non-techies, especially judges and prosecutors, believe this to be obviously “hacking”.
For decades now, online gamers have figured out an easy way to win: just flood the opponent with Internet traffic, slowing their network connection. This is called a DoS, which stands for “Denial of Service”. DoSing game competitors is often a teenager’s first foray into hacking.
A variant of this is when you hack a bunch of other machines on the Internet, then command them to flood your target. (The hacked machines are often called a “botnet”, a network of robot computers). This is called DDoS, or “Distributed DoS”. At this point, it gets quite serious, as instead of competitive gamers hackers can take down entire businesses. Extortion scams, DDoSing websites then demanding payment to stop, is a common way hackers earn money.
Another form of DDoS is “amplification”. Sometimes when you send a packet to a machine on the Internet it’ll respond with a much larger response, either a very large packet or many packets. The hacker can then send a packet to many of these sites, “spoofing” or forging the IP address of the victim. This causes all those sites to then flood the victim with traffic. Thus, with a small amount of outbound traffic, the hacker can flood the inbound traffic of the victim.
This is one of those things that has worked for 20 years, because it’s so obvious teenagers can do it, yet there is no obvious solution. President Trump’s executive order of cyberspace specifically demanded that his government come up with a report on how to address this, but it’s unlikely that they’ll come up with any useful strategy.
If you launch your startup and no one knows, did you actually launch? As mentioned in my last post, our initial launch target was to get a 1,000 people to use our service. But how do you get even 1,000 people to sign up for your service when no one knows who you are?
There are a variety of methods to attract your first 1,000 customers, but launching with the press is my favorite. I’ll explain why and how to do it below.
Paths to Attract Your First 1,000 Customers
Social following: If you have a massive social following, those people are a reasonable target for what you’re offering. In particular if your relationship with them is one where they would buy something you recommend, this can be one of the easiest ways to get your initial customers. However, building this type of following is non-trivial and often is done over several years.
Press not only provides awareness and customers, but credibility and SEO benefits as well
Paid advertising: The advantage of paid ads is you have control over when they are presented and what they say. The primary disadvantage is they tend to be expensive, especially before you have your positioning, messaging, and funnel nailed.
Viral: There are certainly examples of companies that launched with a hugely viral video, blog post, or promotion. While fantastic if it happens, even if you do everything right, the likelihood of massive virality is miniscule and the conversion rate is often low.
Press: As I said, this is my favorite. You don’t need to pay a PR agency and can go from nothing to launched in a couple weeks. Press not only provides awareness and customers, but credibility and SEO benefits as well.
How to Pitch the Press
It’s easy: Have a compelling story, find the right journalists, make their life easy, pitch and follow-up. Of course, each one of those has some nuance, so let’s dig in.
Have a compelling story
When you’ve been working for months on your startup, it’s easy to get lost in the minutiae when talking to others. Stories that a journalist will write about need to be something their readers will care about. Knowing what story to tell and how to tell it is part science and part art. Here’s how you can get there:
The basics of your story
Ask yourself the following questions, and write down the answers:
What are we doing? What product service are we offering?
Why? What problem are we solving?
What is interesting or unique? Either about what we’re doing, how we’re doing it, or for who we’re doing it.
“But my story isn’t that exciting”
Neither was announcing a data backup company, believe me. Look for angles that make it compelling. Here are some:
Did someone on your team do something major before? (build a successful company/product, create some innovation, market something we all know, etc.)
Do you have an interesting investor or board member?
Is there a personal story that drove you to start this company?
Are you starting it in a unique place?
Did you come upon the idea in a unique way?
Can you share something people want to know that’s not usually shared?
Are you partnered with a well-known company?
…is there something interesting/entertaining/odd/shocking/touching/etc.?
It doesn’t get much less exciting than, “We’re launching a company that will backup your data.” But there were still a lot of compelling stories:
Founded by serial entrepreneurs, bootstrapped a capital-intensive company, committed to each other for a year without salary.
Challenging the way that every backup company before was set up by not asking customers to pick and choose files to backup.
Designing our own storage system.
Etc. etc.
For the initial launch, we focused on “unlimited for $5/month” and statistics from a survey we ran with Harris Interactive that said that 94% of people did not regularly backup their data.
It’s an old adage that “Everyone has a story.” Regardless of what you’re doing, there is always something interesting to share. Dig for that.
The headline
Once you’ve captured what you think the interesting story is, you’ve got to boil it down. Yes, you need the elevator pitch, but this is shorter…it’s the headline pitch. Write the headline that you would love to see a journalist write.
Regardless of what you’re doing, there is always something interesting to share. Dig for that.
Now comes the part where you have to be really honest with yourself: if you weren’t involved, would you care?
The “Techmeme Test”
One way I try to ground myself is what I call the “Techmeme Test”. Techmeme lists the top tech articles. Read the headlines. Imagine the headline you wrote in the middle of the page. If you weren’t involved, would you click on it? Is it more or less compelling than the others. Much of tech news is dominated by the largest companies. If you want to get written about, your story should be more compelling. If not, go back above and explore your story some more.
Embargoes, exclusives and calls-to-action
Journalists write about news. Thus, if you’ve already announced something and are then pitching a journalist to cover it, unless you’re giving her something significant that hasn’t been said, it’s no longer news. As a result, there are ‘embargoes’ and ‘exclusives’.
Embargoes
: An embargo simply means that you are sharing news with a journalist that they need to keep private until a certain date and time.
If you’re Apple, this may be a formal and legal document. In our case, it’s as simple as saying, “Please keep embargoed until 4/13/17 at 8am California time.” in the pitch. Some sites explicitly will not keep embargoes; for example The Information will only break news. If you want to launch something later, do not share information with journalists at these sites. If you are only working with a single journalist for a story, and your announcement time is flexible, you can jointly work out a date and time to announce. However, if you have a fixed launch time or are working with a few journalists, embargoes are key.
Exclusives: An exclusive means you’re giving something specifically to that journalist. Most journalists love an exclusive as it means readers have to come to them for the story. One option is to give a journalist an exclusive on the entire story. If it is your dream journalist, this may make sense. Another option, however, is to give exclusivity on certain pieces. For example, for your launch you could give an exclusive on funding detail & a VC interview to a more finance-focused journalist and insight into the tech & a CTO interview to a more tech-focused journalist.
Call-to-Action: With our launch we gave TechCrunch, Ars Technica, and SimplyHelp URLs that gave the first few hundred of their readers access to the private beta. Once those first few hundred users from each site downloaded, the beta would be turned off.
Thus, we used a combination of embargoes, exclusives, and a call-to-action during our initial launch to be able to brief journalists on the news before it went live, give them something they could announce as exclusive, and provide a time-sensitive call-to-action to the readers so that they would actually sign up and not just read and go away.
How to Find the Most Authoritative Sites / Authors
“If a press release is published and no one sees it, was it published?” Perhaps the time existed when sending a press release out over the wire meant journalists would read it and write about it. That time has long been forgotten. Over 1,000 unread press releases are published every day. If you want your compelling story to be covered, you need to find the handful of journalists that will care.
Determine the publications
Find the publications that cover the type of story you want to share. If you’re in tech, Techmeme has a leaderboard of publications ranked by leadership and presence. This list will tell you which publications are likely to have influence. Visit the sites and see if your type of story appears on their site. But, once you’ve determined the publication do NOT send a pitch their “[email protected]” or “[email protected]” email addresses. In all the times I’ve done that, I have never had a single response. Those email addresses are likely on every PR, press release, and spam list and unlikely to get read. Instead…
Determine the journalists
Once you’ve determined which publications cover your area, check which journalists are doing the writing. Skim the articles and search for keywords and competitor names.
Over 1,000 unread press releases are published every day.
Identify one primary journalist at the publication that you would love to have cover you, and secondary ones if there are a few good options. If you’re not sure which one should be the primary, consider a few tests:
Do they truly seem to care about the space?
Do they write interesting/compelling stories that ‘get it’?
Do they appear on the Techmeme leaderboard?
Do their articles get liked/tweeted/shared and commented on?
Do they have a significant social presence?
Leveraging Google
In addition to Techmeme or if you aren’t in the tech space Google will become a must have tool for finding the right journalists to pitch. Below the search box you will find a number of tabs. Click on Tools and change the Any time setting to Custom range. I like to use the past six months to ensure I find authors that are actively writing about my market. I start with the All results. This will return a combination of product sites and articles depending upon your search term.
Scan for articles and click on the link to see if the article is on topic. If it is find the author’s name. Often if you click on the author name it will take you to a bio page that includes their Twitter, LinkedIn, and/or Facebook profile. Many times you will find their email address in the bio. You should collect all the information and add it to your outreach spreadsheet. Click here to get a copy. It’s always a good idea to comment on the article to start building awareness of your name. Another good idea is to Tweet or Like the article.
Next click on the News tab and set the same search parameters. You will get a different set of results. Repeat the same steps. Between the two searches you will have a list of authors that actively write for the websites that Google considers the most authoritative on your market.
How to find the most socially shared authors
Your next step is to find the writers whose articles get shared the most socially. Go to Buzzsumo and click on the Most Shared tab. Enter search terms for your market as well as competitor names. Again I like to use the past 6 months as the time range. You will get a list of articles that have been shared the most across Facebook, LinkedIn, Twitter, Pinterest, and Google+. In addition to finding the most shared articles and their authors you can also see some of the Twitter users that shared the article. Many of those Twitter users are big influencers in your market so it’s smart to start following and interacting with them as well as the authors.
How to Find Author Email Addresses
Some journalists publish their contact info right on the stories. For those that don’t, a bit of googling will often get you the email. For example, TechCrunch wrote a story a few years ago where they published all of their email addresses, which was in response to this new service that charges a small fee to provide journalist email addresses. Sometimes visiting their twitter pages will link to a personal site, upon which they will share an email address.
Of course all is not lost if you don’t find an email in the bio. There are two good services for finding emails, https://app.voilanorbert.com/ and https://hunter.io/. For Voila Norbert enter the author name and the website you found their article on. The majority of the time you search for an author on a major publication Norbert will return an accurate email address. If it doesn’t try Hunter.io.
On Hunter.io enter the domain name and click on Personal Only. Then scroll through the results to find the author’s email. I’ve found Norbert to be more accurate overall but between the two you will find most major author’s email addresses.
Email, by the way, is not necessarily the best way to engage a journalist. Many are avid Twitter users. Follow them and engage – that means read/retweet/favorite their tweets; reply to their questions, and generally be helpful BEFORE you pitch them. Later when you email them, you won’t be just a random email address.
Don’t spam
Now that you have all these email addresses (possibly thousands if you purchased a list) – do NOT spam. It is incredibly tempting to think “I could try to figure out which of these folks would be interested, but if I just email all of them, I’ll save myself time and be more likely to get some of them to respond.” Don’t do it.
Follow them and engage – that means read/retweet/favorite their tweets; reply to their questions, and generally be helpful BEFORE you pitch them.
First, you’ll want to tailor your pitch to the individual. Second, it’s a small world and you’ll be known as someone who spams – reputation is golden. Also, don’t call journalists. Unless you know them or they’ve said they’re open to calls, you’re most likely to just annoy them.
Build a relationship
Play the long game. You may be focusing just on the launch and hoping to get this one story covered, but if you don’t quickly flame-out, you will have many more opportunities to tell interesting stories that you’ll want the press to cover. Be honest and don’t exaggerate.
When you have 500 users it’s tempting to say, “We’ve got thousands!” Don’t. The good journalists will see through it and it’ll likely come back to bite you later. If you don’t know something, say “I don’t know but let me find out for you.” Most journalists want to write interesting stories that their readers will appreciate. Help them do that. Build deeper relationships with 5 – 10 journalists, rather than spamming thousands.
Stay organized
It doesn’t need to be complicated, but keep a spreadsheet that includes the name, publication, and contact info of the journalists you care about. Then, use it to keep track of who you’ve pitched, who’s responded, whether you’ve sent them the materials they need, and whether they intend to write/have written.
Make their life easy
Journalists have a million PR people emailing them, are actively engaging with readers on Twitter and in the comments, are tracking their metrics, are working their sources…and all the while needing to publish new articles. They’re busy. Make their life easy and they’re more likely to engage with yours.
Get to know them
Before sending them a pitch, know what they’ve written in the space. If you tell them how your story relates to ones they’ve written, it’ll help them put the story in context, and enable them to possibly link back to a story they wrote before.
Prepare your materials
Journalists will need somewhere to get more info (prepare a fact sheet), a URL to link to, and at least one image (ideally a few to choose from.) A fact sheet gives bite-sized snippets of information they may need about your startup or product: what it is, how big the market is, what’s the pricing, who’s on the team, etc. The URL is where their reader will get the product or more information from you. It doesn’t have to be live when you’re pitching, but you should be able to tell what the URL will be. The images are ones that they could embed in the article: a product screenshot, a CEO or team photo, an infographic. Scan the types of images included in their articles. Don’t send any of these in your pitch, but have them ready. Studies, stats, customer/partner/investor quotes are also good to have.
Pitch
A pitch has to be short and compelling.
Subject Line
Think back to the headline you want. Is it really compelling? Can you shorten it to a subject line? Include what’s happening and when. For Mike Arrington at Techcrunch, our first subject line was “Startup doing an ‘online time machine’”. Later I would include, “launching June 6th”.
For John Timmer at ArsTechnica, it was “Demographics data re: your 4/17 article”. Why? Because he wrote an article titled “WiFi popular with the young people; backups, not so much”. Since we had run a demographics survey on backups, I figured as a science editor he’d be interested in this additional data.
Body
A few key things about the body of the email. It should be short and to the point, no more than a few sentences. Here was my actual, original pitch email to John:
Hey John,
We’re launching Backblaze next week which provides a Time Machine-online type of service. As part of doing some research I read your article about backups not being popular with young people and that you had wished Accenture would have given you demographics. In prep for our invite-only launch I sponsored Harris Interactive to get demographic data on who’s doing backups and if all goes well, I should have that data on Friday.
Next week starts Backup Awareness Month (and yes, probably Clean Your House Month and Brush Your Teeth Month)…but nonetheless…good time to remind readers to backup with a bit of data?
Would you be interested in seeing/talking about the data when I get it?
Would you be interested in getting a sneak peak at Backblaze? (I could give you some invite codes for your readers as well.)
Gleb Budman
CEO and Co-Founder
Backblaze, Inc.
Automatic, Secure, High-Performance Online Backup
Cell: XXX-XXX-XXXX
The Good: It said what we’re doing, why this relates to him and his readers, provides him information he had asked for in an article, ties to something timely, is clearly tailored for him, is pitched by the CEO and Co-Founder, and provides my cell.
The Bad: It’s too long.
I got better later. Here’s an example:
Subject: Does temperature affect hard drive life?
Hi Peter, there has been much debate about whether temperature affects how long a hard drive lasts. Following up on the Backblaze analyses of how long do drives last & which drives last the longest (that you wrote about) we’ve now analyzed the impact of heat on the nearly 40,000 hard drives we have and found that…
We’re going to publish the results this Monday, 5/12 at 5am California-time. Want a sneak peak of the analysis?
Timing
A common question is “When should I launch?” What day, what time? I prefer to launch on Tuesday at 8am California-time. Launching earlier in the week gives breathing room for the news to live longer. While your launch may be a single article posted and that’s that, if it ends up a larger success, earlier in the week allows other journalists (including ones who are in other countries) to build on the story. Monday announcements can be tough because the journalists generally need to have their stories finished by Friday, and while ideally everything is buttoned up beforehand, startups sometimes use the weekend as overflow before a launch.
The 8am California-time is because it allows articles to be published at the beginning of the day West Coast and around lunch-time East Coast. Later and you risk it being past publishing time for the day. We used to launch at 5am in order to be morning for the East Coast, but it did not seem to have a significant benefit in coverage or impact, but did mean that the entire internal team needed to be up at 3am or 4am. Sometimes that’s critical, but I prefer to not burn the team out when it’s not.
Finally, try to stay clear of holidays, major announcements and large conferences. If Apple is coming out with their next iPhone, many of the tech journalists will be busy at least a couple days prior and possibly a week after. Not always obvious, but if you can, find times that are otherwise going to be slow for news.
Follow-up
There is a fine line between persistence and annoyance. I once had a journalist write me after we had an announcement that was covered by the press, “Why didn’t you let me know?! I would have written about that!” I had sent him three emails about the upcoming announcement to which he never responded.
My general rule is 3 emails.
Ugh. However, my takeaway from this isn’t that I should send 10 emails to every journalist. It’s that sometimes these things happen.
My general rule is 3 emails. If I’ve identified a specific journalist that I think would be interested and have a pitch crafted for her, I’ll send her the email ideally 2 weeks prior to the announcement. I’ll follow-up a week later, and one more time 2 days prior. If she ever says, “I’m not interested in this topic,” I note it and don’t email her on that topic again.
If a journalist wrote, I read the article and engage in the comments (or someone on our team, such as our social guy, @YevP does). We’ll often promote the story through our social channels and email our employees who may choose to share the story as well. This helps us, but also helps the journalist get their story broader reach. Again, the goal is to build a relationship with the journalists your space. If there’s something relevant to your customers that the journalist wrote, you’re providing a service to your customers AND helping the journalist get the word out about the article.
At times the stories also end up shared on sites such as Hacker News, Reddit, Slashdot, or become active conversations on Twitter. Again, we try to engage there and respond to questions (when we do, we are always clear that we’re from Backblaze.)
And finally, I’ll often send a short thank you to the journalist.
Getting Your First 1,000 Customers With Press
As I mentioned at the beginning, there is more than one way to get your first 1,000 customers. My favorite is working with the press to share your story. If you figure out your compelling story, find the right journalists, make their life easy, pitch and follow-up, you stand a high likelyhood of getting coverage and customers. Better yet, that coverage will provide credibility for your company, and if done right, will establish you as a resource for the press for the future.
Like any muscle, this process takes working out. The first time may feel a bit daunting, but just take the steps one at a time. As you do this a few times, the process will be easier and you’ll know who to reach out and quickly determine what stories will be compelling.
Amazon Lightsail lets you launch Virtual Private Servers on AWS with just a few clicks. With prices starting at $5 per month, Lightsail takes care of the heavy lifting and gives you a simple way to build and host applications. As I showed you in my re:Invent post (Amazon Lightsail – The Power of AWS, the Simplicity of a VPS), you can choose a configuration from a menu and launch a virtual machine preconfigured with SSD-based storage, DNS management, and a static IP address.
Since we launched in November, many customers have used Lightsail to launch Virtual Private Servers. For example, Monash University is using Amazon Lightsail to rapidly re-platform a number of CMS services in a simple and cost-effective manner. They have already migrated 50 workloads and are now thinking of creating an internal CMS service based on Lightsail to allow staff and students to create their own CMS instances in a self-service manner.
Today we are expanding Lightsail into nine more AWS Regions and launching a new, global console.
New Regions At re:Invent we made Lightsail available in the US East (Northern Virginia) Region. Earlier this month we added support for several additional Regions in the US and Europe. Today we are launching Lightsail in four of our Asia Pacific Regions, bringing the total to ten. Here’s the full list:
US East (Northern Virginia)
US West (Oregon)
US East (Ohio)
EU (London)
EU (Frankfurt)
EU (Ireland)
Asia Pacific (Mumbai)
Asia Pacific (Tokyo)
Asia Pacific (Singapore)
Asia Pacific (Sydney)
Global Console The updated Lightsail console makes it easy for you to create and manage resources in one or more Regions. I simply choose the desired Region when I create a new instance:
I can see all of my instances and static IP addresses on the same page, no matter what Region they are in:
And I can perform searches that span all of my resources and Regions. All of my LAMP stacks:
Or all of my resources in the EU (Ireland) Region:
I can perform a similar search on the Snapshots tab:
A new DNS zones tab lets me see my existing zones and create new ones:
Creation of SSH keypairs is now specific to a Region:
I can manage my key pairs on a Region-by-Region basis:
Static IP addresses are also specific to a particular Region:
Available Now You can use the new Lightsail console and create resources in all ten Regions today!
Inmates at a medium-security Ohio prison secretly assembled two functioning computers, hid them in the ceiling, and connected them to the Marion Correctional Institution’s network. The hard drives were loaded with pornography, a Windows proxy server, VPN, VOIP and anti-virus software, the Tor browser, password hacking and e-mail spamming tools, and the open source packet analyzer Wireshark.
Clearly there’s a lot about prison security, or the lack thereof, that I don’t know. This article reveals some of it.
The collective thoughts of the interwebz
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.