Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/living_in_a_sma.html
In “The House that Spied on Me,” Kashmir Hill outfits her home to be as “smart” as possible and writes about the results.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/02/living_in_a_sma.html
In “The House that Spied on Me,” Kashmir Hill outfits her home to be as “smart” as possible and writes about the results.
Post Syndicated from Robert Graham original http://blog.erratasec.com/2018/02/blame-privacy-activists-for-memo.html
Former FBI agent Asha Rangappa @AshaRangappa_ has a smart post debunking the Nunes Memo, then takes it all back again with an op-ed on the NYTimes blaming us privacy activists. She presents an obviously false narrative that the FBI and FISA courts are above suspicion.
I know from first hand experience the FBI is corrupt. In 2007, they threatened me, trying to get me to cancel a talk that revealed security vulnerabilities in a large corporation’s product. Such abuses occur because there is no transparency and oversight. FBI agents write down our conversation in their little notebooks instead of recording it, so that they can control the narrative of what happened, presenting their version of the converstion (leaving out the threats). In this day and age of recording devices, this is indefensible.
She writes “I know firsthand that it’s difficult to get a FISA warrant“. Yes, the process was difficult for her, an underling, to get a FISA warrant. The process is different when a leader tries to do the same thing.
I know this first hand having casually worked as an outsider with intelligence agencies. I saw two processes in place: one for the flunkies, and one for those above the system. The flunkies constantly complained about how there is too many process in place oppressing them, preventing them from getting their jobs done. The leaders understood the system and how to sidestep those processes.
That’s not to say the Nunes Memo has merit, but it does point out that privacy advocates have a point in wanting more oversight and transparency in such surveillance of American citizens.
Blaming us privacy advocates isn’t the way to go. It’s not going to succeed in tarnishing us, but will push us more into Trump’s camp, causing us to reiterate that we believe the FBI and FISA are corrupt.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/after_section_7.html
For over a decade, civil libertarians have been fighting government mass surveillance of innocent Americans over the Internet. We’ve just lost an important battle. On January 18, President Trump signed the renewal of Section 702, domestic mass surveillance became effectively a permanent part of US law.
Section 702 was initially passed in 2008, as an amendment to the Foreign Intelligence Surveillance Act of 1978. As the title of that law says, it was billed as a way for the NSA to spy on non-Americans located outside the United States. It was supposed to be an efficiency and cost-saving measure: the NSA was already permitted to tap communications cables located outside the country, and it was already permitted to tap communications cables from one foreign country to another that passed through the United States. Section 702 allowed it to tap those cables from inside the United States, where it was easier. It also allowed the NSA to request surveillance data directly from Internet companies under a program called PRISM.
The problem is that this authority also gave the NSA the ability to collect foreign communications and data in a way that inherently and intentionally also swept up Americans’ communications as well, without a warrant. Other law enforcement agencies are allowed to ask the NSA to search those communications, give their contents to the FBI and other agencies and then lie about their origins in court.
In 1978, after Watergate had revealed the Nixon administration’s abuses of power, we erected a wall between intelligence and law enforcement that prevented precisely this kind of sharing of surveillance data under any authority less restrictive than the Fourth Amendment. Weakening that wall is incredibly dangerous, and the NSA should never have been given this authority in the first place.
Arguably, it never was. The NSA had been doing this type of surveillance illegally for years, something that was first made public in 2006. Section 702 was secretly used as a way to paper over that illegal collection, but nothing in the text of the later amendment gives the NSA this authority. We didn’t know that the NSA was using this law as the statutory basis for this surveillance until Edward Snowden showed us in 2013.
Civil libertarians have been battling this law in both Congress and the courts ever since it was proposed, and the NSA’s domestic surveillance activities even longer. What this most recent vote tells me is that we’ve lost that fight.
Section 702 was passed under George W. Bush in 2008, reauthorized under Barack Obama in 2012, and now reauthorized again under Trump. In all three cases, congressional support was bipartisan. It has survived multiple lawsuits by the Electronic Frontier Foundation, the ACLU, and others. It has survived the revelations by Snowden that it was being used far more extensively than Congress or the public believed, and numerous public reports of violations of the law. It has even survived Trump’s belief that he was being personally spied on by the intelligence community, as well as any congressional fears that Trump could abuse the authority in the coming years. And though this extension lasts only six years, it’s inconceivable to me that it will ever be repealed at this point.
So what do we do? If we can’t fight this particular statutory authority, where’s the new front on surveillance? There are, it turns out, reasonable modifications that target surveillance more generally, and not in terms of any particular statutory authority. We need to look at US surveillance law more generally.
First, we need to strengthen the minimization procedures to limit incidental collection. Since the Internet was developed, all the world’s communications travel around in a single global network. It’s impossible to collect only foreign communications, because they’re invariably mixed in with domestic communications. This is called “incidental” collection, but that’s a misleading name. It’s collected knowingly, and searched regularly. The intelligence community needs much stronger restrictions on which American communications channels it can access without a court order, and rules that require they delete the data if they inadvertently collect it. More importantly, “collection” is defined as the point the NSA takes a copy of the communications, and not later when they search their databases.
Second, we need to limit how other law enforcement agencies can use incidentally collected information. Today, those agencies can query a database of incidental collection on Americans. The NSA can legally pass information to those other agencies. This has to stop. Data collected by the NSA under its foreign surveillance authority should not be used as a vehicle for domestic surveillance.
The most recent reauthorization modified this lightly, forcing the FBI to obtain a court order when querying the 702 data for a criminal investigation. There are still exceptions and loopholes, though.
Third, we need to end what’s called “parallel construction.” Today, when a law enforcement agency uses evidence found in this NSA database to arrest someone, it doesn’t have to disclose that fact in court. It can reconstruct the evidence in some other manner once it knows about it, and then pretend it learned of it that way. This right to lie to the judge and the defense is corrosive to liberty, and it must end.
Pressure to reform the NSA will probably first come from Europe. Already, European Union courts have pointed to warrantless NSA surveillance as a reason to keep Europeans’ data out of US hands. Right now, there is a fragile agreement between the EU and the United States – called “Privacy Shield” — that requires Americans to maintain certain safeguards for international data flows. NSA surveillance goes against that, and it’s only a matter of time before EU courts start ruling this way. That’ll have significant effects on both government and corporate surveillance of Europeans and, by extension, the entire world.
Further pressure will come from the increased surveillance coming from the Internet of Things. When your home, car, and body are awash in sensors, privacy from both governments and corporations will become increasingly important. Sooner or later, society will reach a tipping point where it’s all too much. When that happens, we’re going to see significant pushback against surveillance of all kinds. That’s when we’ll get new laws that revise all government authorities in this area: a clean sweep for a new world, one with new norms and new fears.
It’s possible that a federal court will rule on Section 702. Although there have been many lawsuits challenging the legality of what the NSA is doing and the constitutionality of the 702 program, no court has ever ruled on those questions. The Bush and Obama administrations successfully argued that defendants don’t have legal standing to sue. That is, they have no right to sue because they don’t know they’re being targeted. If any of the lawsuits can get past that, things might change dramatically.
Meanwhile, much of this is the responsibility of the tech sector. This problem exists primarily because Internet companies collect and retain so much personal data and allow it to be sent across the network with minimal security. Since the government has abdicated its responsibility to protect our privacy and security, these companies need to step up: Minimize data collection. Don’t save data longer than absolutely necessary. Encrypt what has to be saved. Well-designed Internet services will safeguard users, regardless of government surveillance authority.
For the rest of us concerned about this, it’s important not to give up hope. Everything we do to keep the issue in the public eye – and not just when the authority comes up for reauthorization again in 2024 — hastens the day when we will reaffirm our rights to privacy in the digital age.
This essay previously appeared in the Washington Post.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/detecting_drone.html
This is clever:
Researchers at Ben Gurion University in Beer Sheva, Israel have built a proof-of-concept system for counter-surveillance against spy drones that demonstrates a clever, if not exactly simple, way to determine whether a certain person or object is under aerial surveillance. They first generate a recognizable pattern on whatever subject — a window, say — someone might want to guard from potential surveillance. Then they remotely intercept a drone’s radio signals to look for that pattern in the streaming video the drone sends back to its operator. If they spot it, they can determine that the drone is looking at their subject.
In other words, they can see what the drone sees, pulling out their recognizable pattern from the radio signal, even without breaking the drone’s encrypted video.
The details have to do with the way drone video is compressed:
The researchers’ technique takes advantage of an efficiency feature streaming video has used for years, known as “delta frames.” Instead of encoding video as a series of raw images, it’s compressed into a series of changes from the previous image in the video. That means when a streaming video shows a still object, it transmits fewer bytes of data than when it shows one that moves or changes color.
That compression feature can reveal key information about the content of the video to someone who’s intercepting the streaming data, security researchers have shown in recent research, even when the data is encrypted.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/susan_landaus_n.html
Susan Landau has written a terrific book on cybersecurity threats and why we need strong crypto. Listening In: Cybersecurity in an Insecure Age. It’s based in part on her 2016 Congressional testimony in the Apple/FBI case; it examines how the Digital Revolution has transformed society, and how law enforcement needs to — and can — adjust to the new realities. The book is accessible to techies and non-techies alike, and is strongly recommended.
And if you’ve already read it, give it a review on Amazon. Reviews sell books, and this one needs more of them.
Post Syndicated from Ernesto original https://torrentfreak.com/judge-issues-devastating-order-bittorrent-copyright-troll-180110/
In recent years, file-sharers around the world have been pressured to pay significant settlement fees, or face legal repercussions.
These so-called “copyright trolling” efforts have been a common occurrence in the United States since the turn of the last decade.
Increasingly, however, courts are growing weary of these cases. Many districts have turned into no-go zones for copyright trolls and the people behind Prenda law were arrested and are being prosecuted in a criminal case.
In the Western District of Washington, the tide also appears to have turned. After Venice PI, a copyright holder of the film “Once Upon a Time in Venice”, sued a man who later passed away, concerns were raised over the validity of the evidence.
Venice PI responded to the concerns with a declaration explaining its data gathering technique and assuring the Court that false positives are out of the question.
That testimony didn’t help much though, as a recently filed minute order shows this week. The order applies to a dozen cases and prohibits the company from reaching out to any defendants until further notice, as there are several alarming issues that have to be resolved first.
One of the problems is that Venice PI declared that it’s owned by a company named Lost Dog Productions, which in turn is owned by Voltage Productions. Interestingly, these companies don’t appear in the usual records.
“A search of the California Secretary of State’s online database, however, reveals no registered entity with the name ‘Lost Dog’ or ‘Lost Dog Productions’,” the Court notes.
“Moreover, although ‘Voltage Pictures, LLC’ is registered with the California Secretary of State, and has the same address as Venice PI, LLC, the parent company named in plaintiff’s corporate disclosure form, ‘Voltage Productions, LLC,’ cannot be found in the California Secretary of State’s online database and does not appear to exist.”
In other words, the company that filed the lawsuit, as well as its parent company, are extremely questionable.
While the above is a reason for concern, it’s just the tip of the iceberg. The Court not only points out administrative errors, but it also has serious doubts about the evidence collection process. This was carried out by the German company MaverickEye, which used the tracking technology of another German company, GuardaLey.
GuardaLey CEO Benjamin Perino, who claims that he coded the tracking software, wrote a declaration explaining that the infringement detection system at issue “cannot yield a false positive.” However, the Court doubts this statement and Perino’s qualifications in general.
“Perino has been proffered as an expert, but his qualifications consist of a technical high school education and work experience unrelated to the peer-to-peer file-sharing technology known as BitTorrent,” the Court writes.
“Perino does not have the qualifications necessary to be considered an expert in the field in question, and his opinion that the surveillance program is incapable of error is both contrary to common sense and inconsistent with plaintiff’s counsel’s conduct in other matters in this district. Plaintiff has not submitted an adequate offer of proof”
It seems like the Court would prefer to see an assessment from a qualified independent expert instead of the person who wrote the software. For now, this means that the IP-address evidence, in these cases, is not good enough. That’s quite a blow for the copyright holder.
If that wasn’t enough the Court also highlights another issue that’s possibly even more problematic. When Venice PI requested the subpoenas to identify alleged pirates, they relied on declarations from Daniel Arheidt, a consultant for MaverickEye.
These declarations fail to mention, however, that MaverickEye has the proper paperwork to collect IP addresses.
“Nowhere in Arheidt’s declarations does he indicate that either he or MaverickEye is licensed in Washington to conduct private investigation work,” the order reads.
This is important, as doing private investigator work without a license is a gross misdemeanor in Washington. The copyright holder was aware of this requirement because it was brought up in related cases in the past.
“Plaintiff’s counsel has apparently been aware since October 2016, when he received a letter concerning LHF Productions, Inc. v. Collins, C16-1017 RSM, that Arheidt might be committing a crime by engaging in unlicensed surveillance of Washington citizens, but he did not disclose this fact to the Court.”
The order is very bad news for Venice PI. The company had hoped to score a few dozen easy settlements but the tables have now been turned. The Court instead asks the company to explain the deficiencies and provide additional details. In the meantime, the copyright holder is urged not to spend or transfer any of the settlement money that has been collected thus far.
The latter indicates that Venice PI might have to hand defendants their money back, which would be pretty unique.
The order suggests that the Judge is very suspicious of these trolling activities. In a footnote there’s a link to a Fight Copyright Trolls article which revealed that the same counsel dismissed several cases, allegedly to avoid having IP-address evidence scrutinized.
Even more bizarrely, in another footnote the Court also doubts if MaverickEye’s aforementioned consultant, Daniel Arheidt, actually exists.
“The Court has recently become aware that Arheidt is the latest in a series of German declarants (Darren M. Griffin, Daniel Macek, Daniel Susac, Tobias Fieser, Michael Patzer) who might be aliases or even fictitious.
“Plaintiff will not be permitted to rely on Arheidt’s declarations or underlying data without explaining to the Court’s satisfaction Arheidt’s relationship to the above-listed declarants and producing proof beyond a reasonable doubt of Arheidt’s existence,” the court adds.
These are serious allegations, to say the least.
If a copyright holder uses non-existent companies and questionable testimony from unqualified experts after obtaining evidence illegally to get a subpoena backed by a fictitious person….something’s not quite right.
A copy of the minute order, which affects a series of cases, is available here (pdf).
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/detecting_adblo.html
Interesting research on the prevalence of adblock blockers: “Measuring and Disrupting Anti-Adblockers Using Differential Execution Analysis“:
Abstract: Millions of people use adblockers to remove intrusive and malicious ads as well as protect themselves against tracking and pervasive surveillance. Online publishers consider adblockers a major threat to the ad-powered “free” Web. They have started to retaliate against adblockers by employing anti-adblockers which can detect and stop adblock users. To counter this retaliation, adblockers in turn try to detect and filter anti-adblocking scripts. This back and forth has prompted an escalating arms race between adblockers and anti-adblockers.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/fake_santa_surv.html
Reka makes a “decorative Santa cam,” meaning that it’s not a real camera. Instead, it just gets children used to being under constant surveillance.
Our Santa Cam has a cute Father Christmas and mistletoe design, and a red, flashing LED light which will make the most logical kids suspend their disbelief and start to believe!
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/santa_claus_is_.html
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/surveillance_in_3.html
Post Syndicated from Andy original https://torrentfreak.com/apple-ceo-is-optimistic-vpn-apps-will-return-to-china-app-store-171206/
During the final days of July, Apple was forced to remove many of the most-used VPN applications from its Chinese App Store. In a short email from the company, VPN providers and software developers were told that VPN applications are considered illegal in China.
“We are writing to notify you that your application will be removed from the China App Store because it includes content that is illegal in China, which is not in compliance with the App Store Review Guidelines,” Apple informed the affected VPNs.
While the position on the ground doesn’t appear to have changed in the interim, Apple Chief Executive Tim Cook today expressed optimism that the VPN apps would eventually be restored to their former positions on China’s version of the App Store.
“My hope over time is that some of the things, the couple of things that’s been pulled, come back,” Cook said. “I have great hope on that and great optimism on that.”
According to Reuters, Cook said that he always tries to find ways to work together to settle differences and if he gets criticized for that “so be it.”
Speaking at the Fortune Forum in the Chinese city of Guangzhou, Cook said that he believes strongly in freedoms. But back home in the US, Apple has been strongly criticized for not doing enough to uphold freedom of speech and communication in China.
Back in October, two US senators wrote to Cook asking why the company had removed the VPN apps from the company’s store in China.
“VPNs allow users to access the uncensored Internet in China and other countries that restrict Internet freedom. If these reports are true, we are concerned that Apple may be enabling the Chinese government’s censorship and surveillance of the Internet,” senators Ted Cruz and Patrick Leahy wrote.
“While Apple’s many contributions to the global exchange of information are admirable, removing VPN apps that allow individuals in China to evade the Great Firewall and access the Internet privately does not enable people in China to ‘speak up’.”
They were comments Senator Leahy underlined again yesterday.
“American tech companies have become leading champions of free expression. But that commitment should not end at our borders,” Leahy told CNBC.
“Global leaders in innovation, like Apple, have both an opportunity and a moral obligation to promote free expression and other basic human rights in countries that routinely deny these rights.”
Whether the optimism expressed by Cook today is based on discussions with the Chinese government is unknown. However, it seems unlikely that authorities would be willing to significantly compromise on their dedication to maintaining the Great Firewall, which not only controls access to locally controversial content but also seeks to boost the success of Chinese companies.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/nsa_red_disk_da.html
The disk image, when unpacked and loaded, is a snapshot of a hard drive dating back to May 2013 from a Linux-based server that forms part of a cloud-based intelligence sharing system, known as Red Disk. The project, developed by INSCOM’s Futures Directorate, was slated to complement the Army’s so-called distributed common ground system (DCGS), a legacy platform for processing and sharing intelligence, surveillance, and reconnaissance information.
Red Disk was envisioned as a highly customizable cloud system that could meet the demands of large, complex military operations. The hope was that Red Disk could provide a consistent picture from the Pentagon to deployed soldiers in the Afghan battlefield, including satellite images and video feeds from drones trained on terrorists and enemy fighters, according to a Foreign Policy report.
Red Disk was a modular, customizable, and scalable system for sharing intelligence across the battlefield, like electronic intercepts, drone footage and satellite imagery, and classified reports, for troops to access with laptops and tablets on the battlefield. Marking files found in several directories imply the disk is “top secret,” and restricted from being shared to foreign intelligence partners.
A couple of points. One, this isn’t particularly sensitive. It’s an intelligence distribution system under development. It’s not raw intelligence. Two, this doesn’t seem to be classified data. Even the article hedges, using the unofficial term of “highly sensitive.” Three, it doesn’t seem that Chris Vickery, the researcher that discovered the data, has published it.
Chris Vickery, director of cyber risk research at security firm UpGuard, found the data and informed the government of the breach in October. The storage server was subsequently secured, though its owner remains unknown.
This doesn’t feel like a big deal to me.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/warrant_protect.html
The cell phones we carry with us constantly are the most perfect surveillance device ever invented, and our laws haven’t caught up to that reality. That might change soon.
This week, the Supreme Court will hear a case with profound implications on your security and privacy in the coming years. The Fourth Amendment’s prohibition of unlawful search and seizure is a vital right that protects us all from police overreach, and the way the courts interpret it is increasingly nonsensical in our computerized and networked world. The Supreme Court can either update current law to reflect the world, or it can further solidify an unnecessary and dangerous police power.
The case centers on cell phone location data and whether the police need a warrant to get it, or if they can use a simple subpoena, which is easier to obtain. Current Fourth Amendment doctrine holds that you lose all privacy protections over any data you willingly share with a third party. Your cellular provider, under this interpretation, is a third party with whom you’ve willingly shared your movements, 24 hours a day, going back months — even though you don’t really have any choice about whether to share with them. So police can request records of where you’ve been from cell carriers without any judicial oversight. The case before the court, Carpenter v. United States, could change that.
Traditionally, information that was most precious to us was physically close to us. It was on our bodies, in our homes and offices, in our cars. Because of that, the courts gave that information extra protections. Information that we stored far away from us, or gave to other people, afforded fewer protections. Police searches have been governed by the “third-party doctrine,” which explicitly says that information we share with others is not considered private.
The Internet has turned that thinking upside-down. Our cell phones know who we talk to and, if we’re talking via text or e-mail, what we say. They track our location constantly, so they know where we live and work. Because they’re the first and last thing we check every day, they know when we go to sleep and when we wake up. Because everyone has one, they know whom we sleep with. And because of how those phones work, all that information is naturally shared with third parties.
More generally, all our data is literally stored on computers belonging to other people. It’s our e-mail, text messages, photos, Google docs, and more all in the cloud. We store it there not because it’s unimportant, but precisely because it is important. And as the Internet of Things computerizes the rest our lives, even more data will be collected by other people: data from our health trackers and medical devices, data from our home sensors and appliances, data from Internet-connected “listeners” like Alexa, Siri, and your voice-activated television.
All this data will be collected and saved by third parties, sometimes for years. The result is a detailed dossier of your activities more complete than any private investigator – or police officer – could possibly collect by following you around.
The issue here is not whether the police should be allowed to use that data to help solve crimes. Of course they should. The issue is whether that information should be protected by the warrant process that requires the police to have probable cause to investigate you and get approval by a court.
Warrants are a security mechanism. They prevent the police from abusing their authority to investigate someone they have no reason to suspect of a crime. They prevent the police from going on “fishing expeditions.” They protect our rights and liberties, even as we willingly give up our privacy to the legitimate needs of law enforcement.
The third-party doctrine never made a lot of sense. Just because I share an intimate secret with my spouse, friend, or doctor doesn’t mean that I no longer consider it private. It makes even less sense in today’s hyper-connected world. It’s long past time the Supreme Court recognized that a months’-long history of my movements is private, and my e-mails and other personal data deserve the same protections, whether they’re on my laptop or on Google’s servers.
This essay previously appeared in the Washington Post.
EDITED TO ADD (12/1): Good commentary on the Supreme Court oral arguments.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/websites_use_se.html
The security researchers at Princeton are posting
You may know that most websites have third-party analytics scripts that record which pages you visit and the searches you make. But lately, more and more sites use “session replay” scripts. These scripts record your keystrokes, mouse movements, and scrolling behavior, along with the entire contents of the pages you visit, and send them to third-party servers. Unlike typical analytics services that provide aggregate statistics, these scripts are intended for the recording and playback of individual browsing sessions, as if someone is looking over your shoulder.
The stated purpose of this data collection includes gathering insights into how users interact with websites and discovering broken or confusing pages. However the extent of data collected by these services far exceeds user expectations; text typed into forms is collected before the user submits the form, and precise mouse movements are saved, all without any visual indication to the user. This data can’t reasonably be expected to be kept anonymous. In fact, some companies allow publishers to explicitly link recordings to a user’s real identity.
The researchers will post more details on their blog; I’ll link to them when they’re published.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/motherboard_dig.html
This digital security guide by Motherboard is very good. I put alongside EFF’s “Surveillance Self-Defense” and John Scott-Railton’s “Digital Security Low Hanging Fruit.” There’s also “Digital Security and Privacy for Human Rights Defenders.”
There are too many of these….
Post Syndicated from Andy original https://torrentfreak.com/us-senators-ask-apple-why-vpn-apps-were-removed-in-china-171020/
As part of what is now clearly a crackdown on Great Firewall-evading tools and services, during the summer Chinese government pressure reached technology giant Apple.
On or around July 29, Apple removed many of the most-used VPN applications from its Chinese app store. In a short email from the company, VPN providers were informed that VPN applications are considered illegal in China.
“We are writing to notify you that your application will be removed from the China App Store because it includes content that is illegal in China, which is not in compliance with the App Store Review Guidelines,” Apple informed the affected VPNs.
Now, in a letter sent to Apple CEO Tim Cook, US senators Ted Cruz and Patrick Leahy express concern at the move by Apple, noting that if reports of the software removals are true, the company could be assisting China’s restrictive approach to the Internet.
“VPNs allow users to access the uncensored Internet in China and other countries that restrict Internet freedom. If these reports are true, we are concerned that Apple may be enabling the Chines government’s censorship and surveillance of the Internet.”
Describing China as a country with “an abysmal human rights record, including with respect to the rights of free expression and free access to information, both online and offline”, the senators cite Reporters Without Borders who previously labeled the country as “the enemy of the Internet”.
While senators Cruz and Leahy go on to praise Apple for its contribution to the spread of information, they criticize the company for going along with the wishes of the Chinese government as it seeks to suppress knowledge and communication.
“While Apple’s many contributions to the global exchange of information are admirable, removing VPN apps that allow individuals in China to evade the Great Firewall and access the Internet privately does not enable people in China to ‘speak up’,” the senators write.
“To the contrary, if Apple complies with such demands from the Chinese government it inhibits free expression for users across China, particularly in light of the Cyberspace Administration of China’s new regulations targeting online anonymity.”
In January, a notice published by China’s Ministry of Industry and Information Technology said that the government had indeed launched a 14-month campaign to crack down on local ‘unauthorized’ Internet platforms.
This means that all VPN services have to be pre-approved by the Government if they want to operate in China. And the aggression against VPNs and their providers didn’t stop there.
In September, a Chinese man who sold Great Firewall-evading VPN software via a website was sentenced to nine months in prison by a Chinese court. Just weeks later, a software developer who set up a VPN for his own use but later sold access to the service was arrested and detained for three days.
This emerging pattern is clearly a concern for the senators who are now demanding that Tim Cook responds to ten questions (pdf), including whether Apple raised concerns about China’s VPN removal demands and details of how many apps were removed from its store. The senators also want to see copies of any pro-free speech statements Apple has made in China.
Whether the letter will make any difference on the ground in China remains to be seen, but the public involvement of the senators and technology giant Apple is certain to thrust censorship and privacy further into the public eye.
Post Syndicated from Andy original https://torrentfreak.com/purevpn-explains-how-it-helped-the-fbi-catch-a-cyberstalker-171016/
Early October, Ryan S. Lin, 24, of Newton, Massachusetts, was arrested on suspicion of conducting “an extensive cyberstalking campaign” against a 24-year-old Massachusetts woman, as well as her family members and friends.
The Department of Justice described Lin’s offenses as a “multi-faceted” computer hacking and cyberstalking campaign. Launched in April 2016 when he began hacking into the victim’s online accounts, Lin allegedly obtained personal photographs and sensitive information about her medical and sexual histories and distributed that information to hundreds of other people.
Details of what information the FBI compiled on Lin can be found in our earlier report but aside from his alleged crimes (which are both significant and repugnant), it was PureVPN’s involvement in the case that caused the most controversy.
In a report compiled by an FBI special agent, it was revealed that the Hong Kong-based company’s logs helped the authorities net the alleged criminal.
“Significantly, PureVPN was able to determine that their service was accessed by the same customer from two originating IP addresses: the RCN IP address from the home Lin was living in at the time, and the software company where Lin was employed at the time,” the agent’s affidavit reads.
Among many in the privacy community, this revelation was met with disappointment. On the PureVPN website the company claims to carry no logs and on a general basis, it’s expected that so-called “no-logging” VPN providers should provide people with some anonymity, at least as far as their service goes. Now, several days after the furor, the company has responded to its critics.
In a fairly lengthy statement, the company begins by confirming that it definitely doesn’t log what websites a user views or what content he or she downloads.
However, that’s only half the problem. While it doesn’t log user activity (what sites people visit or content they download), it does log the IP addresses that customers use to access the PureVPN service. These, given the right circumstances, can be matched to external activities thanks to logs carried by other web companies.
PureVPN talks about logs held by Google’s Gmail service to illustrate its point.
“A network log is automatically generated every time a user visits a website. For the sake of this example, let’s say a user logged into their Gmail account. Every time they accessed Gmail, the email provider created a network log,” the company explains.
“If you are using a VPN, Gmail’s network log would contain the IP provided by PureVPN. This is one half of the picture. Now, if someone asks Google who accessed the user’s account, Google would state that whoever was using this IP, accessed the account.
“If the user was connected to PureVPN, it would be a PureVPN IP. The inquirer [in the Lin case, the FBI] would then share timestamps and network logs acquired from Google and ask them to be compared with the network logs maintained by the VPN provider.”
Now, if PureVPN carried no logs – literally no logs – it would not be able to help with this kind of inquiry. That was the case last year when the FBI approached Private Internet Access for information and the company was unable to assist.
However, as is made pretty clear by PureVPN’s explanation, the company does log user IP addresses and timestamps which reveal when a user was logged on to the service. It doesn’t matter that PureVPN doesn’t log what the user allegedly did online, since the third-party service already knows that information to the precise second.
Following the example, GMail knows that a user sent an email at 10:22am on Monday October 16 from a PureVPN IP address. So, if PureVPN is approached by the FBI, the company can confirm that User X was using the same IP address at exactly the same time, and his home IP address was XXX.XX.XXX.XX. Effectively, the combined logs link one IP address to the other and the user is revealed. It’s that simple.
It is for this reason that in TorrentFreak’s annual summary of no-logging VPN providers, the very first question we ask every single company reads as follows:
Do you keep ANY logs which would allow you to match an IP-address and a time stamp to a user/users of your service? If so, what information do you hold and for how long?
Clearly, if a company says “yes we log incoming IP addresses and associated timestamps”, any claim to total user anonymity is ended right there and then.
While not completely useless (a logging service will still stop the prying eyes of ISPs and similar surveillance, while also defeating throttling and site-blocking), if you’re a whistle-blower with a job or even your life to protect, this level of protection is entirely inadequate.
The take-home points from this controversy are numerous, but perhaps the most important is for people to read and understand VPN provider logging policies.
Secondly, and just as importantly, VPN providers need to be extremely clear about the information they log. Not tracking browsing or downloading activities is all well and good, but if home IP addresses and timestamps are stored, this needs to be made clear to the customer.
Finally, VPN users should not be evil. There are plenty of good reasons to stay anonymous online but cyberstalking, death threats and ruining people’s lives are not included. Fortunately, the FBI have offline methods for catching this type of offender, and long may that continue.
PureVPN’s blog post is available here.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/10/my_blogging.html
Blog regulars will notice that I haven’t been posting as much lately as I have in the past. There are two reasons. One, it feels harder to find things to write about. So often it’s the same stories over and over. I don’t like repeating myself. Two, I am busy writing a book. The title is still: Click Here to Kill Everybody: Peril and Promise in a Hyper-Connected World. The book is a year late, and as a very different table of contents than it had in 2016. I have been writing steadily since mid-August. The book is due to the publisher at the end of March 2018, and will be published in the beginning of September.
This is the current table of contents:
So that’s what’s been happening.
Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/10/responsible-encryption-fallacies.html
Deputy Attorney General Rod Rosenstein gave a speech recently calling for “Responsible Encryption” (aka. “Crypto Backdoors”). It’s full of dangerous ideas that need to be debunked.
But the Mira case demonstrated the opposite, how law enforcement is not needed. They made no arrests in the case. A year later, they still haven’t a clue who did it.
Conversely, we technologists have fixed the major infrastructure issues. Specifically, those affected by the DNS outage have moved to multiple DNS providers, including a high-capacity DNS provider like Google and Amazon who can handle such large attacks easily.
In other words, we the people fixed the major Mirai problem, and law-enforcement didn’t.
Our society has never had a system where evidence of criminal wrongdoing was totally impervious to detection
Of course our society has places “impervious to detection”, protected by both legal and natural barriers.
An example of a legal barrier is how spouses can’t be forced to testify against each other. This barrier is impervious.
A better example, though, is how so much of government, intelligence, the military, and law enforcement itself is impervious. If prosecutors could gather evidence everywhere, then why isn’t Rosenstein prosecuting those guilty of CIA torture?
Oh, you say, government is a special exception. If that were the case, then why did Rosenstein dedicate a precious third of his speech discussing the “rule of law” and how it applies to everyone, “protecting people from abuse by the government”. It obviously doesn’t, there’s one rule of government and a different rule for the people, and the rule for government means there’s lots of places law enforcement can’t go to gather evidence.
Likewise, the crypto backdoor Rosenstein is demanding for citizens doesn’t apply to the President, Congress, the NSA, the Army, or Rosenstein himself.
Then there are the natural barriers. The police can’t read your mind. They can only get the evidence that is there, like partial fingerprints, which are far less reliable than full fingerprints. They can’t go backwards in time.
I mention this because encryption is a natural barrier. It’s their job to overcome this barrier if they can, to crack crypto and so forth. It’s not our job to do it for them.
It’s like the camera that increasingly comes with TVs for video conferencing, or the microphone on Alexa-style devices that are always recording. This suddenly creates evidence that the police want our help in gathering, such as having the camera turned on all the time, recording to disk, in case the police later gets a warrant, to peer backward in time what happened in our living rooms. The “nothing is impervious” argument applies here as well. And it’s equally bogus here. By not helping police by not recording our activities, we aren’t somehow breaking some long standing tradit
And this is the scary part. It’s not that we are breaking some ancient tradition that there’s no place the police can’t go (with a warrant). Instead, crypto backdoors breaking the tradition that never before have I been forced to help them eavesdrop on me, even before I’m a suspect, even before any crime has been committed. Sure, laws like CALEA force the phone companies to help the police against wrongdoers — but here Rosenstein is insisting I help the police against myself.
Rosenstein repeats the frequent claim that encryption upsets the balance between privacy/safety:
Warrant-proof encryption defeats the constitutional balance by elevating privacy above public safety.
This is laughable, because technology has swung the balance alarmingly in favor of law enforcement. Far from “Going Dark” as his side claims, the problem we are confronted with is “Going Light”, where the police state monitors our every action.
You are surrounded by recording devices. If you walk down the street in town, outdoor surveillance cameras feed police facial recognition systems. If you drive, automated license plate readers can track your route. If you make a phone call or use a credit card, the police get a record of the transaction. If you stay in a hotel, they demand your ID, for law enforcement purposes.
And that’s their stuff, which is nothing compared to your stuff. You are never far from a recording device you own, such as your mobile phone, TV, Alexa/Siri/OkGoogle device, laptop. Modern cars from the last few years increasingly have always-on cell connections and data recorders that record your every action (and location).
Even if you hike out into the country, when you get back, the FBI can subpoena your GPS device to track down your hidden weapon’s cache, or grab the photos from your camera.
And this is all offline. So much of what we do is now online. Of the photographs you own, fewer than 1% are printed out, the rest are on your computer or backed up to the cloud.
Your phone is also a GPS recorder of your exact position all the time, which if the government wins the Carpenter case, they police can grab without a warrant. Tagging all citizens with a recording device of their position is not “balance” but the premise for a novel more dystopic than 1984.
If suspected of a crime, which would you rather the police searched? Your person, houses, papers, and physical effects? Or your mobile phone, computer, email, and online/cloud accounts?
The balance of privacy and safety has swung so far in favor of law enforcement that rather than debating whether they should have crypto backdoors, we should be debating how to add more privacy protections.
Rosenstein’s claim may be true, that a lot of criminals will go free because the other electronic data isn’t convincing enough. But I’d need to see that claim backed up with hard studies, not thrown out for emotional impact.
No one calls any of those functions [like key recovery] a “back door.” In fact, those capabilities are marketed and sought out by many users.
But that’s because the term “backdoor” refers less to how it’s done and more to who is doing it. If I set up a recovery password with Apple, I’m the one doing it to myself, so we don’t call it a backdoor. If it’s the police, spies, hackers, or criminals, then we call it a “backdoor” — even it’s identical technology.
Wikipedia uses the key escrow feature of the 1990s Clipper Chip as a prime example of what everyone means by “backdoor“. By “no one”, Rosenstein is including Wikipedia, which is obviously incorrect.
Though in truth, it’s not going to be the same technology. The needs of law enforcement are different than my personal key escrow/backup needs. In particular, there are unsolvable problems, such as a backdoor that works for the “legitimate” law enforcement in the United States but not for the “illegitimate” police states like Russia and China.
The question is whether a “provider” like Telegram, a Russian company beyond US law, provides this feature. Or, by extension, whether individuals should be free to install whatever software they want, regardless of provider.
If the, for some reason, the US is able to convince all such providers (including Telegram) to install a backdoor, then it still doesn’t solve the problem, as uses can just build their own end-to-end encryption app that has no provider. It’s like email: some use the major providers like GMail, others setup their own email server.
Ultimately, this means that any law mandating “crypto backdoors” is going to target users not providers. Rosenstein tries to make a comparison with what plain-old telephone companies have to do under old laws like CALEA, but that’s not what’s happening here. Instead, for such rules to have any effect, they have to punish users for what they install, not providers.
Of course they [Apple] do. They are in the business of selling products and making money.
We [the DoJ] use a different measure of success. We are in the business of preventing crime and saving lives.
A better phrasing would have been:
They are in the business of giving customers what they want.
We are in the business of giving voters what they want.
Companies are willing to make accommodations when required by the government. Recent media reports suggest that a major American technology company developed a tool to suppress online posts in certain geographic areas in order to embrace a foreign government’s censorship policies.
Companies are willing to acquiesce to vile requests made by police-states. Therefore, they should acquiesce to our vile police-state requests.
There is no constitutional right to sell warrant-proof encryption.
That’s why we have the Bill of Rights, to protect the 49% against abuse by the 51%. Regardless whether the Supreme Court agrees the current Constitution, it is the sort right that might exist regardless of what the Constitution says.
Those of us who swear to protect the rule of law have a different motivation. We are obliged to speak the truth.
The truth is that “going dark” threatens to disable law enforcement and enable criminals and terrorists to operate with impunity.
But he’s not obliged to tell his spouse his honest opinion whether that new outfit makes them look fat. Likewise, Rosenstein knows his opinion on public policy doesn’t fall into this category. He can say with impunity that either global warming doesn’t exist, or that it’ll cause a biblical deluge within 5 years. Both are factually untrue, but it’s not going to get him fired.
And this particular claim is also exaggerated bunk. While everyone agrees encryption makes law enforcement’s job harder than with backdoors, nobody honestly believes it can “disable” law enforcement. While everyone agrees that encryption helps terrorists, nobody believes it can enable them to act with “impunity”.
I feel bad here. It’s a terrible thing to question your opponent’s character this way. But Rosenstein made this unavoidable when he clearly, with no ambiguity, put his integrity as Deputy Attorney General on the line behind the statement that “going dark threatens to disable law enforcement and enable criminals and terrorists to operate with impunity”. I feel it’s a bald face lie, but you don’t need to take my word for it. Read his own words yourself and judge his integrity.
Rosenstein’s speech includes repeated references to ideas like “oath”, “honor”, and “duty”. It reminds me of Col. Jessup’s speech in the movie “A Few Good Men”.
If you’ll recall, it was rousing speech, “you want me on that wall” and “you use words like honor as a punchline”. Of course, since he was violating his oath and sending two privates to death row in order to avoid being held accountable, it was Jessup himself who was crapping on the concepts of “honor”, “oath”, and “duty”.
And so is Rosenstein. He imagines himself on that wall, doing albeit terrible things, justified by his duty to protect citizens. He imagines that it’s he who is honorable, while the rest of us not, even has he utters bald faced lies to further his own power and authority.
We activists oppose crypto backdoors not because we lack honor, or because we are criminals, or because we support terrorists and child molesters. It’s because we value privacy and government officials who get corrupted by power. It’s not that we fear Trump becoming a dictator, it’s that we fear bureaucrats at Rosenstein’s level becoming drunk on authority — which Rosenstein demonstrably has. His speech is a long train of corrupt ideas pursuing the same object of despotism — a despotism we oppose.
In other words, we oppose crypto backdoors because it’s not a tool of law enforcement, but a tool of despotism.
Post Syndicated from Ujjwal Ratan original https://aws.amazon.com/blogs/big-data/build-a-schema-on-read-analytics-pipeline-using-amazon-athena/
A robust analytics platform is critical for an organization’s success. However, an analytical system is only as good as the data that is used to power it. Wrong or incomplete data can derail analytical projects completely. Moreover, with varied data types and sources being added to the analytical platform, it’s important that the platform is able to keep up. This means the system has to be flexible enough to adapt to changing data types and volumes.
For a platform based on relational databases, this would mean countless data model updates, code changes, and regression testing, all of which can take a very long time. For a decision support system, it’s imperative to provide the correct answers quickly. Architects must design analytical platforms with the following criteria:
Schema-on-read is a unique approach for storing and querying datasets. It reverses the order of things when compared to schema-on-write in that the data can be stored as is and you apply a schema at the time that you read it. This approach has changed the time to value for analytical projects. It means you can immediately load data and can query it to do exploratory activities.
In this post, I show how to build a schema-on-read analytical pipeline, similar to the one used with relational databases, using Amazon Athena. The approach is completely serverless, which allows the analytical platform to scale as more data is stored and processed via the pipeline.
Part of the reason why it’s so difficult to build analytical platforms are the challenges associated with data integration. This is usually done as a series of Extract Transform Load (ETL) jobs that pull data from multiple varied sources and integrate them into a central repository. The multiple steps of ETL can be viewed as a pipeline where raw data is fed in one end and integrated data accumulates at the other.
Two major hurdles complicate the development of ETL pipelines:
Put this in context by looking at the healthcare industry. Building an analytical pipeline for healthcare usually involves working with multiple datasets that cover care quality, provider operations, patient clinical records, and patient claims data. Customers are now looking for even more data sources that may contain valuable information that can be analyzed. Examples include social media data, data from devices and sensors, and biometric and human-generated data such as notes from doctors. These new data sources do not rely on fixed schemas, so integrating them by conventional ETL jobs is much more difficult.
The volume of data available for analysis is growing as well. According to an article published by NCBI, the US healthcare system reached a collective data volume of 150 exabytes in 2011. With the current rate of growth, it is expected to quickly reach zettabyte scale, and soon grow into the yottabytes.
The traditional data warehouses of the early 2000s, like the one shown in the following diagram, were based on a standard four-layer approach of ingest, stage, store, and report. This involved building and maintaining huge data models that suited certain sources and analytical queries. This type of design is based on a schema-on-write approach, where you write data into a predefined schema and read data by querying it.
As we progressed to more varied datasets and analytical requirements, the predefined schemas were not able to keep up. Moreover, by the time the models were built and put into production, the requirements had changed.
Schema-on-read provides much needed flexibility to an analytical project. Not having to rely on physical schemas improves the overall performance when it comes to high volume data loads. Because there are no physical constraints for the data, it works well with datasets with undefined data structures. It gives customers the power to experiment and try out different options.
Athena is a serverless analytical query engine that allows you to start querying data stored in Amazon S3 instantly. It supports standard formats like CSV and Parquet and integrates with Amazon QuickSight, which allows you to build interactive business intelligence reports.
The Amazon Athena User Guide provides multiple best practices that should be considered when using Athena for analytical pipelines at production scale. There are various performance tuning techniques that you can apply to speed up query performance. For more information, see Top 10 Performance Tuning Tips for Amazon Athena.
To demonstrate this solution, I use the healthcare dataset from the Centers for Disease Control (CDC) Behavioral Risk Factor Surveillance system (BRFSS). It gathers data via telephone surveys for health-related risk behaviors and conditions, and the use of preventive services. The dataset is available as zip files from the CDC FTP portal for general download and analysis. There is also a user guide with comprehensive details about the program and the process of collecting data.
The following diagram shows the schema-on-read pipeline that demonstrates this solution.
In this architecture, S3 is the central data repository for the CSV files, which are divided by behavioral risk factors like smoking, drinking, obesity, and high blood pressure.
Athena is used to collectively query the CSV files in S3. It uses the AWS Glue Data Catalog to maintain the schema details and applies it at the time of querying the data.
The dataset is further filtered and transformed into a subset that is specifically used for reporting with Amazon QuickSight.
As new data files become available, they are incrementally added to the S3 bucket and the subset query automatically appends them to the end of the table. The dashboards in Amazon QuickSight are refreshed with the new values of calculated metrics.
To use Athena in an analytical pipeline, you have to consider how to design it for initial data ingestion and the subsequent incremental ingestions. Moreover, you also have to decide on the mechanism to trigger a particular stage in the pipeline, which can either be scheduled or event-based.
For the purposes of this post, you build a simple pipeline where the data is:
This approach is highly customizable and can be used for building more complex pipelines with many more steps.
The data ingestion into S3 is fairly straightforward. The files can be ingested via the S3 command line interface (CLI), API, or the AWS Management Console. The data files are in CSV format and already divided by the behavioral condition. To improve performance, I recommend that you use partitioned data with Athena, especially when dealing with large volumes. You can use pre-partitioned data in S3 or build partitions later in the process. The example you are working with has a total of 247 CSV files storing about 205 MB of data across them, but typical production scale deployment would be much larger.
To automate the pipeline, you can make it either event-based or schedule-based. If you take the event-based approach, you can make use of S3 events to trigger an action when the files are uploaded. The event triggers an action, using an AWS Lambda function that corresponds to another step in the pipeline. Traditional ETL jobs have to rely on mechanisms like database triggers to enable this, which can cause additional performance overhead.
If you choose to go with a scheduled-based approach, you can use Lambda with scheduled events. The schedule is managed via a cron expression and the Lambda function is used to run the next step of the pipeline. This is suitable for workloads similar to a scheduled batch ETL job.
To filter and transform the dataset, first look at the overall counts and structure of the data. This allows you to choose the columns that are important for a given report and use the filter clause to extract a subset of the data. Transforming the data as you progress through the pipeline ensures that you are only exposing relevant data to the reporting layer, to optimize performance.
To look at the entire dataset, create a table in Athena to go across the entire data volume. This can be done using the following query:
Replace YourBucket and YourPrefix with your corresponding values.
In this case, there are a total of ~1.4 million records, which you get by running a simple COUNT(*) query on the table.
From here, you can run multiple analysis queries on the dataset. For example, you can find the number of records that fall into a certain behavioral risk, or the state that has the highest number of diabetic patients recorded. These metrics provide data points that help to determine the attributes that would be needed from a reporting perspective.
As you can see, this is much simpler compared to a schema-on-write approach where the analysis of the source dataset is much more difficult. This solution allows you to design the reporting platform in accordance with the questions you are looking to answer from your data. The source data analysis is the first step to design a good analytical platform and this approach allows customers to do that much earlier in the project lifecycle.
After you have completed the source data analysis, the next step is to filter out the required data and transform it to create a reporting database. This is synonymous to the data mart in a standard analytical pipeline. Based on the analysis carried out in the previous step, you might notice some mismatches with the data headers. You might also identify the filter clauses to apply to the dataset to get to your reporting data.
Athena automatically saves query results in S3 for every run. The default bucket for this is created in the following format:
aws-athena-query-results-<AWS Account ID>-<AWS Region>
Athena creates a prefix for each saved query and stores the result set as CSV files organized by dates. You can use this feature to filter out result datasets and store them in an S3 bucket for reporting.
To enable this, create queries that can filter out and transform the subset of data on which to report. For this use case, create three separate queries to filter out unwanted data and fix the column headers:
You can save these queries in Athena so that you can get to the query results easily every time they are executed. The following screenshot is an example of the results when query 1 is executed four times.
The next step is to copy these results over to a new bucket for creating your reporting table. This can be done by running an S3 CP command from the CLI or API, as shown below:
Note the prefix structure in which Athena stores query results. It creates a separate prefix for each day in which the query is executed, and stores the corresponding CSV and metadata file for each run. Copy the result set over to a new prefix “Reporting_Data”. Use the “exclude” and “include” option of S3 CP to only copy the CSV files and use “recursive” to copy all the files from the run.
You can replace the value of the saved query name from “Query1” to “Query2” or “Query3” to copy all data resulting from those queries to the same target prefix. For pipelines that require more complicated transformations, divide the query transformation into multiple steps and execute them based on events or schedule them, as described in the earlier data staging step.
After the filtered results sets are copied as CSV files into the new Reporting_Data prefix, create a new table in Athena that is used specifically for BI reporting. This can be done using a create table statement similar to the one below:
This table can now act as a source for a dashboard on Amazon QuickSight, which is a straightforward process to enable. When you choose a new data source, Athena shows up as an option and Amazon QuickSight automatically detects the tables in Athena that are exposed for querying. Here are the data sources supported by Athena at the time of this post:
After choosing Athena, give a name to the data source and choose the database. The tables available for querying automatically show up in the list.
If you choose “BRFSS_REPORTING”, you can create custom metrics using the columns in the reporting table, which can then be used in reports and dashboards.
For more information about features and visualizations, see the Amazon QuickSight User Guide.
To build a complete pipeline, think about ingesting data incrementally for new records as they become available. To enable this, make sure that the data can be incrementally ingested into the reporting schema and that the reporting metrics are refreshed on each run of the report. To demonstrate this, look at a scenario where the new dataset is ingested in a periodic basis into S3, and which has to be included into the reporting schema when calculating the metrics.
Look at the number of records in the reporting table before the incremental ingestion.
The results are as follows:
Use Query1 as an example transformation query that can isolate the incremental load. Here is a view of the query result bucket before Query1 runs.
After the incremental data is ingested in S3, trigger an event (or pre-schedule) an execution of Query1 in Athena, which results in the csv result set and the metadata file as shown below.
Next, trigger (or schedule) the copy command to copy the incremental records file into the reporting prefix. This is easily automated by using the predefined structure in which Athena saves the query results on S3. On checking the records count in the reporting table after copying, you get an increased count.
This shows that 96,919 records were added to our reporting table that can be used in metric calculations.
This process can be implemented programmatically to add incremental records into the reporting table every time new records are ingested into the staging area. As a result, you can simulate an end-to-end analytical pipeline that runs based on events or is scheduled to run as a batch workload.
Using Athena, you can benefit from the advantages of a schema-on-read analytical application. You can combine other AWS services like Lambda and Amazon QuickSight to build an end-to-end analytical pipeline.
However, it’s important to note that a schema-on-read analytical pipeline may not be the answer for all use cases. Carefully consider the choices between schema-on-read and schema-on-write. The source systems you are working with play a critical role in making that decision. For example:
You can also choose to go hybrid. Offload a part of the pipeline that deals with unstructured flat datasets into a schema-on-read architecture, and integrate the output into the main schema-on-write pipeline at a later stage. AWS provides multiple options to build analytical pipelines that suit various use cases. For more information, read about the AWS big data and analytics services.
If you have questions or suggestions, please comment below.
Don’t forget to check out the top 10 tips for Amazon Athena that can improve query performance.
Ujjwal Ratan is a healthcare and life sciences Solutions Architect at AWS. He has worked with organizations ranging from large enterprises to smaller startups on problems related to distributed computing, analytics and machine learning. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.