Five Mistakes Everyone Makes With Cloud Backup

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/5-common-cloud-backup-mistakes/


Cloud-based storage and file sync services are ubiquitous: Everywhere we turn new services pop up (and often shut down), promising free or low-cost storage of everything and anything on our computers and mobile devices.

When you depend on the cloud it’s very easy to get lulled into a false sense of security. Don’t. Here are five common mistakes all of us make with cloud backup and sync services. I’ve added suggestions for how to avoid these pitfalls.

Assuming the Cloud Is Backing Things Up

“I have iCloud or Google Drive, so everything’s backed up.”

Some cloud backup and file sync services make it really easy to put files online, but they may not be all the files you need. Don’t just assume the cloud services you use are doing a complete backup of your device – check to see what is actually being backed up. The services you use may only back up specific folders or directories on your computer’s hard drive.

Read this for more info on how Backblaze backs up.

There’s a big difference between file backup services and sync services, by the way. Which brings me to my next point:

Confusing Sync for Backup

“I don’t need backup. I’ve got my files synced.”

A sync service keeps content consistent across multiple devices – think Dropbox or iCloud Drive, for example. Make one change to that shared content, and the same change happens across all devices, including file changes and deletions. Depending on how you have syncing and sharing set up, you can delete a file on one device and have it disappear on all the other shared devices.

I’ve also found it handy to have a backup service that lets you restore multiple versions of a file. Dropbox does let you restore previous versions. Apple’s Time Machine, built into the Mac, does this too. So does Backblaze (we keep track of multiple versions for up to 30 days). That’s not to say you shouldn’t use Dropbox – we do! We wrote about how we are complementary services.

Thinking One Backup Is Enough

“Hey, I’m backing up to the cloud. That’s better than nothing, right?”

It’s better than nothing but it’s not enough. You want a local backup too. That’s why I recommend a 3-2-1 Backup strategy. In addition to the “live” copy of the data on your hard drive, make sure you have a local backup, and use the cloud for offsite storage. Likewise, if you’re only storing data on a local backup, you’re putting all your eggs in that basket. Add offsite backup to complete your backup strategy. Conversely, if you only store your data in the cloud, you’re susceptible to those services being down as well. So having a local copy can keep you productive even if your favorite service is temporarily down.

Leaving Things Insecure

“I’m not backing up anything important enough for hackers to bother with.”

With identity theft on the rise, the security of all of your data online should be paramount. Strong encryption is important, so make sure it’s supported by the services you depend on.

Even if a bad actor doesn’t want your data, they still may want your computer for nefarious purposes, like driving a botnet used to launch a DDOS (Distributed Denial of Service) attack. That’s exactly what recently happened to Dyn, a company that provides core Internet services for other popular Internet services like Twitter and Spotify.

Make sure to protect your computer with strong passwords, practice safe surfing and keep your computer updated with the latest software. Also check periodically for malware and get rid of it when you find it.

Thinking That It’s Taken Care Of

“I have a backup strategy in place, so I don’t have to think about it anymore.”

I think it’s wise to observe an old aphorism: “Trust but verify.”

There’s absolutely nothing wrong with developing an automated backup strategy. But it’s vitally important to periodically test your backups to make sure they’re doing what they’re supposed to.

You should test your most important, mission-critical data first. Tax returns? Important legal documents? Irreplaceable baby pictures? Make sure the files that are important to you are retrievable and intact by actually trying to recover them. Find out more about how to test your backup.

This applies to Backblaze too: test all your backups – we even recommend it in our Best Practices.

Got more cloud backup myths to bust? Share them with me in the comments!

 

The post Five Mistakes Everyone Makes With Cloud Backup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Use Apache Flink on Amazon EMR

Post Syndicated from Craig Foster original https://aws.amazon.com/blogs/big-data/use-apache-flink-on-amazon-emr/

Craig Foster is a Big Data Engineer with Amazon EMR

Apache Flink is a parallel data processing engine that customers are using to build real-time big data applications. Flink enables you to perform transformations on many different data sources, such as Amazon Kinesis Streams or the Apache Cassandra database. It provides both batch and streaming APIs, and it also has some SQL support for these stream and batch datasets. Most of Flink’s API actions are very similar to the transformations on distributed object collections found in Apache Hadoop or Apache Spark. Flink’s API is categorized into DataSets and DataStreams: DataSets are transformations on sets or collections of distributed data, while DataStreams are transformations on streaming data like those found in Amazon Kinesis.

Flink is a pure data streaming runtime engine, which means that it employs pipeline parallelism to perform operations on results of previous data transforms in real time. This means that multiple operations are performed concurrently. The Flink runtime handles exchanging data between these transformation pipelines. Also, while you may write a batch application, the same Flink streaming dataflow runtime implements it.

The Flink runtime consists of two different types of daemons: JobManagers, which are responsible for coordinating scheduling, checkpoint, and recovery functions, and TaskManagers, which are the worker processes that execute tasks and transfer data between streams in an application. Each application has one JobManager and at least one TaskManager.

You can scale the number of TaskManagers but also control parallelism further by using something called a “task slot.” In Flink-on-YARN, the JobManagers are co-located with the YARN ApplicationMaster, while each TaskManager is located in separate YARN containers allocated for the application.

Today we are making it even easier to run Flink on AWS as it is now natively supported in Amazon EMR 5.1.0. EMR supports running Flink-on-YARN so you can create either a long-running cluster that accepts multiple jobs or a short-running Flink session in a transient cluster that helps reduce your costs by only charging you for the time that you use.

You can also configure a cluster with Flink installed using the EMR configuration API with configuration classifications for logging and configuration parameters.
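For example, the following sketch writes the classifications to a file and passes them to a create-cluster call (the flink-conf and flink-log4j classification names follow the EMR release guide; the property values shown are only placeholders):

# Classifications: one for Flink runtime settings, one for log4j logging
cat > flink-configurations.json <<'EOF'
[
  { "Classification": "flink-conf",
    "Properties": { "taskmanager.numberOfTaskSlots": "2" } },
  { "Classification": "flink-log4j",
    "Properties": { "log4j.rootLogger": "INFO,file" } }
]
EOF

aws emr create-cluster --release-label emr-5.1.0 \
--applications Name=Flink \
--instance-type m3.xlarge --instance-count 2 \
--configurations file://flink-configurations.json \
--service-role EMR_DefaultRole \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole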


You can start using Flink on EMR today directly from the EMR console or using the CLI invocation below.

aws emr create-cluster --release-label emr-5.1.0 \
--applications Name=Flink \
--region us-east-1 \
--log-uri s3://myLogUri \
--instance-type m3.xlarge \
--instance-count 1 \
--service-role EMR_DefaultRole \
--ec2-attributes KeyName=YourKeyName,InstanceProfile=EMR_EC2_DefaultRole
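Once the cluster is running, you can submit a job from the master node. Here’s a minimal sketch, assuming the example JARs live under the default EMR install path (the path and the YARN flags are assumptions and may differ by release):

# On the master node: run the streaming WordCount example on YARN
# with two TaskManager containers
flink run -m yarn-cluster -yn 2 /usr/lib/flink/examples/streaming/WordCount.jar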

To learn more about Apache Flink, see the Apache Flink documentation and to learn more about Flink on EMR, see the Flink topic in the Amazon EMR Release Guide.


Related

Use Spark 2.0, Hive 2.1 on Tez, and the latest from the Hadoop ecosystem on Amazon EMR release 5.0


 

Security advisories for Friday

Post Syndicated from ris original http://lwn.net/Articles/705666/rss

Arch Linux has updated lib32-gdk-pixbuf2 (denial of service).

Debian has updated curl (multiple vulnerabilities) and memcached (code execution).

Fedora has updated kdepimlibs
(F24: three vulnerabilities), libwebp (F24:
integer overflows), and quagga (F24;
F23: three vulnerabilities).

Gentoo has updated libreoffice (multiple vulnerabilities) and oracle-jre-bin (multiple vulnerabilities).

Mageia has updated bind (denial
of service), kernel-tmb (multiple
vulnerabilities), php-adodb (two
vulnerabilities), and rpm (code execution
from 2014).

openSUSE has updated jasper
(13.2: multiple vulnerabilities, one from 2008).

Oracle has updated kernel 4.1.12 (OL7; OL6: code
execution), kernel 3.8.13 (OL7; OL6: code execution).

Red Hat has updated docker
(RHEL7: privilege escalation).

Scientific Linux has updated bind
(SL5,6: denial of service) and bind97 (SL5:
denial of service).

Slackware has updated bind (denial of service) and curl (multiple vulnerabilities).

SUSE has updated java-1_8_0-ibm
(SLE12-SP1: three vulnerabilities) and xen
(SOSC5, SMP2.1, SM2.1, SLE11-SP3: multiple vulnerabilities).

Ubuntu has updated curl (multiple vulnerabilities).

Detecting landmines – with spinach

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/detecting-landmines-with-spinach/

Forget sniffer dogs…we need to talk about spinach.

The team at MIT (Massachusetts Institute of Technology) has been working to transform spinach plants into a means of detection in the fight against buried munitions such as landmines.

Plant-to-human communication

MIT engineers have transformed spinach plants into sensors that can detect explosives and wirelessly relay that information to a handheld device similar to a smartphone. (Learn more: http://news.mit.edu/2016/nanobionic-spinach-plants-detect-explosives-1031)

Nanoparticles, plus tiny tubes called carbon nanotubes, are embedded into the spinach leaves where they pick up nitro-aromatics, chemicals found in the hidden munitions.

It takes the spinach approximately ten minutes to absorb water from the ground, including the nitro-aromatics, which then bind to the polymer material wrapped around the nanotube.

But where does the Pi come into this?

The MIT team shine a laser onto the leaves, detecting the altered fluorescence of the light emitted by the newly bonded tubes. This light is then read by a Raspberry Pi fitted with an infrared camera, resulting in a precise map of where hidden landmines are located. This signal can currently be picked up within a one-mile radius, with plans to increase the reach in future.


You can also physically hack a smartphone to replace the Raspberry Pi… but why would you want to do that?

The team at MIT has already used the tech to detect hydrogen peroxide, TNT, and sarin, and co-author Prof. Michael Strano says that the same setup can be used to detect “virtually anything”.

“The plants could be used for defence applications, but also to monitor public spaces for terrorism-related activities, since we show both water and airborne detection.”

More information on the paper can be found at the MIT website.

The post Detecting landmines – with spinach appeared first on Raspberry Pi.

CodePipeline Update – Build Continuous Delivery Workflows for CloudFormation Stacks

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/codepipeline-update-build-continuous-delivery-workflows-for-cloudformation-stacks/

When I begin to write about a new way for you to become more productive by using two AWS services together, I think about a 1980’s TV commercial for Reese’s Peanut Butter Cups! The intersection of two useful services or two delicious flavors creates a new one that is even better.

Today’s chocolate / peanut butter intersection takes place where AWS CodePipeline meets AWS CloudFormation. You can now use CodePipeline to build a continuous delivery pipeline for CloudFormation stacks. When you practice continuous delivery, each code change is automatically built, tested, and prepared for release to production. In most cases, the continuous delivery release process includes a combination of manual and automatic approval steps. For example, code that successfully passes through a series of automated tests can be routed to a development or product manager for final review and approval before it is pushed to production.

This important combination of features allows you to use the infrastructure as code model while gaining all of the advantages of continuous delivery. Each time you change a CloudFormation template, CodePipeline can initiate a workflow that will build a test stack, test it, await manual approval, and then push the changes to production. The workflow can create and manipulate stacks in many different ways.

As you will soon see, the workflow can take advantage of advanced CloudFormation features such as the ability to generate and then apply change sets (read New – Change Sets for AWS CloudFormation to learn more) to an operational stack.

The Setup
In order to learn more about this feature, I used a CloudFormation template to set up my continuous delivery pipeline (this is yet another example of infrastructure as code). This template (available here and described in detail here) sets up a full-featured pipeline. When I use the template to create my pipeline, I specify the name of an S3 bucket and the name of a source file:

The SourceS3Key points to a ZIP file that is enabled for S3 versioning. This file contains the CloudFormation template (I am using the WordPress Single Instance example) that will be deployed via the pipeline that I am about to create. It can also contain other deployment artifacts such as configuration or parameter files; here’s what mine looks like:

The entire continuous delivery pipeline is ready just a few seconds after I click on Create Stack. Here’s what it looks like:

The Action
At this point I have used CloudFormation to set up my pipeline. With the stage set (so to speak), now I can show you how this pipeline makes use of the new CloudFormation actions.

Let’s focus on the second stage, TestStage. Triggered by the first stage, this stage uses CloudFormation to create a test stack:

The stack is created using parameter values from the test-stack-configuration.json file in my ZIP. Since you can use different configuration files for each CloudFormation action, you can use the same template for testing and production.
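A template configuration file is just a small JSON document with a Parameters block (and optionally tags and a stack policy). Here’s a sketch of what test-stack-configuration.json could contain; the parameter names come from the WordPress sample template and the values are placeholders:

{
  "Parameters" : {
    "InstanceType" : "t2.micro",
    "KeyName" : "my-test-key",
    "DBName" : "wordpressdb",
    "DBUser" : "admin",
    "DBPassword" : "placeholder-password",
    "DBRootPassword" : "placeholder-root-password"
  }
}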

After the stack is up and running, the ApproveTestStack step is used to await manual approval (it says “Waiting for approval above.”). Playing the role of the dev manager, I verify that the test stack behaves and performs as expected, and then approve it:

After approval, the DeleteTestStack step deletes the test stack.

Now we are just about ready to deploy to production. ProdStage creates a CloudFormation change set and then submits it for manual approval. This stage uses the parameter values from the prod-stack-configuration.json file in my ZIP. I can use the parameters to launch a modestly sized test environment on a small EC2 instance and a large production environment from the same template.

Now I’m playing the role of the big boss, responsible for keeping the production site up and running. I review the change set in order to make sure that I understand what will happen when I deploy to production. This is the first time that I am running the pipeline, so the change set indicates that an EC2 instance and a security group will be created:

And then I approve it:

With the change set approved, it is applied to the existing production stack in the ExecuteChangeSet step. Applying the change to an  existing stack keeps existing resources in play where possible and avoids a wholesale restart of the application. This is generally more efficient and less disruptive than replacing the entire stack. It keeps in-memory caches warmed up and avoids possible bursts of cold-start activity.

Implementing a Change
Let’s say that I decide to support HTTPS. In order to do this, I need to add port 443 to my application’s security group. I simply edit the CloudFormation template, put it into a fresh ZIP, and upload it to S3. Here’s what I added to my template:

      - CidrIp: 0.0.0.0/0
        FromPort: '443'
        IpProtocol: tcp
        ToPort: '443'
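For context, here’s roughly where that fragment lands in the template – the new rule sits next to the existing HTTP rule in the security group’s ingress list. This is only a sketch: the logical resource name is an assumption, while the property names follow the standard AWS::EC2::SecurityGroup syntax:

WebServerSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Enable HTTP and HTTPS access
    SecurityGroupIngress:
      - CidrIp: 0.0.0.0/0
        FromPort: '80'
        IpProtocol: tcp
        ToPort: '80'
      - CidrIp: 0.0.0.0/0
        FromPort: '443'
        IpProtocol: tcp
        ToPort: '443'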

Then I return to the Console and see that CodePipeline has already detected the change and set the pipeline into motion:

The pipeline runs again, I approve the test stack, and then inspect the change set, confirming that it will simply modify an existing security group:

One quick note before I go. The CloudFormation template for the pipeline creates an IAM role and uses it to create the test and deployment stacks (this is a new feature; read about the AWS CloudFormation Service Role to learn more). For best results, you should delete the stacks before you delete the pipeline. Otherwise, you’ll need to re-create the role in order to delete the stacks.

There’s More
I’m just about out of space and time, but I’ll briefly summarize a couple of other aspects of this new capability.

Parameter Overrides – When you define a CloudFormation action, you may need to exercise additional control over the parameter values that are defined for the template. You can do this by opening up the Advanced pane and entering any desired parameter overrides:

Artifact References – In some situations you may find that you need to reference an attribute of an artifact that was produced by an earlier stage of the pipeline. For example, suppose that an early stage of your pipeline copies a Lambda function to an S3 bucket and calls the resulting artifact LambdaFunctionSource. Here’s how you would retrieve the bucket name and the object key from the attribute using a parameter override:

{
  "BucketName" : { "Fn::GetArtifactAtt" : ["LambdaFunctionSource", "BucketName"]},
  "ObjectKey" : { "Fn::GetArtifactAtt" : ["LambdaFunctionSource", "ObjectKey"]}
}

Access to JSON Parameter – You can use the new Fn::GetParam function to retrieve a value from a JSON-formatted file that is included in an artifact.
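For example, a parameter override that pulls a value out of a JSON file carried in an artifact might look like this (the artifact name, file name, and key are made up for illustration):

{
  "DBPassword" : { "Fn::GetParam" : ["TestOutput", "test-config.json", "Password"] }
}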

Note that Fn::GetArtifactAtt and Fn::GetParam are designed to be used within the parameter overrides.

S3 Bucket Versioning – The first step of my pipeline (the Source action) refers to an object in an S3 bucket. By enabling S3 versioning for the object, I simply upload a new version of my template after each change:

If I am using S3 as my source, I must use versioning (uploading a new object over the existing one is not supported). I can also use AWS CodeCommit or a GitHub repo as my source.
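If you want to set this up from the command line, a sketch with placeholder names looks like this:

# One-time: turn on versioning for the source bucket
aws s3api put-bucket-versioning --bucket my-pipeline-source \
--versioning-configuration Status=Enabled

# Each change: upload the refreshed ZIP; the Source action picks up the new version
aws s3 cp templates.zip s3://my-pipeline-source/templates.zip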

Create Pipeline Wizard
I started out this blog post by using a CloudFormation template to create my pipeline. I can also click on Create pipeline in the console and build my initial pipeline (with source, build, and beta deployment stages) using a wizard. The wizard now allows me to select CloudFormation as my deployment provider. I can create or update a stack or a change set in the beta deployment stage:

Available Now
This new feature is available now and you can start using it today. To learn more, check out the CodePipeline Documentation.


Jeff;

 

Internet Archive turns 20, gives birthday gifts to the world (Opensource.com)

Post Syndicated from ris original http://lwn.net/Articles/705593/rss

Opensource.com covers
the Internet Archive’s 20th birthday celebration. “Of all the projects announced during the event though, by far one of the most exciting and impressive is the newly released ability to search the complete contents of all text items on the Internet Archive. Nine million text items, covering hundreds of years of human history, are now searchable in an instant.

Red Hat Enterprise Linux 7.3

Post Syndicated from ris original http://lwn.net/Articles/705587/rss

Red Hat has announced
the release
of Red Hat Enterprise Linux 7.3. “This update to Red Hat’s flagship Linux operating system includes new features and enhancements built around performance, security, and reliability. The release also introduces new capabilities around Linux containers and the Internet of Things (IoT), designed to help early enterprise adopters use existing investments as they scale to meet new business demands.

Thursday’s security updates

Post Syndicated from ris original http://lwn.net/Articles/705557/rss

Arch Linux has updated curl (multiple vulnerabilities), lib32-curl (multiple vulnerabilities), lib32-libcurl-compat (multiple vulnerabilities), lib32-libcurl-gnutls (multiple vulnerabilities), libcurl-compat (multiple vulnerabilities), libcurl-gnutls (multiple vulnerabilities), tar (file overwrite), and tomcat6 (redirect HTTP traffic).

CentOS has updated bind (C6; C5: denial
of service) and bind97 (C5: denial of service).

Debian-LTS has updated bind9 (denial of service), bsdiff (denial of service), qemu (multiple vulnerabilities), spip (multiple vulnerabilities), and xen (information leak/corruption).

Mageia has updated openjpeg2 (multiple vulnerabilities).

openSUSE has updated bash (13.2:
code execution), ghostscript (Leap42.1:
insufficient parameter check), libxml2
(Leap42.1: code execution), and openslp
(Leap42.1: two vulnerabilities).

Oracle has updated bind (OL6; OL5:
denial of service) and bind97 (OL5: denial of service).

Red Hat has updated 389-ds-base
(RHEL7: three vulnerabilities), bind (RHEL7; RHEL5,6: denial of service), bind97 (RHEL5: denial of service), curl (RHEL7: three vulnerabilities), dhcp (RHEL7: denial of service), firewalld (RHEL7: authentication bypass), fontconfig (RHEL7: privilege escalation), gimp (RHEL7: use-after-free), glibc (RHEL7: three vulnerabilities), kernel (RHEL7: multiple vulnerabilities), kernel-rt (RHEL7: multiple vulnerabilities),
krb5 (RHEL7: two vulnerabilities), libguestfs and virt-p2v (RHEL7: information
leak), libreoffice (RHEL7: code execution),
libreswan (RHEL7: denial of service), libvirt (RHEL7: three vulnerabilities), mariadb (RHEL7: multiple vulnerabilities), mod_nss (RHEL7: invalid handling of +CIPHER
operator), nettle (RHEL7: multiple
vulnerabilities), NetworkManager (RHEL7:
information leak), ntp (RHEL7: multiple
vulnerabilities), openssh (RHEL7: privilege
escalation), pacemaker (RHEL7: denial of
service), pacemaker (RHEL7: privilege
escalation), pcs (RHEL7: two
vulnerabilities), php (RHEL7: multiple
vulnerabilities), poppler (RHEL7: code
execution), postgresql (RHEL7: two
vulnerabilities), powerpc-utils-python
(RHEL7: code execution), python (RHEL7:
code execution), qemu-kvm (RHEL7: two
vulnerabilities), resteasy-base (RHEL7:
code execution), squid (RHEL7: multiple
denial of service flaws), subscription-manager (RHEL7: information
disclosure), sudo (RHEL7: information
disclosure), systemd (RHEL7: denial of
service), tomcat (RHEL7: multiple
vulnerabilities), util-linux (RHEL7: denial
of service), and wget (RHEL7: code execution).

SUSE has updated bind (SLES-Pi-12-SP2; SOSC5, SMP2.1, SM2.1, SLE11-SP2,3,4: denial of
service) and curl (SLE11-SP4: multiple vulnerabilities).

Ubuntu has updated memcached
(code execution), nvidia-graphics-drivers-367 (16.04, 14.04,
12.04: privilege escalation), and openjdk-8
(16.10, 16.04: multiple vulnerabilities).

Results from the Linux Foundation Technical Advisory Board election

Post Syndicated from corbet original http://lwn.net/Articles/705486/rss

The 2016 Linux Foundation Technical
Advisory Board
election was held
November 2 at the combined Kernel Summit and Linux Plumbers Conference
events. Incumbent members Chris Mason and Peter Anvin were re-elected to
the board; they will be joined by new members Olof Johansson, Dan Williams,
and Rik van Riel. Thanks are due to outgoing members Grant Likely, Kristen
Accardi, and John Linville.

CERN Coding Pi Science Event

Post Syndicated from Laura Clay original https://www.raspberrypi.org/blog/cern-coding-pi-science-event/

Laura: MagPi founder and Scottish Pi event organiser extraordinaire Dr. William Bell has sent us this report from the home of the World Wide Web itself…

CERN is the heart of particle physics research, where scientists are working to discover new phenomena using high-energy equipment. These research challenges have driven inventions such as the World Wide Web and the superconducting magnets used by the Large Hadron Collider. Theoretical calculations and experimental analyses are both heavily reliant on computer programming, so it’s a great place to host a Raspberry Pi programming event.


Babbage outside CERN

This year, Brice Copy organised a Coding Pi Science event on the 7th and 8th of October. Working together with long-term Pi supporter Alan McCullagh, he invited three teams to prepare kits to build and program with attendees. To motivate the teams and the other attendees, there were a series of talks on Friday evening; these included a general introduction to the CERN Micro Club and the EU Code Week, as well as a motivational talk on why computer programming is so important for scientific research. Each team then gave an overview of their project, in preparation for the workshop the next day.

On Saturday morning, the teams, volunteers, children, parents, and teachers started to build a muon detector (Muon Hunter), a robotic arm (Poppy Ergo Jr.), or a programmable WiFi car (GianoPi). The idea was to build a kit together with the team leaders and other volunteers, and then take the kit home to program it. These three kits provide different challenges: the Muon Hunter kit requires some soldering and uses a C programming interface, the Poppy Ergo Jr. snaps together and is driven using Snap, and the GianoPi needs soldering and is controlled by Blockly.


Programming Poppy Ergo Jr. in MicroClub Robotics

The Muon Hunter was designed by Mihaly Vadai, in collaboration with the CERN Micro Club. The kit includes two Geiger-Müller tubes to detect ionising radiation, a circuit board that produces the 400 volts needed to bias the tubes and read the signals, and an ARM microcontroller to form the coincidence between the two tubes. The circuit board can be directly connected to a Raspberry Pi to read out the signals and produce plots of the data.

Poppy Ergo Jr. was invented by the Flowers team at Inria Bordeaux, and presented by Stephanie Noirpoudre and Theo Segonds. Their projects are designed to encourage children to learn about computer programming through interacting with robots. The kit includes 3D-printed parts and several servo motors controlled by a Raspberry Pi mounted in the base of the robot. A Camera Module can be used to check the colour of objects, and forms part of their Snap programming examples.

GianoPi was designed by Stefania Saladino. It consists of four servo motors, multi-directional wheels, an ultrasonic sensor, a Pi Zero, a servo control HAT from Adafruit, a WiFi adapter, a battery pack, and some electronics to allow the kit to be easily turned on or off. Brice Copy created the software to interface with the GianoPi using Raspbuggy, which is a Blockly application. Similar to the Poppy Ergo Jr., the GianoPi is controlled over a network connection, allowing the robot to be remotely accessed.


Building GianoPi

It was an engaging weekend of soldering, building, and programming; hopefully, these kits will encourage even more exciting projects in the future. Alan certainly had fun trying to find a good place to put Babbage, too…


Babbage gets everywhere…

The post CERN Coding Pi Science Event appeared first on Raspberry Pi.

Teaching a Neural Network to Encrypt

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/11/teaching_a_neur.html

Researchers have trained a neural network to encrypt its communications.

In their experiment, computers were able to make their own form of encryption using machine learning, without being taught specific cryptographic algorithms. The encryption was very basic, especially compared to our current human-designed systems. Even so, it is still an interesting step for neural nets, which the authors state “are generally not meant to be great at cryptography.”

This story is more about AI and neural networks than it is about cryptography. The algorithm isn’t any good, but is a perfect example of what I’ve heard called “Schneier’s Law”: Anyone can design a cipher that they themselves cannot break.

Research paper. Note that the researchers work at Google.

In which I have to debunk a second time

Post Syndicated from Robert Graham original http://blog.erratasec.com/2016/11/in-which-i-have-to-debunk-second-time.html

So Slate is doubling-down on their discredited story of a secret Trump server. Tip for journalists: if you are going to argue against an expert debunking your story, try to contact that expert first, so they don’t have to do what I’m going to do here, showing obvious flaws. Also, pay attention to the data.

The experts didn’t find anything

The story claims:

“I spoke with many DNS experts. They found the evidence strongly suggestive of a relationship between the Trump Organization and the bank”.

No, he didn’t. He gave experts limited information and asked them whether it’s consistent with a conspiracy theory. He didn’t ask if it was “suggestive” of the conspiracy theory, or that this was the best theory that fit the data.

This is why “experts” quoted in the press need to go through “media training”, to avoid getting your reputation harmed by bad journalists who try their best to put words in your mouth. You’ll be trained to recognize bad journalists like this, and how not to get sucked into their fabrications.

Jean Camp isn’t an expert

On the other hand, Jean Camp isn’t an expert. I’ve never heard of her before. She gets details wrong. Take, for example, this blogpost, where she discusses lookups for the domain mail.trump-email.com.moscow.alfaintra.net. She says:

This query is unusual in that is merges two hostnames into one. It makes the most sense as a human error in inserting a new hostname in some dialog window, but neglected to hit the backspace to delete the old hostname.

Uh, no. It’s normal DNS behavior with non-FQDNs. If the lookup for a name fails, computers will try again, pasting the local domain on the end. In other words, when Twitter’s DNS was taken offline by the DDoS attack a couple weeks ago, those monitoring DNS saw a zillion lookups for names like “www.twitter.com.example.com”.

I’ve reproduced this on my desktop by configuring the suffix moscow.alfaintra.net.

I then pinged “mail1.trump-email.com” and captured the packets. As you can see, after the initial lookups fail, Windows tried appending the suffix.
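You can reproduce the same thing on a Linux box. Here’s a sketch, assuming the bare name fails to resolve so the resolver falls back to the search suffix (the suffix and hostname are the ones from the story):

# Add the suffix to the resolver's search list
echo "search moscow.alfaintra.net" | sudo tee -a /etc/resolv.conf

# Watch DNS queries while the lookup fails and is retried with the suffix appended
sudo tcpdump -n -i any udp port 53 &
ping -c 1 mail1.trump-email.com
# tcpdump shows a follow-up query for mail1.trump-email.com.moscow.alfaintra.net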

I don’t know what Jean Camp is an expert of, but this is sorta a basic DNS concept. It’s surprising she’d get it wrong. Of course, she may be an expert in DNS who simply had a brain fart (this happens to all of us), but looking across her posts and tweets, she doesn’t seem to be somebody who has a lot of experience with DNS. Sorry for impugning her credibility, but that’s the way the story is written. It demands that we trust the quoted “experts”. 
Call up your own IT department at Slate. Ask your IT nerds if this is how DNS operates. Note: I’m saying your average, unremarkable IT nerds can debunk an “expert” you quote in your story.
Understanding “spam” and “blacklists”

The new article has a paragraph noting that the IP address doesn’t appear on spam blocklists:

Was the server sending spam—unsolicited mail—as opposed to legitimate commercial marketing? There are databases that assiduously and comprehensively catalog spam. I entered the internet protocal address for mail1.trump-email.com to check if it ever showed up in Spamhaus and DNSBL.info. There were no traces of the IP address ever delivering spam.

This is a profound misunderstanding of how these things work.

Colloquially, we call those sending mass marketing emails, like Cendyn, “spammers”. But those running blocklists have a narrower definition. If  emails contain an option to “opt-out” of future emails, then it’s technically not “spam”.

Cendyn is constantly getting added to blocklists when people complain. They spend considerable effort contacting the many organizations maintaining blocklists, proving they do “opt-outs”, and getting “white-listed” instead of “black-listed”. Indeed, the entire spam-blacklisting industry is a bit of a scam; getting white-listed often involves a bit of cash.

Those maintaining blacklists only keep records going back a few months. The article is in error saying there’s no record ever of Cendyn sending spam. Instead, if an address comes up clean, it means there’s no record for the past few months. And if Cendyn is in the white-lists, there would be no record of “spam” at all, anyway.

As somebody who frequently scans the entire Internet, I’m constantly getting on/off blacklists. It’s a real pain. At the moment, my scanner address “209.126.230.71” doesn’t appear to be on any blacklists. Next time a scan kicks off, it’ll probably get added — but only by a few, because most have white-listed it.

There is no IP address limitation

The story repeats the theory, which I already debunked, that the server has a weird configuration that limits who can talk to it:

The scientists theorized that the Trump and Alfa Bank servers had a secretive relationship after testing the behavior of mail1.trump-email.com using sites like Pingability. When they attempted to ping the site, they received the message “521 lvpmta14.lstrk.net does not accept mail from you.”

No, that’s how Listrak (who is the one who actually controls the server) configures all their marketing servers. Anybody can confirm this themselves by pinging all the servers in this range.
In case you don’t want to do the scans yourself, you can look on Shodan and see that there are at least 4,000 servers around the Internet that give the same error message.
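You can also check the banner behavior yourself by talking to port 25 directly; a quick sketch (substitute a host from that range or from the Shodan results):

# Read the SMTP greeting; Listrak's marketing MTAs answer with a 521 rejection
nc -w 5 HOST_FROM_THE_RANGE 25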

Again, go back to Chris Davis in your original story and ask him about this. He’ll confirm that there’s nothing nefarious or weird going on here, that it’s just how Listrak has decided to configure all its spam-sending engines.

Either this conspiracy goes much deeper, with hundreds of servers involved, or this is a meaningless datapoint.
Where did the DNS logs come from?
Tea Leaves and Jean Camp are showing logs of private communications. Where did these logs come from? This information isn’t public. It means somebody has done something like hack into Alfa Bank. Or it means researchers who monitor DNS (for maintaining DNS, and for doing malware research) have broken their NDAs and possibly the law.
The data is incomplete and inconsistent. Those who work for other companies, like Dyn, claim it doesn’t match their own data. We have good reason to doubt these logs. There’s a good chance that the source doesn’t have as comprehensive a view as “Tea Leaves” claim. There’s also a good chance the data has been manipulated.
Specifically, I have a source who claims records for trump-email.com were changed in June, meaning either my source or Tea Leaves is lying.
Until we know more about the source of the data, it’s impossible to believe the conclusions that only Alfa Bank was doing DNS lookups.

By the way, if you are a company like Alfa Bank, and you don’t want the “research” community from seeing leaked intranet DNS requests, then you should probably reconfigure your DNS resolvers. You’ll want to look into RFC7816 “query minimization”, supported by the Unbound and Knot resolvers.
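For Unbound, that’s a one-line setting; a sketch of the relevant stanza (everything else in unbound.conf left out):

# unbound.conf -- send minimal QNAMEs upstream (RFC 7816)
server:
    qname-minimisation: yes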

Do the graphs show interesting things?

The original “Tea Leaves” researchers are clearly acting in bad faith. They are trying to twist the data to match their conclusions. For example, in the original article, they claim that peaks in the DNS activity match campaign events. But looking at the graph, it’s clear these are unrelated. It displays the common cognitive bias of seeing patterns that aren’t there.
Likewise, they claim that the timing throughout the day matches what you’d expect from humans interacting back and forth between Moscow and New York. No. This is what the activity looks like, graphing the number of queries by hour:
As you can see, there’s no pattern. When workers go home at 5pm in New York City, it’s midnight in Moscow. If humans were involved, you’d expect an eight hour lull during that time. Likewise, when workers arrive at 9am in New York City, you expect a spike in traffic for about an hour until workers in Moscow go home. You see none of that here. What you instead see is a random distribution throughout the day — the sort of distribution you’d expect if this were DNS lookups from incoming spam.
The point is that we know the original “Tea Leaves” researchers aren’t trustworthy, that they’ve convinced themselves of things that just aren’t there.
Does Trump control the server in question?

OMG, this post asks the question, after I’ve debunked the original story, and still gotten the answer wrong.
The answer is that Listrak controls the server. Not even Cendyn controls it, really, they just contract services from Listrak. In other words, not only does Trump not control it, the next level company (Cendyn) also doesn’t control it.
Does Trump control the domain in question?
OMG, this new story continues to make the claim the Trump Organization controls the domain trump-email.com, despite my debunking that Cendyn controls the domain.
Look at the WHOIS info yourself. All the contact info goes to Cendyn. It fits the pattern Cendyn chooses for their campaigns.
  • trump-email.com
  • mjh-email.com
  • denihan-email.com
  • hyatt-email.com
Cendyn even spells “Trump Orgainzation” wrong.
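Checking the registration records is a one-liner; a sketch (whois output formats vary by registrar):

# The registrant and contact records point at Cendyn, not the Trump Organization
whois trump-email.com | grep -i -E 'registrant|admin|tech'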

There’s a difference between a “server” and a “name”

The article continues to make trivial technical errors, like confusing what a server is with what a domain name is. For example:

One of the intriguing facts in my original piece was that the Trump server was shut down on Sept. 23, two days after the New York Times made inquiries to Alfa Bank

The server has never been shut down. Instead, the name “mail1.trump-email.com” was removed from Cendyn’s DNS servers.
It’s impossible to debunk everything in these stories because they garble the technical details so much that it’s impossible to know what the heck they are claiming.
Why did Cendyn change things after Alfa Bank was notified?

It’s a curious coincidence that Cendyn changed their DNS records a couple days after the NYTimes contacted Alfa Bank.
But “coincidence” is all it is. I have years of experience with investigating data breaches. I know that such coincidences abound. There’s always weird coincidence that you are certain are meaningful, but which by the end of the investigation just aren’t.
The biggest source of coincidences is that IT is always changing things and always messing things up. It’s the nature of IT. Thus, you’ll always see a change in IT that matches some other event. Those looking for conspiracies ignore the changes that don’t match, and focus on the one that does, so it looms suspiciously.
As I’ve mentioned before, I have a source who says Cendyn changed things around in June. This makes me believe that “Tea Leaves” is selectively highlighting the change in September.
In any event, many people have noticed that the registrar email “Emily McMullin” has the same last name as Evan McMullin running against Trump in Utah. This supports my point: when you do hacking investigations, you find irrelevant connections all over the freakin’ place.
“Experts stand by their analysis”

This new article states:

I’ve checked back with eight of the nine computer scientists and engineers I consulted for my original story, and they all stood by their fundamental analysis

Well, of course, they don’t want to look like idiots. But notice the subtle rephrasing of the question: the experts stand by their analysis. That doesn’t mean the same thing as standing behind the reporter’s analysis. The experts made narrow judgments, which even I stand behind as mostly correct, given the data they were given at the time. None of them were asked whether the entire conspiracy theory holds up.
What you should ask is people like Chris Davis or Paul Vixie whether they stand behind my analysis in the past two posts. Or really, ask any expert. I’ve documented things in sufficient clarity. For example, go back to Chris Davis and ask him again about the “limited IP address” theory, and whether it holds up against my scan of that data center above.
Conclusion

Other major news outlets all passed on the story, because even non-experts could see it was flawed. The data means nothing. The Slate journalist nonetheless went forward with the story, tricking experts, and finding some non-experts.
But as I’ve shown, given a complete technical analysis, the story falls apart. Most of what’s strange is perfectly normal. The data itself (the DNS logs) is untrustworthy. It treats unknown things (like how the mail server rejects certain IP addresses) as “unknowable” things that confirm the conspiracy, when they are in fact simply things unknown at the current time, which can become knowable with a little research.

What I show in my first post, and this post, is more data. This data shows context. This data explains the unknowns that Slate present. Moreover, you don’t have to trust me — anybody can replicate my work and see for themselves.


Mesa 13.0.0 released

Post Syndicated from jake original http://lwn.net/Articles/705420/rss

The Mesa project has announced version 13.0.0 of the 3D graphics library that provides an open-source implementation of OpenGL. “This release has huge amount of features, but without a doubt the biggest
ones are:
Vulkan driver for hardware supported by the AMDGPU kernel driver [and]
OpenGL 4.4/4.5 capability, yet the drivers may expose lower version due to
pending Khronos CTS validation.”

Eben Moglen on GPL Compliance and Building Communities: What Works (Linux.com)

Post Syndicated from corbet original http://lwn.net/Articles/705416/rss

Linux.com has a
transcript of Eben Moglen’s talk
in New York on October 28.
I have some fine clients and wonderful friends in this movement who
have been getting rather angry recently. There is a lot of anger in the
world, in fact, in politics. Our political movement is not the only one
suffering from anger at the moment. But some of my angry friends, dear
friends, friends I really care for, have come to the conclusion that
they’re on a jihad for free software. And I will say this after decades of
work—whatever else will be the drawbacks in other areas of life—the problem
in our neighborhood is that jihad does not scale.

There is a video of the talk available as well.

New – Sending Metrics for Amazon Simple Email Service (SES)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-sending-metrics-for-amazon-simple-email-service-ses/

Amazon Simple Email Service (SES) focuses on deliverability – getting email through to the intended recipients. In my launch blog post (Introducing the Amazon Simple Email Service), I noted that several factors influence delivery, including the level of trust that you have earned with multiple Internet Service Providers (ISPs) and the number of complaints and bounces that you generate.

Today we are launching a new set of sending metrics for SES. There are two aspects to this feature:

Event Stream – You can configure SES to publish a JSON-formatted record to Amazon Kinesis Firehose each time a significant event (sent, rejected, delivered, bounced, or complaint generated) occurs.

Metrics – You can configure SES to publish aggregate metrics to Amazon CloudWatch. You can add one or more tags to each message and use them as CloudWatch dimensions. Tagging messages gives you the power to track deliverability based on campaign, team, department, and so forth.  You can then use this information to fine-tune your messages and your email strategy.
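These metrics hang off of an SES configuration set. Here’s a minimal sketch with the AWS CLI that publishes the five event types to CloudWatch, using a “campaign” message tag as a dimension (the names are placeholders and the request shapes are from memory, so check the SES documentation before relying on them):

# Create a configuration set
aws ses create-configuration-set --configuration-set Name=my-config-set

# Attach a CloudWatch event destination that turns events into metrics
aws ses create-configuration-set-event-destination \
--configuration-set-name my-config-set \
--event-destination '{
  "Name": "cloudwatch-metrics",
  "Enabled": true,
  "MatchingEventTypes": ["send", "reject", "delivery", "bounce", "complaint"],
  "CloudWatchDestination": {
    "DimensionConfigurations": [{
      "DimensionName": "campaign",
      "DimensionValueSource": "messageTag",
      "DefaultDimensionValue": "none"
    }]
  }
}'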

To learn more, read about Email Sending Metrics on the Amazon Simple Email Service Blog.


Jeff;

Month in Review: October 2016

Post Syndicated from Derek Young original https://aws.amazon.com/blogs/big-data/month-in-review-october-2016/

Another month of big data solutions on the Big Data Blog. Take a look at our summaries below and learn, comment, and share. Thanks for reading!

Building Event-Driven Batch Analytics on AWS
Modern businesses typically collect data from internal and external sources at various frequencies throughout the day. In this post, you learn an elastic and modular approach for how to collect, process, and analyze data for event-driven applications in AWS.

How Eliza Corporation Moved Healthcare Data to the Cloud
Eliza Corporation, a company that focuses on health engagement management, acts on behalf of healthcare organizations such as hospitals, clinics, pharmacies, and insurance companies. This allows them to engage people at the right time, with the right message, and in the right medium. By meeting them where they are in life, Eliza can capture relevant metrics and analyze the overall value provided by healthcare. In this post, you explore some of the practical challenges faced during the implementation of the data lake for Eliza and the corresponding details of the ways NorthBay solved these issues with AWS.

Optimizing Amazon S3 for High Concurrency in Distributed Workloads
This post demonstrates how to optimize Amazon S3 for an architecture commonly used to enable genomic data analyses. Although the focus of this post is on genomic data analyses, the optimization can be used in any discipline that has individual source data that must be analyzed together at scale.

Running sparklyr – RStudio’s R Interface to Spark on Amazon EMR
Sparklyr is an R interface to Spark that allows users to use Spark as the backend for dplyr, one of the most popular data manipulation packages. Sparklyr provides interfaces to Spark packages and also allows users to query data in Spark using SQL and develop extensions for the full Spark API. This short post shows you how to run RStudio and sparklyr on EMR.

Fact or Fiction: Google BigQuery Outperforms Amazon Redshift as an Enterprise Data Warehouse?
One of the great things about the cloud is the transparency that customers have in testing and debunking overstated performance claims and misleading “benchmark” tests. This transparency encourages the best cloud vendors to publish clear and repeatable performance metrics, making it faster and easier for their customers to select the right cloud service for a given workload. To verify Google’s recent performance claims with our own testing, we ran the full TPC-H benchmark, consisting of all 22 queries, using a 10 TB dataset on Amazon Redshift against the latest version of BigQuery.

Using pgpool and Amazon ElastiCache for Query Caching with Amazon Redshift
It is easy to implement a caching solution using pgpool with Amazon Redshift and Amazon ElastiCache. This solution significantly improves the end-user experience and alleviates the load on your cluster by orders of magnitude. In this blog post, we’ll use a real customer scenario to show you how to create a caching layer in front of Amazon Redshift using pgpool and Amazon ElastiCache.

FROM THE ARCHIVE

Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE (July 2016)
Managing a hybrid cluster of both CPU and GPU instances poses challenges because cluster managers such as YARN/Mesos do not natively support GPUs. Even if they did have native GPU support, the open source deep learning libraries would have to be re-written to work with the cluster manager API. This post discusses an alternate solution; namely, running separate CPU and GPU clusters, and driving the end-to-end modeling process from Apache Spark.

 

———————————————–

Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

