Tag Archives: website

Connect Veeam to the B2 Cloud: Episode 3 — Using OpenDedup

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/opendedup-for-cloud-storage/

Veeam backup to Backblaze B2 logo

In this, the third post in our series on connecting Veeam with Backblaze B2 Cloud Storage, we discuss how to back up your VMs to B2 using Veeam and OpenDedup. In our previous posts, we covered how to connect Veeam to the B2 cloud using Synology, and how to connect Veeam with B2 using StarWind VTL.

Deduplication and OpenDedup

Deduplication is simply the process of eliminating redundant data on disk. Deduplication reduces storage space requirements, improves backup speed, and lowers backup storage costs. The dedup field used to be dominated by a few big-name vendors who sold dedup systems that were too expensive for most of the SMB market. Then an open-source challenger came along in OpenDedup, a project that produced the Space Deduplication File System (SDFS). SDFS provides many of the features of commercial dedup products without their cost.

OpenDedup provides inline deduplication that can be used with applications such as Veeam, Veritas Backup Exec, and Veritas NetBackup.

Features Supported by OpenDedup:

  • Variable Block Deduplication to cloud storage
  • Local Data Caching
  • Encryption
  • Bandwidth Throttling
  • Fast Cloud Recovery
  • Windows and Linux Support

Why use Veeam with OpenDedup to Backblaze B2?

With your VMs backed up to B2, you have a number of options to recover from a disaster. If the unexpected occurs, you can quickly restore your VMs from B2 to the location of your choosing. You also have the option to bring up cloud compute through B2’s compute partners, thereby minimizing any loss of service and ensuring business continuity.

Veeam logo + OpenDedup logo + Backblaze B2 logo

Backblaze’s B2 is an ideal solution for backing up Veeam’s backup repository due to B2’s combination of low-cost and high availability. Users of B2 save up to 75% compared to other cloud solutions such as Microsoft Azure, Amazon AWS, or Google Cloud Storage. When combined with OpenDedup’s no-cost deduplication, you’re got an efficient and economical solution for backing up VMs to the cloud.

How to Use OpenDedup with B2

For step-by-step instructions for how to set up OpenDedup for use with B2 on Windows or Linux, see Backblaze B2 Enabled on the OpenDedup website.

Are you backing up Veeam to B2 using one of the solutions we’ve written about in this series? If you have, we’d love to hear from you in the comments.

View all posts in the Veeam series.

The post Connect Veeam to the B2 Cloud: Episode 3 — Using OpenDedup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Sending Inaudible Commands to Voice Assistants

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/sending_inaudib.html

Researchers have demonstrated the ability to send inaudible commands to voice assistants like Alexa, Siri, and Google Assistant.

Over the last two years, researchers in China and the United States have begun demonstrating that they can send hidden commands that are undetectable to the human ear to Apple’s Siri, Amazon’s Alexa and Google’s Assistant. Inside university labs, the researchers have been able to secretly activate the artificial intelligence systems on smartphones and smart speakers, making them dial phone numbers or open websites. In the wrong hands, the technology could be used to unlock doors, wire money or buy stuff online ­– simply with music playing over the radio.

A group of students from University of California, Berkeley, and Georgetown University showed in 2016 that they could hide commands in white noise played over loudspeakers and through YouTube videos to get smart devices to turn on airplane mode or open a website.

This month, some of those Berkeley researchers published a research paper that went further, saying they could embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.

Details on a New PGP Vulnerability

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/details_on_a_ne.html

A new PGP vulnerability was announced today. Basically, the vulnerability makes use of the fact that modern e-mail programs allow for embedded HTML objects. Essentially, if an attacker can intercept and modify a message in transit, he can insert code that sends the plaintext in a URL to a remote website. Very clever.

The EFAIL attacks exploit vulnerabilities in the OpenPGP and S/MIME standards to reveal the plaintext of encrypted emails. In a nutshell, EFAIL abuses active content of HTML emails, for example externally loaded images or styles, to exfiltrate plaintext through requested URLs. To create these exfiltration channels, the attacker first needs access to the encrypted emails, for example, by eavesdropping on network traffic, compromising email accounts, email servers, backup systems or client computers. The emails could even have been collected years ago.

The attacker changes an encrypted email in a particular way and sends this changed encrypted email to the victim. The victim’s email client decrypts the email and loads any external content, thus exfiltrating the plaintext to the attacker.

A few initial comments:

1. Being able to intercept and modify e-mails in transit is the sort of thing the NSA can do, but is hard for the average hacker. That being said, there are circumstances where someone can modify e-mails. I don’t mean to minimize the seriousness of this attack, but that is a consideration.

2. The vulnerability isn’t with PGP or S/MIME itself, but in the way they interact with modern e-mail programs. You can see this in the two suggested short-term mitigations: “No decryption in the e-mail client,” and “disable HTML rendering.”

3. I’ve been getting some weird press calls from reporters wanting to know if this demonstrates that e-mail encryption is impossible. No, this just demonstrates that programmers are human and vulnerabilities are inevitable. PGP almost certainly has fewer bugs than your average piece of software, but it’s not bug free.

3. Why is anyone using encrypted e-mail anymore, anyway? Reliably and easily encrypting e-mail is an insurmountably hard problem for reasons having nothing to do with today’s announcement. If you need to communicate securely, use Signal. If having Signal on your phone will arouse suspicion, use WhatsApp.

I’ll post other commentaries and analyses as I find them.

EDITED TO ADD (5/14): News articles.

Slashdot thread.

Some notes on eFail

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/05/some-notes-on-efail.html

I’ve been busy trying to replicate the “eFail” PGP/SMIME bug. I thought I’d write up some notes.

PGP and S/MIME encrypt emails, so that eavesdroppers can’t read them. The bugs potentially allow eavesdroppers to take the encrypted emails they’ve captured and resend them to you, reformatted in a way that allows them to decrypt the messages.

Disable remote/external content in email

The most important defense is to disable “external” or “remote” content from being automatically loaded. This is when HTML-formatted emails attempt to load images from remote websites. This happens legitimately when they want to display images, but not fill up the email with them. But most of the time this is illegitimate, they hide images on the webpage in order to track you with unique IDs and cookies. For example, this is the code at the end of an email from politician Bernie Sanders to his supporters. Notice the long random number assigned to track me, and the width/height of this image is set to one pixel, so you don’t even see it:

Such trackers are so pernicious they are disabled by default in most email clients. This is an example of the settings in Thunderbird:

The problem is that as you read email messages, you often get frustrated by the fact the error messages and missing content, so you keep adding exceptions:

The correct defense against this eFail bug is to make sure such remote content is disabled and that you have no exceptions, or at least, no HTTP exceptions. HTTPS exceptions (those using SSL) are okay as long as they aren’t to a website the attacker controls. Unencrypted exceptions, though, the hacker can eavesdrop on, so it doesn’t matter if they control the website the requests go to. If the attacker can eavesdrop on your emails, they can probably eavesdrop on your HTTP sessions as well.

Some have recommended disabling PGP and S/MIME completely. That’s probably overkill. As long as the attacker can’t use the “remote content” in emails, you are fine. Likewise, some have recommend disabling HTML completely. That’s not even an option in any email client I’ve used — you can disable sending HTML emails, but not receiving them. It’s sufficient to just disable grabbing remote content, not the rest of HTML email rendering.

I couldn’t replicate the direct exfiltration

There rare two related bugs. One allows direct exfiltration, which appends the decrypted PGP email onto the end of an IMG tag (like one of those tracking tags), allowing the entire message to be decrypted.

An example of this is the following email. This is a standard HTML email message consisting of multiple parts. The trick is that the IMG tag in the first part starts the URL (blog.robertgraham.com/…) but doesn’t end it. It has the starting quotes in front of the URL but no ending quotes. The ending will in the next chunk.

The next chunk isn’t HTML, though, it’s PGP. The PGP extension (in my case, Enignmail) will detect this and automatically decrypt it. In this case, it’s some previous email message I’ve received the attacker captured by eavesdropping, who then pastes the contents into this email message in order to get it decrypted.

What should happen at this point is that Thunderbird will generate a request (if “remote content” is enabled) to the blog.robertgraham.com server with the decrypted contents of the PGP email appended to it. But that’s not what happens. Instead, I get this:

I am indeed getting weird stuff in the URL (the bit after the GET /), but it’s not the PGP decrypted message. Instead what’s going on is that when Thunderbird puts together a “multipart/mixed” message, it adds it’s own HTML tags consisting of lines between each part. In the email client it looks like this:

The HTML code it adds looks like:

That’s what you see in the above URL, all this code up to the first quotes. Those quotes terminate the quotes in the URL from the first multipart section, causing the rest of the content to be ignored (as far as being sent as part of the URL).

So at least for the latest version of Thunderbird, you are accidentally safe, even if you have “remote content” enabled. Though, this is only according to my tests, there may be a work around to this that hackers could exploit.

STARTTLS

In the old days, email was sent plaintext over the wire so that it could be passively eavesdropped on. Nowadays, most providers send it via “STARTTLS”, which sorta encrypts it. Attackers can still intercept such email, but they have to do so actively, using man-in-the-middle. Such active techniques can be detected if you are careful and look for them.
Some organizations don’t care. Apparently, some nation states are just blocking all STARTTLS and forcing email to be sent unencrypted. Others do care. The NSA will passively sniff all the email they can in nations like Iraq, but they won’t actively intercept STARTTLS messages, for fear of getting caught.
The consequence is that it’s much less likely that somebody has been eavesdropping on you, passively grabbing all your PGP/SMIME emails. If you fear they have been, you should look (e.g. send emails from GMail and see if they are intercepted by sniffing the wire).

You’ll know if you are getting hacked

If somebody attacks you using eFail, you’ll know. You’ll get an email message formatted this way, with multipart/mixed components, some with corrupt HTML, some encrypted via PGP. This means that for the most part, your risk is that you’ll be attacked only once — the hacker will only be able to get one message through and decrypt it before you notice that something is amiss. Though to be fair, they can probably include all the emails they want decrypted as attachments to the single email they sent you, so the risk isn’t necessarily that you’ll only get one decrypted.
As mentioned above, a lot of attackers (e.g. the NSA) won’t attack you if its so easy to get caught. Other attackers, though, like anonymous hackers, don’t care.
Somebody ought to write a plugin to Thunderbird to detect this.

Summary

It only works if attackers have already captured your emails (though, that’s why you use PGP/SMIME in the first place, to guard against that).
It only works if you’ve enabled your email client to automatically grab external/remote content.
It seems to not be easily reproducible in all cases.
Instead of disabling PGP/SMIME, you should make sure your email client hast remote/external content disabled — that’s a huge privacy violation even without this bug.

Notes: The default email client on the Mac enables remote content by default, which is bad:

Police Launch Investigation into Huge Pirate Manga Site Mangamura

Post Syndicated from Andy original https://torrentfreak.com/police-launch-investigation-into-huge-pirate-manga-site-mangamura-180514/

Back in March, Japan’s Chief Cabinet Secretary Yoshihide Suga said that the government was considering measures to prohibit access to pirate sites.

While protecting all content is the overall aim, it became clear that the government was determined to protect Japan’s successful manga and anime industries.

It didn’t take long for a reaction. On Friday April 13, the government introduced emergency website blocking measures, seeking cooperation from the country’s ISPs.

NTT Communications Corp., NTT Docomo Inc. and NTT Plala Inc., quickly announced they would block three leading pirate sites – Mangamura, AniTube! and MioMio which have a huge following in Japan. However, after taking the country by storm during the past two years, Mangamura had already called it quits.

On April 17, in the wake of the government announcement, Mangamura disappeared. It’s unclear whether its vanishing act was directly connected to recent developments but a program on national public broadcasting organization NHK, which claimed to have traced the site’s administrators back to the United States, Ukraine, and other regions, can’t have helped.

Further details released this morning reveal the intense pressure Mangamura was under. With 100 million visits a month it was bound to attract attention and according to Mainichi, several publishing giants ran out of patience last year and reported the platform to the authorities.

Kodansha, Japan’s largest publisher, and three other companies filed criminal complaints with Fukuoka Prefectural Police, Oita Prefectural Police, and other law enforcement departments, claiming the site violated their rights.

“The complaints, which were lodged against an unknown suspect or suspects, were filed on behalf of manga artists who are copyright holders to the pirated works, including Hajime Isayama and Eiichiro Oda, known for their wildly popular ‘Shingeki no Kyojin’ (‘Attack on Titan,’ published by Kodansha) and ‘One Piece’ (Shueisha Inc.), respectively,” the publication reports.

Mangamura launch in January 2016 and became a huge hit in Japan. Anti-piracy group Content Overseas Distribution Association (CODA), which counts publishing giant Kodansha among its members, reports that between September 2017 and February 2018, the site was accessed 620 million times.

Based on a “one visit, one manga title read” formula, CODA estimates that the site caused damages to the manga industry of 319.2 billion yen – around US$2.91 billion.

As a result, police are now stepping up their efforts to identify Mangamura’s operators. Whether that will prove fruitful will remain to be seen but in the meantime, Japan’s site-blocking efforts continue to cause controversy.

As reported last month, lawyer and NTT customer Yuichi Nakazawa launched legal action against NTT, demanding that the corporation immediately end its site-blocking operations.

“NTT’s decision was made arbitrarily on the site without any legal basis. No matter how legitimate the objective of copyright infringement is, it is very dangerous,” Nakazawa told TorrentFreak.

“I felt that ‘freedom,’ which is an important value of the Internet, was threatened. Actually, when the interruption of communications had begun, the company thought it would be impossible to reverse the situation, so I filed a lawsuit at this stage.”

Japan’s Constitution and its Telecommunications Business Act both have “no censorship” clauses, meaning that site-blocking has the potential to be ruled illegal. It’s also illegal in Japan to invade the privacy of Internet users’ communications, which some observers have argued is necessary if users are to be prevented from accessing pirate sites.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

‘Anonymous’ Hackers Deface Russian Govt. Site to Protest Web-Blocking (NSFW)

Post Syndicated from Andy original https://torrentfreak.com/anonymous-hackers-deface-russian-govt-site-to-protest-web-blocking-nsfw-180512/

Last month, Russian authorities demonstrated that when an entity breaks local Internet rules, no stone will be left unturned to make them pay, whatever the cost.

The disaster waiting to happen began when encrypted messaging service Telegram refused to hand over its encryption keys to the state. In response, the Federal Security Service filed a lawsuit, which it won, compelling it Telegram do so. With no response, Roscomnadzor obtained a court order to have Telegram blocked.

In a massive response, Russian ISPs – at Roscomnadzor’s behest – began mass-blocking IP addresses on a massive scale. Millions of IP addresses belong to Amazon, Google and other innocent parties were rendered inaccessible in Russia, causing chaos online.

Even VPN providers were targeted for facilitating access to Telegram but while the service strained under the pressure, it never went down and continues to function today.

In the wake of the operation there has been some attempt at a cleanup job, with Roscomnadzor announcing this week that it had unblocked millions of IP addresses belonging to Google.

“As part of a package of the measures to enforce the court’s decision on Telegram, Roskomnadzor has removed six Google subnets (more than 3.7 million IP-addresses) from the blocklist,” the telecoms watchdog said in a statement.

“In this case, the IP addresses of Telegram, which are part of these subnets, are fully installed and blocked. Subnets are unblocked in order to ensure the correct operation of third-party Internet resources.”

But while Roscomnadzor attempts to calm the seas, those angered by Russia’s carpet-bombing of the Internet were determined to make their voices heard. Hackers attacked the website of the Federal Agency for International Cooperation this week, defacing it with scathing criticism combined with NSFW suggestions and imagery.

“Greetings, Roskomnadzor,” the message began.

“Your recent destructive actions towards the Russian internet sector have led us to believe that you are nothing but a bunch of incompetent mindless worms. You shall not be able to continue this pointless vandalism any further.”

Signing off with advice to consider the defacement as a “final warning”, the hackers disappeared into the night after leaving a simple signature.

“Yours, Anonymous,” they wrote.

But the hackers weren’t done yet. In a NSFW cartoon strip that probably explains itself, ‘Anonymous’ suggested that Roscomnadzor should perhaps consider blocking itself, with the implement depicted in the final frame.

“Anus, block yourself Roscomnadzor”

But while Russia’s attack on Telegram raises eyebrows worldwide, the actions of those in authority continue to baffle.

Last week, Prime Minister Dmitry Medvedev’s press secretary, Natalia Timakova, publicly advised a colleague to circumvent the Telegram blockade using a VPN, effectively undermining the massive efforts of the authorities. This week the head of Roscomnadzor only added to the confusion.

Effectively quashing rumors that he’d resigned due to the Telegram fiasco, Alexander Zharov had a conversation with the editor-in-chief of radio station ‘Says Moscow’.

During the liason, which took place during the Victory Parade in Red Square, Zharov was asked how he could be contacted. When Telegram was presented as a potential method, Zharov confirmed that he could be reached via the platform.

Finally, in a move that’s hoped could bring an end to the attack on the platform and others like it, Telegram filed an appeal this week challenging a decision by the Supreme Court of Russia which allows the Federal Security Service to demand access to encryption keys.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Introducing the AWS Machine Learning Competency for Consulting Partners

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/introducing-the-aws-machine-learning-competency-for-consulting-partners/

Today I’m excited to announce a new Machine Learning Competency for Consulting Partners in the Amazon Partner Network (APN). This AWS Competency program allows APN Consulting Partners to demonstrate a deep expertise in machine learning on AWS by providing solutions that enable machine learning and data science workflows for their customers. This new AWS Competency is in addition to the Machine Learning comptency for our APN Technology Partners, that we launched at the re:Invent 2017 partner summit.

These APN Consulting Partners help organizations solve their machine learning and data challenges through:

  • Providing data services that help data scientists and machine learning practitioners prepare their enterprise data for training.
  • Platform solutions that provide data scientists and machine learning practitioners with tools to take their data, train models, and make predictions on new data.
  • SaaS and API solutions to enable predictive capabilities within customer applications.

Why work with an AWS Machine Learning Competency Partner?

The AWS Competency Program helps customers find the most qualified partners with deep expertise. AWS Machine Learning Competency Partners undergo a strict validation of their capabilities to demonstrate technical proficiency and proven customer success with AWS machine learning tools.

If you’re an AWS customer interested in machine learning workloads on AWS, check out our AWS Machine Learning launch partners below:

 

Interested in becoming an AWS Machine Learning Competency Partner?

APN Partners with experience in Machine Learning can learn more about becoming an AWS Machine Learning Competency Partner here. To learn more about the benefits of joining the AWS Partner Network, see our APN Partner website.

Thanks to the AWS Partner Team for their help with this post!
Randall

Bell/TSN Letter to University Connects Site-Blocking Support to Students’ Futures

Post Syndicated from Andy original https://torrentfreak.com/bell-tsn-letter-to-university-connects-site-blocking-support-to-students-futures-180510/

In January, a coalition of Canadian companies called on local telecoms regulator CRTC to implement a website-blocking regime in Canada.

The coalition, Fairplay Canada, is a collection of organizations and companies with ties to the entertainment industries and includes Bell, Cineplex, Directors Guild of Canada, Maple Leaf Sports and Entertainment, Movie Theatre Association of Canada, and Rogers Media. Its stated aim is to address Canada’s online piracy problems.

While CTRC reviews FairPlay Canada’s plans, the coalition has been seeking to drum up support for the blocking regime, encouraging a diverse range of supporters to send submissions endorsing the project. Of course, building a united front among like-minded groups is nothing out of the ordinary but a situation just uncovered by Canadian law Professor Micheal Geist, one of the most vocal opponents of the proposed scheme, is bound to raise eyebrows.

Geist discovered a submission by Brian Hutchings, who works as Vice-President, Administration at Brock University in Ontario. Dated March 22, 2018, it notes that one of the university’s most sought-after programs is Sports Management, which helps Brock’s students to become “the lifeblood” of Canada’s sport and entertainment industries.

“Our University is deeply alarmed at how piracy is eroding an industry that employs so many of our co-op students and graduates. Piracy is a serious, pervasive threat that steals creativity, undermines investment in content development and threatens the survival of an industry that is also part of our national identity,” the submission reads.

“Brock ardently supports the FairPlay Canada coalition of more than 25 organizations involved in every aspect of Canada’s film, TV, radio, sports entertainment and music industries. Specifically, we support the coalition’s request that the CRTC introduce rules that would disable access in Canada to the most egregious piracy sites, similar to measures that have been taken in the UK, France and Australia. We are committed to assist the members of the coalition and the CRTC in eliminating the theft of digital content.”

The letter leaves no doubt that Brock University as a whole stands side-by-side with Fairplay Canada but according to a subsequent submission signed by Michelle Webber, President, Brock University Faculty Association (BUFA), nothing could be further from the truth.

Noting that BUFA unanimously supports the position of the Canadian Association of University Teachers which opposes the FairPlay proposal, Webber adds that BUFA stands in opposition to the submission by Brian Hutchings on behalf of Brock University.

“Vice President Hutching’s intervention was undertaken without consultation with the wider Brock University community, including faculty, librarians, and Senate; therefore, his submission should not be seen as indicative of the views of Brock University as a whole.”

BUFA goes on to stress the importance of an open Internet to researchers and educators while raising concerns that the blocking proposals could threaten the principles of net neutrality in Canada.

While the undermining of Hutching’s position is embarrassing enough, via access to information laws Geist has also been able to reveal the chain of events that prompted the Vice-President to write a letter of support on behalf of the whole university.

It began with an email sent by former Brock professor Cheri Bradish to Mark Milliere, TSN’s Senior Vice President and General Manager, with Hutchings copied in. The idea was to connect the pair, with the suggestion that supporting the site-blocking plan would help to mitigate the threat to “future work options” for students.

What followed was a direct email from Mark Milliere to Brian Hutchings, in which the former laid out the contributions his company makes to the university, while again suggesting that support for site-blocking would be in the long-term interests of students seeking employment in the industry.

On March 23, Milliere wrote to Hutchings again, thanking him for “a terrific letter” and stating that “If you need anything from TSN, just ask.”

This isn’t the first time that Bell has asked those beholden to the company to support its site-blocking plans.

Back in February it was revealed that the company had asked its own employees to participate in the site-blocking submission process, without necessarily revealing their affiliations with the company.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

AWS Online Tech Talks – May and Early June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-may-and-early-june-2018/

AWS Online Tech Talks – May and Early June 2018  

Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

May 21, 2018 | 11:00 AM – 11:45 AM PT Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.

May 23, 2018 | 11:00 AM – 11:45 AM PTData Warehousing and Data Lake Analytics, Together – Learn how to query data across your data warehouse and data lake without moving data.

May 24, 2018 | 11:00 AM – 11:45 AM PTData Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.

Compute

May 29, 2018 | 01:00 PM – 01:45 PM PT – Creating and Managing a WordPress Website with Amazon Lightsail – Learn about Amazon Lightsail and how you can create, run and manage your WordPress websites with Amazon’s simple compute platform.

May 30, 2018 | 01:00 PM – 01:45 PM PTAccelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.

Containers

May 24, 2018 | 01:00 PM – 01:45 PM PT – Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.

Databases

May 21, 2018 | 01:00 PM – 01:45 PM PTHow to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.

May 23, 2018 | 01:00 PM – 01:45 PM PT5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.

DevOps

May 23, 2018 | 09:00 AM – 09:45 AM PT.NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.

Enterprise & Hybrid

May 22, 2018 | 11:00 AM – 11:45 AM PTHybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.

IoT

May 31, 2018 | 11:00 AM – 11:45 AM PTUsing AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.

Machine Learning

May 22, 2018 | 09:00 AM – 09:45 AM PTUsing Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.

May 24, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.

Management Tools

May 21, 2018 | 09:00 AM – 09:45 AM PTGaining Better Observability of Your VMs with Amazon CloudWatch – Learn how CloudWatch Agent makes it easy for customers like Rackspace to monitor their VMs.

Mobile

May 29, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive on Amazon Pinpoint Segmentation and Endpoint Management – See how segmentation and endpoint management with Amazon Pinpoint can help you target the right audience.

Networking

May 31, 2018 | 09:00 AM – 09:45 AM PTMaking Private Connectivity the New Norm via AWS PrivateLink – See how PrivateLink enables service owners to offer private endpoints to customers outside their company.

Security, Identity, & Compliance

May 30, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.

June 1, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.

Serverless

May 22, 2018 | 01:00 PM – 01:45 PM PTBuilding API-Driven Microservices with Amazon API Gateway – Learn how to build a secure, scalable API for your application in our tech talk about API-driven microservices.

Storage

May 30, 2018 | 11:00 AM – 11:45 AM PTAccelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.

June 1, 2018 | 11:00 AM – 11:45 AM PTLearn to Build a Cloud-Scale Website Powered by Amazon EFS – Technical deep dive where you’ll learn tips and tricks for integrating WordPress, Drupal and Magento with Amazon EFS.

 

 

 

 

Firefox 60 released

Post Syndicated from ris original https://lwn.net/Articles/754040/rss

Mozilla has released Firefox 60. From the release
notes
: “Firefox 60 offers something for everyone and a little
something extra for everyone who deploys Firefox in an enterprise environment. This release includes changes that give you more content and more ways to customize your New Tab/Firefox Home. It also introduces support for the Web Authentication API, which means you can log in to websites in Firefox with USB tokens like YubiKey.
Firefox 60 also brings a new policy engine and Group Policy support for
enterprise deployments. For more info about why and how to use Firefox in
the enterprise, see this blog post.

Hello World Issue 5: Engineering

Post Syndicated from Russell Barnes original https://www.raspberrypi.org/blog/hello-world-issue-5/

Join us as we celebrate the Year of Engineering in the newest issue of Hello World, our magazine for computing and digital making educators.

 

Inspiring future engineers

We’ve brought together a wide range of experts to share their ideas and advice on how to bring engineering to your classroom — read issue 5 to find out the best ways to inspire the next generation.



Plus we’ve got plenty on GP and Scratch, we answer your latest questions, and we bring you our usual collection of useful features, guides, and lesson plans.

Highlights of issue 5 include:

  • The bluffers’ guide to putting together a tech-themed school trip
  • Inclusion, and coding for the visually impaired
  • Getting students interested in databases
  • Why copying may not always be a bad thing

How to get Hello World #5

Hello World is available as a free download under a Creative Commons license for everyone in world who is interested in computer science and digital making education. Get the latest issue as a PDF file straight from the Hello World website.

We’re currently offering free print copies of the magazine to serving educators in the UK. This offer is open to teachers, Code Club and CoderDojo volunteers, teaching assistants, teacher trainers, and others who help children and young people learn about computing and digital making. Subscribe to have your free print magazine posted directly to your home, or subscribe digitally — 20000 educators have already signed up to receive theirs!

Get in touch!

You could write for us about your experiences as an educator, and share your advice with the community. Wherever you are in the world, get in touch by emailing our editorial team about your article idea — we would love to hear from you!

Hello World magazine is a collaboration between the Raspberry Pi Foundation and Computing At School, which is part of the British Computing Society.

The post Hello World Issue 5: Engineering appeared first on Raspberry Pi.

Bad Software Is Our Fault

Post Syndicated from Bozho original https://techblog.bozho.net/bad-software-is-our-fault/

Bad software is everywhere. One can even claim that every software is bad. Cool companies, tech giants, established companies, all produce bad software. And no, yours is not an exception.

Who’s to blame for bad software? It’s all complicated and many factors are intertwined – there’s business requirements, there’s organizational context, there’s lack of sufficient skilled developers, there’s the inherent complexity of software development, there’s leaky abstractions, reliance on 3rd party software, consequences of wrong business and purchase decisions, time limitations, flawed business analysis, etc. So yes, despite the catchy title, I’m aware it’s actually complicated.

But in every “it’s complicated” scenario, there’s always one or two factors that are decisive. All of them contribute somehow, but the major drivers are usually a handful of things. And in the case of base software, I think it’s the fault of technical people. Developers, architects, ops.

We don’t seem to care about best practices. And I’ll do some nasty generalizations here, but bear with me. We can spend hours arguing about tabs vs spaces, curly bracket on new line, git merge vs rebase, which IDE is better, which framework is better and other largely irrelevant stuff. But we tend to ignore the important aspects that span beyond the code itself. The context in which the code lives, the non-functional requirements – robustness, security, resilience, etc.

We don’t seem to get security. Even trivial stuff such as user authentication is almost always implemented wrong. These days Twitter and GitHub realized they have been logging plain-text passwords, for example, but that’s just the tip of the iceberg. Too often we ignore the security implications.

“But the business didn’t request the security features”, one may say. The business never requested 2-factor authentication, encryption at rest, PKI, secure (or any) audit trail, log masking, crypto shredding, etc., etc. Because the business doesn’t know these things – we do and we have to put them on the backlog and fight for them to be implemented. Each organization has its specifics and tech people can influence the backlog in different ways, but almost everywhere we can put things there and prioritize them.

The other aspect is testing. We should all be well aware by now that automated testing is mandatory. We have all the tools in the world for unit, functional, integration, performance and whatnot testing, and yet many software projects lack the necessary test coverage to be able to change stuff without accidentally breaking things. “But testing takes time, we don’t have it”. We are perfectly aware that testing saves time, as we’ve all had those “not again!” recurring bugs. And yet we think of all sorts of excuses – “let the QAs test it”, we have to ship that now, we’ll test it later”, “this is too trivial to be tested”, etc.

And you may say it’s not our job. We don’t define what has do be done, we just do it. We don’t define the budget, the scope, the features. We just write whatever has been decided. And that’s plain wrong. It’s not our job to make money out of our code, and it’s not our job to define what customers need, but apart from that everything is our job. The way the software is structured, the security aspects and security features, the stability of the code base, the way the software behaves in different environments. The non-functional requirements are our job, and putting them on the backlog is our job.

You’ve probably heard that every software becomes “legacy” after 6 months. And that’s because of us, our sloppiness, our inability to mitigate external factors and constraints. Too often we create a mess through “just doing our job”.

And of course that’s a generalization. I happen to know a lot of great professionals who don’t make these mistakes, who strive for excellence and implement things the right way. But our industry as a whole doesn’t. Our industry as a whole produces bad software. And it’s our fault, as developers – as the only people who know why a certain piece of software is bad.

In a talk of his, Bob Martin warns us of the risks of our sloppiness. We have been building websites so far, but we are more and more building stuff that interacts with the real world, directly and indirectly. Ultimately, lives may depend on our software (like the recent unfortunate death caused by a self-driving car). And I’ll agree with Uncle Bob that it’s high time we self-regulate as an industry, before some technically incompetent politician decides to do that.

How, I don’t know. We’ll have to think more about it. But I’m pretty sure it’s our fault that software is bad, and no amount of blaming the management, the budget, the timing, the tools or the process can eliminate our responsibility.

Why do I insist on bashing my fellow software engineers? Because if we start looking at software development with more responsibility; with the fact that if it fails, it’s our fault, then we’re more likely to get out of our current bug-ridden, security-flawed, fragile software hole and really become the experts of the future.

The post Bad Software Is Our Fault appeared first on Bozho's tech blog.

Raspberry Pi in your favourite films and TV shows

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/raspberry-pi-films-tv/

If, like us, you’ve been bingeflixing your way through Netflix’s new show, Lost in Space, you may have noticed a Raspberry Pi being used as futuristic space tech.

Raspberry Pi Netflix Lost in Space

Danger, Will Robinson, that probably won’t work

This isn’t the first time a Pi has been used as a film or television prop. From Mr. Robot and Disney Pixar’s Big Hero 6 to Mr. Robot, Sense8, and Mr. Robot, our humble little computer has become quite the celeb.

Raspberry Pi Charlie Brooker Election Wipe
Raspberry Pi Big Hero 6
Raspberry Pi Netflix

Raspberry Pi Spy has been working hard to locate and document the appearance of the Raspberry Pi in some of our favourite shows and movies. He’s created this video covering 2010-2017:

Raspberry Pi TV and Film Appearances 2012-2017

Since 2012 the Raspberry Pi single board computer has appeared in a number of movies and TV shows. This video is a run through of those appearances where the Pi has been used as a prop.

For 2018 appearances and beyond, you can find a full list on the Raspberry Pi Spy website. If you’ve spotted an appearance that’s not on the list, tell us in the comments!

The post Raspberry Pi in your favourite films and TV shows appeared first on Raspberry Pi.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

CI/CD with Data: Enabling Data Portability in a Software Delivery Pipeline with AWS Developer Tools, Kubernetes, and Portworx

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/

This post is written by Eric Han – Vice President of Product Management Portworx and Asif Khan – Solutions Architect

Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.

For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses where operations are tracked) as easy as with stateless (such as with web front ends where they are often not).

Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.

In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.

 

Stateful Pipelines: Need for Portable Volumes

As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.

Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.

Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.

Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.

As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here

 

 

 

AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx

In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.

The continuous deployment pipeline runs as follows:

  • Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
  • Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
  • AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
  • AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
  • Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
  • AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB• Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB
    • The MongoDB instance mounts the snapshot.

At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.

 

Summary

This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.

This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.

The reference architecture and code are available on GitHub:

● Reference architecture: https://github.com/portworx/aws-kube-codesuite
● Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py

For more information about persistent storage for containers, visit the Portworx website. For more information about Code Pipeline, see the AWS CodePipeline User Guide.

Hard Drive Stats for Q1 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

Backblaze Drive Stats Q1 2018

As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Background

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.

Hard Drive Reliability Statistics for Q1 2018

At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.

Q1 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.

The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

Welcome Toshiba 8TB drives, almost…

We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.

In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.

Q1 2018 Hard Drive Failure Rate — Toshiba 8TB

So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.

Lifetime Hard Drive Failure Rates

Notes and Observations

The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.

The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.

Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.

We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.

New SMART Attributes

If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:

  • 177 – Wear Range Delta
  • 179 – Used Reserved Block Count Total
  • 181- Program Fail Count Total or Non-4K Aligned Access Count
  • 182 – Erase Fail Count
  • 235 – Good Block Count AND System(Free) Block Count

The 5 values are all related to SSD drives.

Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.

Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]

The post Hard Drive Stats for Q1 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

IoT Inspector Tool from Princeton

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/iot_inspector_t.html

Researchers at Princeton University have released IoT Inspector, a tool that analyzes the security and privacy of IoT devices by examining the data they send across the Internet. They’ve already used the tool to study a bunch of different IoT devices. From their blog post:

Finding #3: Many IoT Devices Contact a Large and Diverse Set of Third Parties

In many cases, consumers expect that their devices contact manufacturers’ servers, but communication with other third-party destinations may not be a behavior that consumers expect.

We have found that many IoT devices communicate with third-party services, of which consumers are typically unaware. We have found many instances of third-party communications in our analyses of IoT device network traffic. Some examples include:

  • Samsung Smart TV. During the first minute after power-on, the TV talks to Google Play, Double Click, Netflix, FandangoNOW, Spotify, CBS, MSNBC, NFL, Deezer, and Facebook­even though we did not sign in or create accounts with any of them.
  • Amcrest WiFi Security Camera. The camera actively communicates with cellphonepush.quickddns.com using HTTPS. QuickDDNS is a Dynamic DNS service provider operated by Dahua. Dahua is also a security camera manufacturer, although Amcrest’s website makes no references to Dahua. Amcrest customer service informed us that Dahua was the original equipment manufacturer.

  • Halo Smoke Detector. The smart smoke detector communicates with broker.xively.com. Xively offers an MQTT service, which allows manufacturers to communicate with their devices.

  • Geeni Light Bulb. The Geeni smart bulb communicates with gw.tuyaus.com, which is operated by TuYa, a China-based company that also offers an MQTT service.

We also looked at a number of other devices, such as Samsung Smart Camera and TP-Link Smart Plug, and found communications with third parties ranging from NTP pools (time servers) to video storage services.

Their first two findings are that “Many IoT devices lack basic encryption and authentication” and that “User behavior can be inferred from encrypted IoT device traffic.” No surprises there.

Boingboing post.

Related: IoT Hall of Shame.

Danish Traffic to Pirate Sites Increases 67% in Just a Year

Post Syndicated from Andy original https://torrentfreak.com/danish-traffic-to-pirate-sites-increases-67-in-just-a-year-180501/

For close to 20 years, rightsholders have tried to stem the tide of mainstream Internet piracy. Yet despite increasingly powerful enforcement tools, infringement continues on a grand scale.

While the problem is global, rightsholder groups often zoom in on their home turf, to see how the fight is progressing locally. Covering Denmark, the Rights Alliance Data Report 2017 paints a fairly pessimistic picture.

Published this week, the industry study – which uses SimilarWeb and MarkMonitor data – finds that Danes visited 2,000 leading pirate sites 596 million times in 2017. That represents a 67% increase over the 356 million visits to unlicensed platforms made by citizens during 2016.

The report notes that, at least in part, this explosive growth can be attributed to mobile-compatible sites and services, which make it easier than ever to consume illicit content on the move, as well as at home.

In a sea of unauthorized streaming sites, Rights Alliance highlights one platform above all the others as a particularly bad influence in 2017 – 123movies (also known as GoMovies and GoStream, among others).

“The popularity of this service rose sharply in 2017 from 40 million visits in 2016 to 175 million visits in 2017 – an increase of 337 percent, of which most of the traffic originates from mobile devices,” the report notes.

123movies recently announced its closure but before that the platform was subjected to web-blocking in several jurisdictions.

Rights Alliance says that Denmark has one of the most effective blocking systems in the world but that still doesn’t stop huge numbers of people from consuming pirate content from sites that aren’t yet blocked.

“Traffic to infringing sites is overwhelming, and therefore blocking a few sites merely takes the top of the illegal activities,” Rights Alliance chief Maria Fredenslund informs TorrentFreak.

“Blocking is effective by stopping 75% of traffic to blocked sites but certainly, an upscaled effort is necessary.”

Rights Alliance also views the promotion of legal services as crucial to its anti-piracy strategy so when people visit a blocked site, they’re also directed towards legitimate platforms.

“That is why we are working at the moment with Denmark’s Ministry of Culture and ISPs on a campaign ‘Share With Care 2′ which promotes legal services e.g. by offering a search function for legal services which will be placed in combination with the signs that are put on blocked websites,” the anti-piracy group notes.

But even with such measures in place, the thirst for unlicensed content is great. In 2017 alone, 500 of the most popular films and TV shows were downloaded from P2P networks like BitTorrent more than 15 million times from Danish IP addresses, that’s up from 11.9 million in 2016.

Given the dramatic rise in visits to pirate sites overall, the suggestion is that plenty of consumers are still getting through. Rights Alliance says that the number of people being restricted is also hampered by people who don’t use their ISP’s DNS service, which is the method used to block sites in Denmark.

Additionally, interest in VPNs and similar anonymization and bypass-capable technologies is on the increase. Between 3.5% and 5% of Danish Internet users currently use a VPN, a number that’s expected to go up. Furthermore, Rights Alliance reports greater interest in “closed” pirate communities.

“The data is based on closed [BitTorrent] networks. We also address the challenges with private communities on Facebook and other [social media] platforms,” Fredenslund explains.

“Due to the closed doors of these platforms it is not possible for us to say anything precisely about the amount of infringing activities there. However, we receive an increasing number of notices from our members who discover that their products are distributed illegally and also we do an increased monitoring of these platforms.”

But while more established technologies such as torrents and regular web-streaming continue in considerable volumes, newer IPTV-style services accessible via apps and dedicated platforms are also gaining traction.

“The volume of visitors to these services’ websites has been sharply rising in 2017 – an increase of 84 percent from January to December,” Rights Alliance notes.

“Even though the number of visitors does not say anything about actual consumption, as users usually only visit pages one time to download the program, the number gives an indication that the interest in IPTV is increasing.”

To combat this growth market, Rights Alliance says it wants to establish web-blockades against sites hosting the software applications.

Also on the up are visits to platforms offering live sports illegally. In 2017, Danish IP addresses made 2.96 million visits to these services, corresponding to almost 250,000 visits per month and representing an annual increase of 28%.

Rights Alliance informs TF that in future a ‘live’ blocking mechanism similar to the one used by the Premier League in the UK could be deployed in Denmark.

“We already have a dynamic blocking system, and we see an increasing demand for illegal TV products, so this could be a natural next step,” Fredenslund explains.

Another small but perhaps significant detail is how users are accessing pirate sites. According to the report, large volumes of people are now visiting platforms directly, with more than 50% doing so in preference to referrals from search engines such as Google.

In terms of deterrence, the Rights Alliance report sticks to the tried-and-tested approaches seen so often in the anti-piracy arena.

Firstly, the group notes that it’s increasingly encountering people who are paying for legal services such as Netflix and Spotify so believe that allows them to grab something extra from a pirate site. However, in common with similar organizations globally, the group counters that pirate sites can serve malware or have other nefarious business interests behind the scenes, so people should stay away.

Whether significant volumes will heed this advice will remain to be seen but if a 67% increase last year is any predictor of the future, piracy is here to stay – and then some. Rights Alliance says it is ready for the challenge but will need some assistance to achieve its goals.

“As it is evident from the traffic data, criminal activities are not something that we, private companies (right holders in cooperation with ISPs), can handle alone,” Fredenslund says.

“Therefore, we are very pleased that DK Government recently announced that the IP taskforce which was set down as a trial period has now been made permanent. In that regard it is important and necessary that the police will also obtain the authority to handle blocking of massively infringing websites. Police do not have the authority to carry out blocking as it is today.”

The full report is available here (Danish, pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.