Tag Archives: updated

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!

Randall

timeShift(GrafanaBuzz, 1w) Issue 46

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/05/24/timeshiftgrafanabuzz-1w-issue-46/

Welcome to TimeShift The day has finally arrived; GDPR is officially in effect! These new policies are meant to provide more transparency about the data companies collect on users, and how that data is used. I for one am just excited that the onslaught of "We’ve updated our privacy policy" emails arriving in my pummeled inbox is nearing its end.
Grafana Labs is no exception. We encourage you to check out our privacy policy, and if you have any questions, feel free to contact us at [email protected]

The Practical Effects of GDPR at Backblaze

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/the-practical-effects-of-gdpr-at-backblaze/


GDPR day, May 25, 2018, is nearly here. On that day, will your inbox explode with update notices, opt-in agreements, and offers from lawyers searching for GDPR violators? Perhaps all the companies on earth that are not GDPR ready will just dissolve into dust. More likely, there will be some changes, but business as usual will continue and we’ll all be more aware of data privacy. Let’s go with the last one.

What’s Different With GDPR at Backblaze

The biggest difference you’ll notice is a completely updated Privacy Policy. Last week we sent out a service email announcing the new Privacy Policy. Some people asked what was different. Basically everything. About 95% of the agreement was rewritten. In the agreement, we added in the appropriate provisions required by GDPR, and hopefully did a better job specifying the data we collect from you, why we collect it, and what we are going to do with it.

As a reminder, at Backblaze your data falls into two catagories. The first type of data is the data you store with us — stored data. These are the files and objects you upload and store, and as needed, restore. We do not share this data. We do not process this data, except as requested by you to store and restore the data. We do not analyze this data looking for keywords, tags, images, etc. No one outside of Backblaze has access to this data unless you explicitly shared the data by providing that person access to one or more files.

The second type of data is your account data. Some of your account data is considered personal data. This is the information we collect from you to provide our Personal Backup, Business Backup and B2 Cloud Storage services. Examples include your email address to provide access to your account, or the name of your computer so we can organize your files like they are arranged on your computer to make restoration easier. We have written a number of Help Articles covering the different ways this information is collected and processed. In addition, these help articles outline the various “rights” granted via GDPR. We will continue to add help articles over the coming weeks to assist in making it easy to work with us to understand and exercise your rights.

What’s New With GDPR at Backblaze

The most obvious addition is the Data Processing Addendum (DPA). This covers how we protect the data you store with us, i.e. stored data. As noted above, we don’t do anything with your data, except store it and keep it safe until you need it. Now we have a separate document saying that.

It is important to note the new Data Processing Addendum is now incorporated by reference into our Terms of Service, which everyone agrees to when they sign up for any of our services. Now all of our customers have a shiny new Data Processing Agreement to go along with the updated Privacy Policy. We promise they are not long or complicated, and we encourage you to read them. If you have any questions, stop by our GDPR help section on our website.

Patience, Please

Every company we have dealt with over the last few months is working hard to comply with GDPR. It has been a tough road whether you tried to do it yourself or like Backblaze, hired an EU-based law firm for advice. Over the coming weeks and months as you reach out to discover and assert your rights, please have a little patience. We are all going through a steep learning curve as GDPR gets put into practice. Along the way there are certain to be some growing pains — give us a chance, we all want to get it right.

Regardless, at Backblaze we’ve been diligently protecting our customers’ data for over 11 years and nothing that will happen on May 25th will change that.

The post The Practical Effects of GDPR at Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

RFC: LWN’s draft updated privacy policy

Post Syndicated from corbet original https://lwn.net/Articles/755089/rss

It is the season for web sites to be updating their privacy policies and
obtaining consent from their users for whatever data they collect. LWN,
being short of staff with the time or interest to work in this area, is
rather late to this game. The first step is an updated
privacy policy, which we’re now putting out for review. Little has changed
from the current version; we still don’t
collect much data, share data with others, or attempt to
monetize what we have in any way. We would like to ask interested readers
to have a look and let us know about any potential problems they see.

ExtraTorrent Replacement Displays Warning On Predecessor’s Shutdown Anniversary

Post Syndicated from Andy original https://torrentfreak.com/extratorrent-replacement-displays-warning-on-predecessors-shutdown-anniversary-180518/

Exactly one year ago, millions of users in the BitTorrent community went into mourning with the shock depature of one of its major players.

ExtraTorrent was founded in back in November 2006, at a time when classic platforms such as TorrentSpy and Mininova were dominating the torrent site landscape. But with dedication and determination, the site amassed millions of daily visitors, outperforming every other torrent site apart from the mighty Pirate Bay.

Then, on May 17, 2017, everything came crashing down.

“ExtraTorrent has shut down permanently,” a note in the site read. “ExtraTorrent with all mirrors goes offline. We permanently erase all data. Stay away from fake ExtraTorrent websites and clones. Thx to all ET supporters and torrent community. ET was a place to be….”

While ExtraTorrent staff couldn’t be more clear in advising people to stay away from clones, few people listened to their warnings. Within hours, new sites appeared claiming to be official replacements for the much-loved torrent site and people flocked to them in their millions.

One of those was ExtraTorrent.ag, a torrent site connected to the operators of EZTV.ag, which appeared as a replacement in the wake of the official EZTV’s demise. Graphically very similar to the original ExtraTorrent, the .ag ‘replacement’ had none of its namesake’s community or unique content. But that didn’t dent its popularity.

ExtraTorrent.ag

At the start of this week, ExtraTorrent.ag was one of the most popular torrent sites on the Internet. With an Alexa rank of around 2,200, it would’ve clinched ninth position in our Top 10 Torrent Sites report earlier this year. However, after registering the site’s domain a year ago, something seems to have gone wrong.

Yesterday, on the anniversary of ExtraTorrent’s shutdown and exactly a year after the ExtraTorrent.ag domain was registered, ExtraTorrent.ag disappeared only to be replaced by a generic landing page, as shown below.

ExtraTorrent.ag landing page

This morning, however, there appear to be additional complications. Accessing with Firefox produces the page above but attempting to do so with Chrome produces an ominous security warning.

Chrome warning

Indeed, those protected by MalwareBytes won’t be able to access the page at all, since ExtraTorrent.ag redirects to the domain FindBetterResults.com, which the anti-malware app flags as malicious.

The change was reported to TF by the operator of domain unblocking site Unblocked.lol, which offers torrent site proxies as well as access to live TV and sports.

“I noticed when I started receiving emails saying ExtraTorrent was redirecting to some parked domain. When I jumped on the PC and checked myself it was just redirecting to a blank page,” he informs us.

“First I thought they’d blocked our IP address so I used some different ones. But I soon discovered the domain was in fact parked.”

So what has happened to this previously-functioning domain?

Whois records show that ExtraTorrent.ag was created on May 17, 2017 and appears to have been registered for a year. Yesterday, on May 17, 2018, the domain was updated to list what could potentially be a new owner, with an expiry date of May 17, 2019.

Once domains have expired, they usually enter an ‘Auto-Renew Grace Period’ for up to 45 days. This is followed by a 30-day ‘Redemption Grace Period’. At the end of this second period, domains cannot be renewed and are released for third-parties to register. That doesn’t appear to have been the case here.

So, to find out more about the sudden changes we reached out to the email address listed in the WHOIS report but received no response. Should we hear more we’ll update this report but in the meantime the Internet has lost one of its largest torrent sites and gained a rather pointless landing page with potential security risks.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

[$] A Gilectomy update

Post Syndicated from jake original https://lwn.net/Articles/754577/rss

In a rather short session at the 2018 Python Language Summit, Larry
Hastings updated attendees on the status of his Gilectomy project. The aim of that effort is
to remove the global
interpreter lock (GIL) from CPython. Since his status report at last year’s summit, little
has happened, which is part of why the session was so short. He hasn’t
given up on the overall idea, but it needs a new approach.

From Framework to Function: Deploying AWS Lambda Functions for Java 8 using Apache Maven Archetype

Post Syndicated from Ryosuke Iwanaga original https://aws.amazon.com/blogs/compute/from-framework-to-function-deploying-aws-lambda-functions-for-java-8-using-apache-maven-archetype/

As a serverless computing platform that supports Java 8 runtime, AWS Lambda makes it easy to run any type of Java function simply by uploading a JAR file. To help define not only a Lambda serverless application but also Amazon API Gateway, Amazon DynamoDB, and other related services, the AWS Serverless Application Model (SAM) allows developers to use a simple AWS CloudFormation template.

AWS provides the AWS Toolkit for Eclipse that supports both Lambda and SAM. AWS also gives customers an easy way to create Lambda functions and SAM applications in Java using the AWS Command Line Interface (AWS CLI). After you build a JAR file, all you have to do is type the following commands:

aws cloudformation package 
aws cloudformation deploy

To consolidate these steps, customers can use Archetype by Apache Maven. Archetype uses a predefined package template that makes getting started to develop a function exceptionally simple.

In this post, I introduce a Maven archetype that allows you to create a skeleton of AWS SAM for a Java function. Using this archetype, you can generate a sample Java code example and an accompanying SAM template to deploy it on AWS Lambda by a single Maven action.

Prerequisites

Make sure that the following software is installed on your workstation:

  • Java
  • Maven
  • AWS CLI
  • (Optional) AWS SAM CLI

Install Archetype

After you’ve set up those packages, install Archetype with the following commands:

git clone https://github.com/awslabs/aws-serverless-java-archetype
cd aws-serverless-java-archetype
mvn install

These are one-time operations, so you don’t run them for every new package. If you’d like, you can add Archetype to your company’s Maven repository so that other developers can use it later.

With those packages installed, you’re ready to develop your new Lambda Function.

Start a project

Now that you have the archetype, customize it and run the code:

cd /path/to/project_home
mvn archetype:generate \
  -DarchetypeGroupId=com.amazonaws.serverless.archetypes \
  -DarchetypeArtifactId=aws-serverless-java-archetype \
  -DarchetypeVersion=1.0.0 \
  -DarchetypeRepository=local \ # Forcing to use local maven repository
  -DinteractiveMode=false \ # For batch mode
  # You can also specify properties below interactively if you omit the line for batch mode
  -DgroupId=YOUR_GROUP_ID \
  -DartifactId=YOUR_ARTIFACT_ID \
  -Dversion=YOUR_VERSION \
  -DclassName=YOUR_CLASSNAME

You should have a directory called YOUR_ARTIFACT_ID that contains the files and folders shown below:

├── event.json
├── pom.xml
├── src
│   └── main
│       ├── java
│       │   └── Package
│       │       └── Example.java
│       └── resources
│           └── log4j2.xml
└── template.yaml

The sample code is a working example. If you install SAM CLI, you can invoke it just by the command below:

cd YOUR_ARTIFACT_ID
mvn -P invoke verify
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ foo ---
[INFO] Building jar: /private/tmp/foo/target/foo-1.0.jar
[INFO]
[INFO] --- maven-shade-plugin:3.1.0:shade (shade) @ foo ---
[INFO] Including com.amazonaws:aws-lambda-java-core:jar:1.2.0 in the shaded jar.
[INFO] Replacing /private/tmp/foo/target/lambda.jar with /private/tmp/foo/target/foo-1.0-shaded.jar
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-local-invoke) @ foo ---
2018/04/06 16:34:35 Successfully parsed template.yaml
2018/04/06 16:34:35 Connected to Docker 1.37
2018/04/06 16:34:35 Fetching lambci/lambda:java8 image for java8 runtime...
java8: Pulling from lambci/lambda
Digest: sha256:14df0a5914d000e15753d739612a506ddb8fa89eaa28dcceff5497d9df2cf7aa
Status: Image is up to date for lambci/lambda:java8
2018/04/06 16:34:37 Invoking Package.Example::handleRequest (java8)
2018/04/06 16:34:37 Decompressing /tmp/foo/target/lambda.jar
2018/04/06 16:34:37 Mounting /private/var/folders/x5/ldp7c38545v9x5dg_zmkr5kxmpdprx/T/aws-sam-local-1523000077594231063 as /var/task:ro inside runtime container
START RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74 Version: $LATEST
Log output: Greeting is 'Hello Tim Wagner.'
END RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74
REPORT RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74	Duration: 96.60 ms	Billed Duration: 100 ms	Memory Size: 128 MB	Max Memory Used: 7 MB

{"greetings":"Hello Tim Wagner."}


[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.452 s
[INFO] Finished at: 2018-04-06T16:34:40+09:00
[INFO] ------------------------------------------------------------------------

This maven goal invokes sam local invoke -e event.json, so you can see the sample output to greet Tim Wagner.

To deploy this application to AWS, you need an Amazon S3 bucket to upload your package. You can use the following command to create a bucket if you want:

aws s3 mb s3://YOUR_BUCKET --region YOUR_REGION

Now, you can deploy your application by just one command!

mvn deploy \
    -DawsRegion=YOUR_REGION \
    -Ds3Bucket=YOUR_BUCKET \
    -DstackName=YOUR_STACK
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-package) @ foo ---
Uploading to aws-serverless-java/com.riywo:foo:1.0/924732f1f8e4705c87e26ef77b080b47  11657 / 11657.0  (100.00%)
Successfully packaged artifacts and wrote output template to file target/sam.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file /private/tmp/foo/target/sam.yaml --stack-name <YOUR STACK NAME>
[INFO]
[INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ foo ---
[INFO] Skipping artifact deployment
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-deploy) @ foo ---

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - archetype
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.176 s
[INFO] Finished at: 2018-04-06T16:41:02+09:00
[INFO] ------------------------------------------------------------------------

Maven automatically creates a shaded JAR file, uploads it to your S3 bucket, replaces template.yaml, and creates and updates the CloudFormation stack.

To customize the process, modify the pom.xml file. For example, to avoid typing values for awsRegion, s3Bucket or stackName, write them inside pom.xml and check in your VCS. Afterward, you and the rest of your team can deploy the function by typing just the following command:

mvn deploy

Options

Lambda Java 8 runtime has some types of handlers: POJO, Simple type and Stream. The default option of this archetype is POJO style, which requires to create request and response classes, but they are baked by the archetype by default. If you want to use other type of handlers, you can use handlerType property like below:

## POJO type (default)
mvn archetype:generate \
 ...
 -DhandlerType=pojo

## Simple type - String
mvn archetype:generate \
 ...
 -DhandlerType=simple

### Stream type
mvn archetype:generate \
 ...
 -DhandlerType=stream

See documentation for more details about handlers.

Also, Lambda Java 8 runtime supports two types of Logging class: Log4j 2 and LambdaLogger. This archetype creates LambdaLogger implementation by default, but you can use Log4j 2 if you want:

## LambdaLogger (default)
mvn archetype:generate \
 ...
 -Dlogger=lambda

## Log4j 2
mvn archetype:generate \
 ...
 -Dlogger=log4j2

If you use LambdaLogger, you can delete ./src/main/resources/log4j2.xml. See documentation for more details.

Conclusion

So, what’s next? Develop your Lambda function locally and type the following command: mvn deploy !

With this Archetype code example, available on GitHub repo, you should be able to deploy Lambda functions for Java 8 in a snap. If you have any questions or comments, please submit them below or leave them on GitHub.

The plan for merging CoreOS into Red Hat

Post Syndicated from corbet original https://lwn.net/Articles/754058/rss

The CoreOS blog is carrying an
article
describing the path forward now that CoreOS is owned by Red
Hat. “Since Red Hat’s acquisition of CoreOS was announced, we
received questions on the fate of Container Linux. CoreOS’s first project,
and initially its namesake, pioneered the lightweight, ‘over-the-air’
automatically updated container native operating system that fast rose in
popularity running the world’s containers. With the acquisition, Container
Linux will be reborn as Red Hat CoreOS, a new entry into the Red Hat
ecosystem. Red Hat CoreOS will be based on Fedora and Red Hat Enterprise
Linux sources and is expected to ultimately supersede Atomic Host as Red
Hat’s immutable, container-centric operating system.
” Some
information can also be found in this
Red Hat press release
.

What’s new in HiveMQ 3.4

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/whats-new-in-hivemq-3-4

We are pleased to announce the release of HiveMQ 3.4. This version of HiveMQ is the most resilient and advanced version of HiveMQ ever. The main focus in this release was directed towards addressing the needs for the most ambitious MQTT deployments in the world for maximum performance and resilience for millions of concurrent MQTT clients. Of course, deployments of all sizes can profit from the improvements in the latest and greatest HiveMQ.

This version is a drop-in replacement for HiveMQ 3.3 and of course supports rolling upgrades with zero-downtime.

HiveMQ 3.4 brings many features that your users, administrators and plugin developers are going to love. These are the highlights:

 

New HiveMQ 3.4 features at a glance

Cluster

HiveMQ 3.4 brings various improvements in terms of scalability, availability, resilience and observability for the cluster mechanism. Many of the new features remain under the hood, but several additions stand out:

Cluster Overload Protection

The new version has a first-of-its-kind Cluster Overload Protection. The whole cluster is able to spot MQTT clients that cause overload on nodes or the cluster as a whole and protects itself from the overload. This mechanism also protects the deployment from cascading failures due to slow or failing underlying hardware (as sometimes seen on cloud providers). This feature is enabled by default and you can learn more about the mechanism in our documentation.

Dynamic Replicates

HiveMQ’s sophisticated cluster mechanism is able to scale in a linear fashion due to extremely efficient and true data distribution mechanics based on a configured replication factor. The most important aspect of every cluster is availability, which is achieved by having eventual consistency functions in place for edge cases. The 3.4 version adds dynamic replicates to the cluster so even the most challenging edge cases involving network splits don’t lead to the sacrifice of consistency for the most important MQTT operations.

Node Stress Level Metrics

All MQTT cluster nodes are now aware of their own stress level and the stress levels of other cluster members. While all stress mitigation is handled internally by HiveMQ, experienced operators may want to monitor the individual node’s stress level (e.g with Grafana) in order to start investigating what caused the increase of load.

WebUI

Operators worldwide love the HiveMQ WebUI introduced with HiveMQ 3.3. We gathered all the fantastic feedback from our users and polished the WebUI, so it’s even more useful for day-to-day broker operations and remote debugging of MQTT clients. The most important changes and additions are:

Trace Recording Download

The unique Trace Recordings functionality is without doubt a lifesaver when the behavior of individual MQTT clients needs further investigation as all interactions with the broker can be traced — at runtime and at scale! Huge production deployments may accumulate multiple gigabytes of trace recordings. HiveMQ now offers a convenient way to collect all trace recordings from all nodes, zips them and allows the download via a simple button on the WebUI. Remote debugging was never easier!

Additional Client Detail Information in WebUI

The mission of the HiveMQ WebUI is to provide easy insights to the whole production MQTT cluster for operators and administrators. Individual MQTT client investigations are a piece of cake, as all available information about clients can be viewed in detail. We further added the ability to view the restrictions a concrete client has:

  • Maximum Inflight Queue Size
  • Client Offline Queue Messages Size
  • Client Offline Message Drop Strategy

Session Invalidation

MQTT persistent sessions are one of the outstanding features of the MQTT protocol specification. Sessions which do not expire but are never reused unnecessarily consume disk space and memory. Administrators can now invalidate individual session directly in the HiveMQ WebUI for client sessions, which can be deleted safely. HiveMQ 3.4 will take care and release the resources on all cluster nodes after a session was invalidated

Web UI Polishing

Most texts on the WebUI were revisited and are now clearer and crisper. The help texts also received a major overhaul and should now be more, well, helpful. In addition, many small improvements were added, which are most of the time invisible but are here to help when you need them most. For example, the WebUI now displays a warning if cluster nodes with old versions are in the cluster (which may happen if a rolling upgrade was not finished properly)

Plugin System

One of the most popular features of HiveMQ is the extensive Plugin System, which virtually enables the integration of HiveMQ to any system and allows hooking into all aspects of the MQTT lifecycle. We listened to the feedback and are pleased to announce many improvements, big and small, for the Plugin System:

Client Session Time-to-live for individual clients

HiveMQ 3.3 offered a global configuration for setting the Time-To-Live for MQTT sessions. With the advent of HiveMQ 3.4, users can now programmatically set Time-To-Live values for individual MQTT clients and can discard a MQTT session immediately.

Individual Inflight Queues

While the Inflight Queue configuration is typically sufficient in the HiveMQ default configuration, there are some use cases that require the adjustment of this configuration. It’s now possible to change the Inflight Queue size for individual clients via the Plugin System.
 
 

Plugin Service Overload Protection

The HiveMQ Plugin System is a power-user tool and it’s possible to do unbelievably useful modifications as well as putting major stress on the system as a whole if the programmer is not careful. In order to protect the HiveMQ instances from accidental overload, a Plugin Service Overload Protection can be configured. This rate limits the Plugin Service usage and gives feedback to the application programmer in case the rate limit is exceeded. This feature is disabled by default but we strongly recommend updating your plugins to profit from this feature.

Session Attribute Store putIfNewer

This is one of the small bits you almost never need but when you do, you’re ecstatic for being able to use it. The Session Attribute Store now offers methods to put values, if the values you want to put are newer or fresher than the values already written. This is extremely useful, if multiple cluster nodes want to write to the Session Attribute Store simultaneously, as this guarantees that outdated values can no longer overwrite newer values.
 
 
 
 

Disconnection Timestamp for OnDisconnectCallback

As the OnDisconnectCallback is executed asynchronously, the client might already be gone when the callback is executed. It’s now easy to obtain the exact timestamp when a MQTT client disconnected, even if the callback is executed later on. This feature might be very interesting for many plugin developers in conjunction with the Session Attribute Store putIfNewer functionality.

Operations

We ❤️ Operators and we strive to provide all the tools needed for operating and administrating a MQTT broker cluster at scale in any environment. A key strategy for successful operations of any system is monitoring. We added some interesting new metrics you might find useful.

System Metrics

In addition to JVM Metrics, HiveMQ now also gathers Operating System Metrics for Linux Systems. So HiveMQ is able to see for itself how the operating system views the process, including native memory, the real CPU usage, and open file usage. These metrics are particularly useful, if you don’t have a monitoring agent for Linux systems setup. All metrics can be found here.

Client Disconnection Metrics

The reality of many MQTT scenarios is that not all clients are able to disconnect gracefully by sending MQTT DISCONNECT messages. HiveMQ now also exposes metrics about clients that disconnected by closing the TCP connection instead of sending a DISCONNECT packet first. This is especially useful for monitoring, if you regularly deal with clients that don’t have a stable connection to the MQTT brokers.

 

JMX enabled by default

JMX, the Java Monitoring Extension, is now enabled by default. Many HiveMQ operators use Application Performance Monitoring tools, which are able to hook into the metrics via JMX or use plain JMX for on-the-fly debugging. While we recommend to use official off-the-shelf plugins for monitoring, it’s now easier than ever to just use JMX if other solutions are not available to you.

Other notable improvements

The 3.4 release of HiveMQ is full of hidden gems and improvements. While it would be too much to highlight all small improvements, these notable changes stand out and contribute to the best HiveMQ release ever.

Topic Level Distribution Configuration

Our recommendation for all huge deployments with millions of devices is: Start with separate topic prefixes by bringing the dynamic topic parts directly to the beginning. The reality is that many customers have topics that are constructed like the following: “devices/{deviceId}/status”. So what happens is that all topics in this example start with a common prefix, “devices”, which is the first topic level. Unfortunately the first topic level doesn’t include a dynamic topic part. In order to guarantee the best scalability of the cluster and the best performance of the topic tree, customers can now configure how many topic levels are used for distribution. In the example outlined here, a topic level distribution of 2 would be perfect and guarantees the best scalability.

Mass disconnect performance improvements

Mass disconnections of MQTT clients can happen. This might be the case when e.g. a load balancer in front of the MQTT broker cluster drops the connections or if a mobile carrier experiences connectivity problems. Prior to HiveMQ 3.4, mass disconnect events caused stress on the cluster. Mass disconnect events are now massively optimized and even tens of millions of connection losses at the same time won’t bring the cluster into stress situations.

 
 
 
 
 
 

Replication Performance Improvements

Due to the distributed nature of a HiveMQ, data needs to be replicated across the cluster in certain events, e.g. when cluster topology changes occur. There are various internal improvements in HiveMQ version 3.4, which increase the replication performance significantly. Our engineers put special love into the replication of Queued Messages, which is now faster than ever, even for multiple millions of Queued Messages that need to be transferred across the cluster.

Updated Native SSL Libraries

The Native SSL Integration of HiveMQ was updated to the newest BoringSSL version. This results in better performance and increased security. In case you’re using SSL and you are not yet using the native SSL integration, we strongly recommend to give it a try, more than 40% performance improvement can be observed for most deployments.

 
 

Improvements for Java 9

While Java 9 was already supported for older HiveMQ versions, HiveMQ 3.4 has full-blown Java 9 support. The minimum Java version still remains Java 7, although we strongly recommend to use Java 8 or newer for the best performance of HiveMQ.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

Hard Drive Stats for Q1 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

Backblaze Drive Stats Q1 2018

As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Background

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.

Hard Drive Reliability Statistics for Q1 2018

At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.

Q1 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.

The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

Welcome Toshiba 8TB drives, almost…

We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.

In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.

Q1 2018 Hard Drive Failure Rate — Toshiba 8TB

So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.

Lifetime Hard Drive Failure Rates

Notes and Observations

The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.

The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.

Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.

We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.

New SMART Attributes

If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:

  • 177 – Wear Range Delta
  • 179 – Used Reserved Block Count Total
  • 181- Program Fail Count Total or Non-4K Aligned Access Count
  • 182 – Erase Fail Count
  • 235 – Good Block Count AND System(Free) Block Count

The 5 values are all related to SSD drives.

Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.

Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]

The post Hard Drive Stats for Q1 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Grafana v5.1 Released

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2018/04/26/grafana-v5.1-released/

v5.1 Stable Release

The recent 5.0 major release contained a lot of new features so the Grafana 5.1 release is focused on smoothing out the rough edges and iterating over some of the new features.

Download Grafana 5.1 Now

Release Highlights

There are two new features included, Heatmap Support for Prometheus and a new core data source for Microsoft SQL Server.

Another highlight is the revamp of the Grafana docker container that makes it easier to run and control but be aware there is a breaking change to file permissions that will affect existing containers with data volumes.

We got tons of useful improvement suggestions, bug reports and Pull Requests from our amazing community. Thank you all! See the full changelog for more details.

Improved Scrolling Experience

In Grafana v5.0 we introduced a new scrollbar component. Unfortunately this introduced a lot of issues and in some scenarios removed
the native scrolling functionality. Grafana v5.1 ships with a native scrollbar for all pages together with a scrollbar component for
the dashboard grid and panels that does not override the native scrolling functionality. We hope that these changes and improvements should
make the Grafana user experience much better!

Improved Docker Image

Grafana v5.1 brings an improved official docker image which should make it easier to run and use the Grafana docker image and at the same time give more control to the user how to use/run it.

We have switched the id of the grafana user running Grafana inside a docker container. Unfortunately this means that files created prior to 5.1 will not have the correct permissions for later versions and thereby introduces a breaking change. We made this change so that it would be easier for you to control what user Grafana is executed as.

Please read the updated documentation which includes migration instructions and more information.

Heatmap Support for Prometheus

The Prometheus datasource now supports transforming Prometheus histograms to the heatmap panel. The Prometheus histogram is a powerful feature, and we’re
really happy to finally allow our users to render those as heatmaps. The Heatmap panel documentation
contains more information on how to use it.

Another improvement is that the Prometheus query editor now supports autocomplete for template variables. More information in the Prometheus data source documentation.

Microsoft SQL Server

Grafana v5.1 now ships with a built-in Microsoft SQL Server (MSSQL) data source plugin that allows you to query and visualize data from any
Microsoft SQL Server 2005 or newer, including Microsoft Azure SQL Database. Do you have metric or log data in MSSQL? You can now visualize
that data and define alert rules on it as with any of Grafana’s other core datasources.

The using Microsoft SQL Server in Grafana documentation has more detailed information on how to get started.

Adding New Panels to Dashboards

The control for adding new panels to dashboards now includes panel search and it is also now possible to copy and paste panels between dashboards.

By copying a panel in a dashboard it will be displayed in the Paste tab. When you switch to a new dashboard you can paste the
copied panel.

Align Zero-Line for Right and Left Y-axes

The feature request to align the zero-line for right and left Y-axes on the Graph panel is more than 3 years old. It has finally been implemented – more information in the Graph panel documentation.

Other Highlights

  • Table Panel: New enhancements includes support for mapping a numeric value/range to text and additional units. More information in the Table panel documentation.
  • New variable interpolation syntax: We now support a new option for rendering variables that gives the user full control of how the value(s) should be rendered. More details in the in the Variables documentation.
  • Improved workflow for provisioned dashboards. More details here.

Changelog

Checkout the CHANGELOG.md file for a complete list
of new features, changes, and bug fixes.

Congratulations to Oracle on MySQL 8.0

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2018/04/congratulations-to-oracle-on-mysql-80.html

Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.

I decide to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0.
Note that I did the following without reading the release notes, as is what I have done with every MySQL / MariaDB release up to date; In this case it was not the right thing to do.

I pulled MySQL 8.0 from [email protected]:mysql/mysql-server.git
I was pleasantly surprised that ‘cmake . ; make‘ worked without without any compiler warnings! I even checked the used compiler options and noticed that MySQL was compiled with -Wall + several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary as Oracle had moved it to ‘runtime_output_directory’; Unexpected, but no big thing.

Now it’s was time to install MySQL 8.0.

I did know that MySQL 8.0 has removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases:
(I have specified datadir=/my/data3 in the /tmp/my.cnf file)

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld –defaults-file=/tmp/my.cnf –install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look in mysqld –help –verbose output showed that the right command option is –-initialize. My bad, lets try again,

> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] –initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I used the right options, but still didn’t work.
I took a quick look around:

> ls /my/data3/
binlog.index

So even if the mysqld noticed that the data3 directory was wrong, it still wrote things into it.  This even if I didn’t have –log-binlog enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable ‘max-tmp-tables=100’
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you’re executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed from my configuration files the variables that MySQL 8.0 doesn’t support anymore. I couldn’t find a list of the removed variables anywhere so this was done with the trial and error method.

> ./mysqld –defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can’t open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table ‘mysql.gtid_executed’ cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-001146 – Table ‘mysql.component’ doesn’t exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-003543 – The mysql.component table is missing or has an incorrect definition.
….
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld –initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn’t try this in a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for [email protected]: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; It could easily have been something that one could easily remember without decreasing security, it’s temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many system and thus doesn’t need temporary installation passwords).

Now lets start the MySQL server for real to do some testing:

> ./mysqld –defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

And the lets start the client:

> ./client/mysql –socket=/tmp/mysql.sock –user=root –password=”px)NaaSp?6um”
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn’t work with old MySQL / MariaDB clients by default 🙁

I was testing this in a system with MariaDB installed, like all modern Linux system today, and didn’t want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld –skip-grant-tables

> ./client/mysql –socket=/tmp/mysql.sock –user=root
ERROR 1045 (28000): Access denied for user ‘root’@’localhost’ (using password: NO)

Apparently –skip-grant-tables is not good enough anymore. Let’s try again with:

> mysqld –skip-grant-tables –default_authentication_plugin=mysql_native_password

> ./client/mysql –socket=/tmp/mysql.sock –user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere, now lets fix “root”  to work with the old authenticaion:

MySQL [mysql]> update mysql.user set plugin=”mysql_native_password”,authentication_string=password(“test”) where user=”root”;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘(“test”) where user=”root”‘ at line 1

A quick look in the MySQL 8.0 release notes told me that the PASSWORD() function is removed in 8.0. Why???? I don’t know how one in MySQL 8.0 is supposed to generate passwords compatible with old installations of MySQL. One could of course start an old MySQL or MariaDB version, execute the password() function and copy the result.

I decided to fix this the easy way and use an empty password:

(Update:: I later discovered that the right way would have been to use: FLUSH PRIVILEGES;  ALTER USER’ root’@’localhost’ identified by ‘test’  ; I however dislike this syntax as it has the password in clear text which is easy to grab and the command can’t be used to easily update the mysql.user table. One must also disable the –skip-grant mode to do use this)

MySQL [mysql]> update mysql.user set plugin=”mysql_native_password”,authentication_string=”” where user=”root”;
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 
I restarted mysqld:
> mysqld –default_authentication_plugin=mysql_native_password

> ./client/mysql –user=root –password=”” mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, forgot that. Lets try again:

> mysqld –skip-grant-tables –default_authentication_plugin=mysql_native_password

> ./client/mysql –user=root –password=”” mysql
MySQL [mysql]> update mysql.user set password_expired=”N” where user=”root”;

Now restart and test worked:

> ./mysqld –default_authentication_plugin=mysql_native_password

>./client/mysql –user=root –password=”” mysql

Finally I had a working account that I can use to create other users!

When looking at mysqld –help –verbose again. I noticed the option:

–initialize-insecure
Create the default database and exit. Create a super user
with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld –defaults-file=/tmp/my.cnf –initialize-insecure

2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] [email protected] is created with an empty password ! Please consider switching off the –initialize-insecure option.

Hm. Don’t understand the warning as–initialize-insecure is not an option that one would use more than one time and thus nothing one would ‘switch off’.

> ./mysqld –defaults-file=/tmp/my.cnf

> ./client/mysql –user=root –password=”” mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning 🙁

To get things to work with old clients, one has to initialize the database with:
> ./mysqld –defaults-file=/tmp/my.cnf –initialize-insecure –default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it up for a spin by running the “standard” MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single threaded benchmark that measures the “raw” speed for some common operations. It gives you the ‘maximum’ performance for a single query. Its different from other benchmarks that measures the maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.

I tried first to be clever and create the “test” database, that I needed for sql-bench, with
> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn’t exist.

MySQL 8.0 has gone away from the original concept of MySQL where the user can easily
create directories and copy databases into the database directory. This may have serious
implication for anyone doing backup of databases and/or trying to restore a backup with normal OS commands.

I created the ‘test’ database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests –user=root

The first run failed in test-ATIS:

Can’t execute command ‘create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))’
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_’ at line 1

This happened because ‘rank‘ is now a reserved word in MySQL 8.0. This is also reserved in ANSI SQL, but I don’t know of any other database that has failed to run test-ATIS before. I have in the past run it against Oracle, PostgreSQL, Mimer, MSSQL etc without any problems.

MariaDB also has ‘rank’ as a keyword in 10.2 and 10.3 but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0.

I did run the test both with MySQL 8.0 and MariaDB 10.3 with the InnoDB storage engine and by having identical values for all InnoDB variables, table-definition-cache and table-open-cache. I turned off performance schema for both databases. All test are run with a user with an empty password (to keep things comparable and because it’s was too complex to generate a password in MySQL 8.0)

The result are as follows
Results per test in seconds:

Operation         |MariaDB|MySQL-8|

———————————–
ATIS              | 153.00| 228.00|
alter-table       |  92.00| 792.00|
big-tables        | 990.00|2079.00|
connect           | 186.00| 227.00|
create            | 575.00|4465.00|
insert            |4552.00|8458.00|
select            | 333.00| 412.00|
table-elimination |1900.00|3916.00|
wisconsin         | 272.00| 590.00|
———————————–

This is of course just a first view of the performance of MySQL 8.0 in a single user environment. Some reflections about the results:

  • Alter-table test is slower (as expected) in 8.0 as some of the alter tests benefits of the instant add column in MariaDB 10.3.
  • connect test is also better for MariaDB as we put a lot of efforts to speed this up in MariaDB 10.2
  • table-elimination shows an optimization in MariaDB for the  Anchor table model, which MySQL doesn’t have.
  • CREATE and DROP TABLE is almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of ‘atomic DDL’. This may also cause performance problems for any thread using the data dictionary when another thread is creating/dropping tables.
  • When looking at the individual test results, MySQL 8.0 was slower in almost every test, in many significantly slower.
  • The only test where MySQL was faster was “update_with_key_prefix”. I checked this and noticed that there was a bug in the test and the columns was updated to it’s original value (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed and that we have not been aware of in the test or in MariaDB.
  • While writing this, I noticed that MySQL 8.0 is now using utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much as most tests works with numbers and Oracle claims that utf8mb4 is only 20% slower than latin1), but needs to be verified.
  • Oracle claims that MySQL 8.0 is much faster on multi user benchmarks. The above test indicates that they may have done this by sacrificing single user performance.
  •  We need to do more and many different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:

  • Using the new caching_sha2_password authentication as default for new installation is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL’s client libraries. While working on this blog I saw MySQL users complain on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL’s history where such an incompatible change has ever been done!
  • Atomic DDL is a good thing (We plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary as if this gets corrupted you will lose all your data. (Single point of failure)
  • MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0 one has to check all one’s databases and applications to ensure that there are no conflicts.
  • As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in inconsistent state. I did of course my tests by installing into an empty data dictionary, but one can assume that some of the problems may also happen when upgrading an old installation.

Conclusions:
In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly Atomic DDL and JSON TABLE functions), but at the same time MySQL has strayed away from some of the fundamental corner stone principles of MySQL:

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:

  • Ease of use
  • Performance
  • Stability

With MySQL 8.0, Oracle has sacrifices 2 of 3 of these.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following should hold:

  • Upgrades should be trivial
  • Things should be kept compatible, if possible (don’t remove features/options/functions that are used)
  • Minimize reserved words, don’t remove server variables
  • One should be able to use normal OS commands to create and drop databases, copy and move tables around within the same system or between different systems. With 8.0 and data dictionary taking backups of specific tables will be hard, even if the server is not running.
  • mysqldump should always be usable backups and to move to new releases
  • Old clients and application should be able to use ‘any’ MySQL server version unchanged. (Some Oracle client libraries, like C++, by default only supports the new X protocol and can thus not be used with older MySQL or any MariaDB version)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way to not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It’s about design philosophy, or strategic vision, if you will.

This shows in two main ways: our respective view of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle’s stored procedures) and AS OF (built-in system versioned tables with point-in-time querying). For both of those features, MariaDB is the first Open Source database doing so. I don’t except Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem in extending the functionality. In 2017, MariaDB accepted more code contributions in one year, than MySQL has done during its entire lifetime, and the rate is increasing!

I am sure that the experience I had with testing MySQL 8.0 would have been significantly better if MySQL would have an open development model where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.

Before upgrading to MySQL 8.0 please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don’t expect that old installations or applications will work out of the box without testing as a lot of features and options has been removed (query cache, partition of myisam tables etc)! You probably also have to revise your backup methods, especially if you want to ever restore just a few tables. (With 8.0, I don’t know how this can be easily done).

According to the MySQL 8.0 release notes, one can’t use mysqldump to copy a database to MySQL 8.0. One has to first to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with in-place update. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seams to be a one way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB and one could always move back to MySQL with mysqldump. All MySQL client libraries has worked with MariaDB and all MariaDB client libraries has worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below you have choices for your future, after MySQL 8.0 you have very little choice. But don’t despair, as MariaDB will always be able to load a mysqldump file and it’s very easy to upgrade your old MySQL installation to MariaDB 🙂

I wish you good luck to try MySQL 8.0 (and also the upcoming MariaDB 10.3)!

Implementing safe AWS Lambda deployments with AWS CodeDeploy

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/implementing-safe-aws-lambda-deployments-with-aws-codedeploy/

This post courtesy of George Mao, AWS Senior Serverless Specialist – Solutions Architect

AWS Lambda and AWS CodeDeploy recently made it possible to automatically shift incoming traffic between two function versions based on a preconfigured rollout strategy. This new feature allows you to gradually shift traffic to the new function. If there are any issues with the new code, you can quickly rollback and control the impact to your application.

Previously, you had to manually move 100% of traffic from the old version to the new version. Now, you can have CodeDeploy automatically execute pre- or post-deployment tests and automate a gradual rollout strategy. Traffic shifting is built right into the AWS Serverless Application Model (SAM), making it easy to define and deploy your traffic shifting capabilities. SAM is an extension of AWS CloudFormation that provides a simplified way of defining serverless applications.

In this post, I show you how to use SAM, CloudFormation, and CodeDeploy to accomplish an automated rollout strategy for safe Lambda deployments.

Scenario

For this walkthrough, you write a Lambda application that returns a count of the S3 buckets that you own. You deploy it and use it in production. Later on, you receive requirements that tell you that you need to change your Lambda application to count only buckets that begin with the letter “a”.

Before you make the change, you need to be sure that your new Lambda application works as expected. If it does have issues, you want to minimize the number of impacted users and roll back easily. To accomplish this, you create a deployment process that publishes the new Lambda function, but does not send any traffic to it. You use CodeDeploy to execute a PreTraffic test to ensure that your new function works as expected. After the test succeeds, CodeDeploy automatically shifts traffic gradually to the new version of the Lambda function.

Your Lambda function is exposed as a REST service via an Amazon API Gateway deployment. This makes it easy to test and integrate.

Prerequisites

To execute the SAM and CloudFormation deployment, you must have the following IAM permissions:

  • cloudformation:*
  • lambda:*
  • codedeploy:*
  • iam:create*

You may use the AWS SAM Local CLI or the AWS CLI to package and deploy your Lambda application. If you choose to use SAM Local, be sure to install it onto your system. For more information, see AWS SAM Local Installation.

All of the code used in this post can be found in this GitHub repository: https://github.com/aws-samples/aws-safe-lambda-deployments.

Walkthrough

For this post, use SAM to define your resources because it comes with built-in CodeDeploy support for safe Lambda deployments.  The deployment is handled and automated by CloudFormation.

SAM allows you to define your Serverless applications in a simple and concise fashion, because it automatically creates all necessary resources behind the scenes. For example, if you do not define an execution role for a Lambda function, SAM automatically creates one. SAM also creates the CodeDeploy application necessary to drive the traffic shifting, as well as the IAM service role that CodeDeploy uses to execute all actions.

Create a SAM template

To get started, write your SAM template and call it template.yaml.

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: An example SAM template for Lambda Safe Deployments.

Resources:

  returnS3Buckets:
    Type: AWS::Serverless::Function
    Properties:
      Handler: returnS3Buckets.handler
      Runtime: nodejs6.10
      AutoPublishAlias: live
      Policies:
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "s3:ListAllMyBuckets"
            Resource: '*'
      DeploymentPreference:
          Type: Linear10PercentEvery1Minute
          Hooks:
            PreTraffic: !Ref preTrafficHook
      Events:
        Api:
          Type: Api
          Properties:
            Path: /test
            Method: get

  preTrafficHook:
    Type: AWS::Serverless::Function
    Properties:
      Handler: preTrafficHook.handler
      Policies:
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "codedeploy:PutLifecycleEventHookExecutionStatus"
            Resource:
              !Sub 'arn:aws:codedeploy:${AWS::Region}:${AWS::AccountId}:deploymentgroup:${ServerlessDeploymentApplication}/*'
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "lambda:InvokeFunction"
            Resource: !Ref returnS3Buckets.Version
      Runtime: nodejs6.10
      FunctionName: 'CodeDeployHook_preTrafficHook'
      DeploymentPreference:
        Enabled: false
      Timeout: 5
      Environment:
        Variables:
          NewVersion: !Ref returnS3Buckets.Version

This template creates two functions:

  • returnS3Buckets
  • preTrafficHook

The returnS3Buckets function is where your application logic lives. It’s a simple piece of code that uses the AWS SDK for JavaScript in Node.JS to call the Amazon S3 listBuckets API action and return the number of buckets.

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
	console.log("I am here! " + context.functionName  +  ":"  +  context.functionVersion);

	s3.listBuckets(function (err, data){
		if(err){
			console.log(err, err.stack);
			callback(null, {
				statusCode: 500,
				body: "Failed!"
			});
		}
		else{
			var allBuckets = data.Buckets;

			console.log("Total buckets: " + allBuckets.length);
			callback(null, {
				statusCode: 200,
				body: allBuckets.length
			});
		}
	});	
}

Review the key parts of the SAM template that defines returnS3Buckets:

  • The AutoPublishAlias attribute instructs SAM to automatically publish a new version of the Lambda function for each new deployment and link it to the live alias.
  • The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides the function with permission to call listBuckets.
  • The DeploymentPreference attribute configures the type of rollout pattern to use. In this case, you are shifting traffic in a linear fashion, moving 10% of traffic every minute to the new version. For more information about supported patterns, see Serverless Application Model: Traffic Shifting Configurations.
  • The Hooks attribute specifies that you want to execute the preTrafficHook Lambda function before CodeDeploy automatically begins shifting traffic. This function should perform validation testing on the newly deployed Lambda version. This function invokes the new Lambda function and checks the results. If you’re satisfied with the tests, instruct CodeDeploy to proceed with the rollout via an API call to: codedeploy.putLifecycleEventHookExecutionStatus.
  • The Events attribute defines an API-based event source that can trigger this function. It accepts requests on the /test path using an HTTP GET method.
'use strict';

const AWS = require('aws-sdk');
const codedeploy = new AWS.CodeDeploy({apiVersion: '2014-10-06'});
var lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {

	console.log("Entering PreTraffic Hook!");
	
	// Read the DeploymentId & LifecycleEventHookExecutionId from the event payload
    var deploymentId = event.DeploymentId;
	var lifecycleEventHookExecutionId = event.LifecycleEventHookExecutionId;

	var functionToTest = process.env.NewVersion;
	console.log("Testing new function version: " + functionToTest);

	// Perform validation of the newly deployed Lambda version
	var lambdaParams = {
		FunctionName: functionToTest,
		InvocationType: "RequestResponse"
	};

	var lambdaResult = "Failed";
	lambda.invoke(lambdaParams, function(err, data) {
		if (err){	// an error occurred
			console.log(err, err.stack);
			lambdaResult = "Failed";
		}
		else{	// successful response
			var result = JSON.parse(data.Payload);
			console.log("Result: " +  JSON.stringify(result));

			// Check the response for valid results
			// The response will be a JSON payload with statusCode and body properties. ie:
			// {
			//		"statusCode": 200,
			//		"body": 51
			// }
			if(result.body == 9){	
				lambdaResult = "Succeeded";
				console.log ("Validation testing succeeded!");
			}
			else{
				lambdaResult = "Failed";
				console.log ("Validation testing failed!");
			}

			// Complete the PreTraffic Hook by sending CodeDeploy the validation status
			var params = {
				deploymentId: deploymentId,
				lifecycleEventHookExecutionId: lifecycleEventHookExecutionId,
				status: lambdaResult // status can be 'Succeeded' or 'Failed'
			};
			
			// Pass AWS CodeDeploy the prepared validation test results.
			codedeploy.putLifecycleEventHookExecutionStatus(params, function(err, data) {
				if (err) {
					// Validation failed.
					console.log('CodeDeploy Status update failed');
					console.log(err, err.stack);
					callback("CodeDeploy Status update failed");
				} else {
					// Validation succeeded.
					console.log('Codedeploy status updated successfully');
					callback(null, 'Codedeploy status updated successfully');
				}
			});
		}  
	});
}

The hook is hardcoded to check that the number of S3 buckets returned is 9.

Review the key parts of the SAM template that defines preTrafficHook:

  • The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides permissions to call the CodeDeploy PutLifecycleEventHookExecutionStatus API action. The second statement provides permissions to invoke the specific version of the returnS3Buckets function to test
  • This function has traffic shifting features disabled by setting the DeploymentPreference option to false.
  • The FunctionName attribute explicitly tells CloudFormation what to name the function. Otherwise, CloudFormation creates the function with the default naming convention: [stackName]-[FunctionName]-[uniqueID].  Name the function with the “CodeDeployHook_” prefix because the CodeDeployServiceRole role only allows InvokeFunction on functions named with that prefix.
  • Set the Timeout attribute to allow enough time to complete your validation tests.
  • Use an environment variable to inject the ARN of the newest deployed version of the returnS3Buckets function. The ARN allows the function to know the specific version to invoke and perform validation testing on.

Deploy the function

Your SAM template is all set and the code is written—you’re ready to deploy the function for the first time. Here’s how to do it via the SAM CLI. Replace “sam” with “cloudformation” to use CloudFormation instead.

First, package the function. This command returns a CloudFormation importable file, packaged.yaml.

sam package –template-file template.yaml –s3-bucket mybucket –output-template-file packaged.yaml

Now deploy everything:

sam deploy –template-file packaged.yaml –stack-name mySafeDeployStack –capabilities CAPABILITY_IAM

At this point, both Lambda functions have been deployed within the CloudFormation stack mySafeDeployStack. The returnS3Buckets has been deployed as Version 1:

SAM automatically created a few things, including the CodeDeploy application, with the deployment pattern that you specified (Linear10PercentEvery1Minute). There is currently one deployment group, with no action, because no deployments have occurred. SAM also created the IAM service role that this CodeDeploy application uses:

There is a single managed policy attached to this role, which allows CodeDeploy to invoke any Lambda function that begins with “CodeDeployHook_”.

An API has been set up called safeDeployStack. It targets your Lambda function with the /test resource using the GET method. When you test the endpoint, API Gateway executes the returnS3Buckets function and it returns the number of S3 buckets that you own. In this case, it’s 51.

Publish a new Lambda function version

Now implement the requirements change, which is to make returnS3Buckets count only buckets that begin with the letter “a”. The code now looks like the following (see returnS3BucketsNew.js in GitHub):

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
	console.log("I am here! " + context.functionName  +  ":"  +  context.functionVersion);

	s3.listBuckets(function (err, data){
		if(err){
			console.log(err, err.stack);
			callback(null, {
				statusCode: 500,
				body: "Failed!"
			});
		}
		else{
			var allBuckets = data.Buckets;

			console.log("Total buckets: " + allBuckets.length);
			//callback(null, allBuckets.length);

			//  New Code begins here
			var counter=0;
			for(var i  in allBuckets){
				if(allBuckets[i].Name[0] === "a")
					counter++;
			}
			console.log("Total buckets starting with a: " + counter);

			callback(null, {
				statusCode: 200,
				body: counter
			});
			
		}
	});	
}

Repackage and redeploy with the same two commands as earlier:

sam package –template-file template.yaml –s3-bucket mybucket –output-template-file packaged.yaml
	
sam deploy –template-file packaged.yaml –stack-name mySafeDeployStack –capabilities CAPABILITY_IAM

CloudFormation understands that this is a stack update instead of an entirely new stack. You can see that reflected in the CloudFormation console:

During the update, CloudFormation deploys the new Lambda function as version 2 and adds it to the “live” alias. There is no traffic routing there yet. CodeDeploy now takes over to begin the safe deployment process.

The first thing CodeDeploy does is invoke the preTrafficHook function. Verify that this happened by reviewing the Lambda logs and metrics:

The function should progress successfully, invoke Version 2 of returnS3Buckets, and finally invoke the CodeDeploy API with a success code. After this occurs, CodeDeploy begins the predefined rollout strategy. Open the CodeDeploy console to review the deployment progress (Linear10PercentEvery1Minute):

Verify the traffic shift

During the deployment, verify that the traffic shift has started to occur by running the test periodically. As the deployment shifts towards the new version, a larger percentage of the responses return 9 instead of 51. These numbers match the S3 buckets.

A minute later, you see 10% more traffic shifting to the new version. The whole process takes 10 minutes to complete. After completion, open the Lambda console and verify that the “live” alias now points to version 2:

After 10 minutes, the deployment is complete and CodeDeploy signals success to CloudFormation and completes the stack update.

Check the results

If you invoke the function alias manually, you see the results of the new implementation.

aws lambda invoke –function [lambda arn to live alias] out.txt

You can also execute the prod stage of your API and verify the results by issuing an HTTP GET to the invoke URL:

Summary

This post has shown you how you can safely automate your Lambda deployments using the Lambda traffic shifting feature. You used the Serverless Application Model (SAM) to define your Lambda functions and configured CodeDeploy to manage your deployment patterns. Finally, you used CloudFormation to automate the deployment and updates to your function and PreTraffic hook.

Now that you know all about this new feature, you’re ready to begin automating Lambda deployments with confidence that things will work as designed. I look forward to hearing about what you’ve built with the AWS Serverless Platform.

Telegram Founder Pledges Millions in Bitcoin For VPNs and “Digital Resistance”

Post Syndicated from Andy original https://torrentfreak.com/telegram-founder-pledges-millions-in-bitcoin-for-vpns-and-digital-resistance-180418/

Starting yesterday, Russia went to war with free cross-platform messaging app Telegram. Authorities including the FSB wanted access to Telegram’s encryption keys, but the service refused to hand them over.

As a result, the service – which serviced 200,000,000 people in March alone – came under massive attack. Supported by a court ruling obtained last Friday, authorities ordered ISPs to block huge numbers of IP addresses in an effort to shut Telegram down.

Amazon and Google, whose services Telegram uses, were both hit with censorship measures, with around 1.8 million IP addresses belonging to the Internet giants blocked in an initial wave of action. But the government was just getting warmed up.

In an updated posted by Pavel Durov to Twitter from Switzerland late last night, the Telegram founder confirmed that Russia had massively stepped up the fight against his encrypted messaging platform.

Of course, 15 million IP addresses is a huge volume, particularly since ‘just’ 14 million of Telegram’s users are located in Russia – that’s more than one IP address for each of them. As a result, there are reports of completed unrelated services being affected by the ban, which is to be expected given its widespread nature. But Russia doesn’t want to stop there.

According to Reuters, local telecoms watchdog Rozcomnadzor asked both Google and Apple [Update: and APKMirror] to remove Telegram from their app stores, to prevent local citizens from gaining access to the software itself. It is unclear whether either company intends to comply but as yet, neither has responded publicly nor taken any noticeable action.

An announcement from Durov last night thanked the companies for not complying with the Russian government’s demands, noting that the efforts so far had proven mostly futile.

“Despite the ban, we haven’t seen a significant drop in user engagement so far, since Russians tend to bypass the ban with VPNs and proxies. We also have been relying on third-party cloud services to remain partly available for our users there,” Durov wrote on Telegram.

“Thank you for your support and loyalty, Russian users of Telegram. Thank you, Apple, Google, Amazon, Microsoft – for not taking part in political censorship.”

Durov noted that Russia accounts for around 7% of Telegram’s userbase, a figure that could be compensated for with organic growth in just a couple of months, even if Telegram lost access to the entire market. However, the action only appears to have lit a fire under the serial entrepreneur, who now has declared a war of his own against censorship.

“To support internet freedoms in Russia and elsewhere I started giving out bitcoin grants to individuals and companies who run socks5 proxies and VPN,” Durov said.

“I am happy to donate millions of dollars this year to this cause, and hope that other people will follow. I called this Digital Resistance – a decentralized movement standing for digital freedoms and progress globally.”

As founder of not only Telegram but also vKontakte, Russia’s answer to Facebook, Durov is a force to be reckoned with. As such, his promises are unlikely to be hollow ones. While Russia has drawn a line in the sand on encryption, it appears to have energized Durov to take a stand, one that could have a positive effect on anti-censorship measures both in Russia and further afield.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.