# EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/

As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!

Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and/or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitate large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.

## New C5d Instances with Local Storage

In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:

| Instance Name | vCPUs | RAM | Local Storage | EBS Bandwidth | Network Bandwidth |
|---|---|---|---|---|---|
| c5d.large | 2 | 4 GiB | 1 x 50 GB NVMe SSD | Up to 2.25 Gbps | Up to 10 Gbps |
| c5d.xlarge | 4 | 8 GiB | 1 x 100 GB NVMe SSD | Up to 2.25 Gbps | Up to 10 Gbps |
| c5d.2xlarge | 8 | 16 GiB | 1 x 225 GB NVMe SSD | Up to 2.25 Gbps | Up to 10 Gbps |
| c5d.4xlarge | 16 | 32 GiB | 1 x 450 GB NVMe SSD | 2.25 Gbps | Up to 10 Gbps |
| c5d.9xlarge | 36 | 72 GiB | 1 x 900 GB NVMe SSD | 4.5 Gbps | 10 Gbps |
| c5d.18xlarge | 72 | 144 GiB | 2 x 900 GB NVMe SSD | 9 Gbps | 25 Gbps |

Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
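
If you're not sure whether a particular AMI includes those drivers, one quick sanity check (a rough sketch, not an official procedure) is to look for the kernel modules on a running instance:

modinfo ena
modinfo nvme

Note that on some kernels NVMe support is built in rather than loaded as a module, so a missing nvme module is a prompt to dig further rather than a definitive answer.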

Here are a couple of things to keep in mind about the local NVMe storage:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
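
For example, on a Linux instance you might locate, format, and mount the local volume along these lines. This is a minimal sketch: the device name /dev/nvme1n1 and the /scratch mount point are assumptions and will differ depending on instance size, attached EBS volumes, and AMI.

lsblk
sudo mkfs -t ext4 /dev/nvme1n1
sudo mkdir -p /scratch
sudo mount /dev/nvme1n1 /scratch

Remember that anything written there is gone when the instance stops or terminates (see Lifetime below), so treat it strictly as scratch space.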

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

## Available Now

C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

Jeff;

PS – We will be adding local NVMe storage to other EC2 instance types in the months to come, so stay tuned!

# From Framework to Function: Deploying AWS Lambda Functions for Java 8 using Apache Maven Archetype

As a serverless computing platform that supports the Java 8 runtime, AWS Lambda makes it easy to run any type of Java function simply by uploading a JAR file. The AWS Serverless Application Model (SAM) lets developers use a simple AWS CloudFormation template to define not only a Lambda serverless application, but also related services such as Amazon API Gateway and Amazon DynamoDB.

AWS provides the AWS Toolkit for Eclipse that supports both Lambda and SAM. AWS also gives customers an easy way to create Lambda functions and SAM applications in Java using the AWS Command Line Interface (AWS CLI). After you build a JAR file, all you have to do is type the following commands:

aws cloudformation package
aws cloudformation deploy
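
In practice, both commands take a few parameters. A typical invocation looks roughly like the following sketch, where the bucket name, stack name, and file names are placeholders you would replace with your own values:

aws cloudformation package \
  --template-file template.yaml \
  --s3-bucket YOUR_BUCKET \
  --output-template-file packaged.yaml
aws cloudformation deploy \
  --template-file packaged.yaml \
  --stack-name YOUR_STACK \
  --capabilities CAPABILITY_IAM

The package step uploads your JAR to Amazon S3 and rewrites the template to point at the uploaded artifact; the deploy step then creates or updates the CloudFormation stack.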

To consolidate these steps, customers can use an Apache Maven archetype. An archetype is a predefined project template that makes it exceptionally simple to get started developing a function.

In this post, I introduce a Maven archetype that creates a skeleton AWS SAM project for a Java function. Using this archetype, you can generate sample Java code and an accompanying SAM template, and deploy it to AWS Lambda with a single Maven action.

## Prerequisites

Make sure that the following software is installed on your workstation:

• Java
• Maven
• AWS CLI
• (Optional) AWS SAM CLI

## Install Archetype

After you’ve set up those packages, install the archetype with the following commands:

git clone https://github.com/awslabs/aws-serverless-java-archetype
cd aws-serverless-java-archetype
mvn install

These are one-time operations, so you don’t need to run them for every new package. If you’d like, you can add the archetype to your company’s Maven repository so that other developers can use it later.
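
For example, here is a hedged sketch of publishing the archetype to an internal repository manager with the maven-deploy-plugin; the repository id and URL are placeholders, and the id must match a <server> entry in your settings.xml:

mvn deploy \
  -DaltDeploymentRepository=internal-releases::default::https://repo.example.com/repository/maven-releases/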

With those packages installed, you’re ready to develop your new Lambda Function.

## Start a project

Now that you have the archetype installed, generate a new project from it:

cd /path/to/project_home
mvn archetype:generate \
-DarchetypeGroupId=com.amazonaws.serverless.archetypes \
-DarchetypeArtifactId=aws-serverless-java-archetype \
-DarchetypeVersion=1.0.0 \
-DarchetypeRepository=local \
-DinteractiveMode=false \
-DgroupId=YOUR_GROUP_ID \
-DartifactId=YOUR_ARTIFACT_ID \
-Dversion=YOUR_VERSION \
-DclassName=YOUR_CLASSNAME

Here, -DarchetypeRepository=local forces Maven to use the local repository, and -DinteractiveMode=false runs the generation in batch mode. If you omit the batch-mode flag, you can instead supply the groupId, artifactId, version, and className properties interactively.

You should have a directory called YOUR_ARTIFACT_ID that contains the files and folders shown below:

├── event.json
├── pom.xml
├── src
│   └── main
│       ├── java
│       │   └── Package
│       │       └── Example.java
│       └── resources
│           └── log4j2.xml
└── template.yaml

The sample code is a working example. If you have installed the SAM CLI, you can invoke the function locally with the commands below:

cd YOUR_ARTIFACT_ID
mvn -P invoke verify
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ foo ---
[INFO] Building jar: /private/tmp/foo/target/foo-1.0.jar
[INFO]
[INFO] Including com.amazonaws:aws-lambda-java-core:jar:1.2.0 in the shaded jar.
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-local-invoke) @ foo ---
2018/04/06 16:34:35 Successfully parsed template.yaml
2018/04/06 16:34:35 Connected to Docker 1.37
2018/04/06 16:34:35 Fetching lambci/lambda:java8 image for java8 runtime...
java8: Pulling from lambci/lambda
Digest: sha256:14df0a5914d000e15753d739612a506ddb8fa89eaa28dcceff5497d9df2cf7aa
Status: Image is up to date for lambci/lambda:java8
2018/04/06 16:34:37 Invoking Package.Example::handleRequest (java8)
2018/04/06 16:34:37 Decompressing /tmp/foo/target/lambda.jar
2018/04/06 16:34:37 Mounting /private/var/folders/x5/ldp7c38545v9x5dg_zmkr5kxmpdprx/T/aws-sam-local-1523000077594231063 as /var/task:ro inside runtime container
START RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74 Version: $LATEST
Log output: Greeting is 'Hello Tim Wagner.'
END RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74
REPORT RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74 Duration: 96.60 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 7 MB

{"greetings":"Hello Tim Wagner."}

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.452 s
[INFO] Finished at: 2018-04-06T16:34:40+09:00
[INFO] ------------------------------------------------------------------------

This Maven goal invokes sam local invoke -e event.json, so you can see the sample output greeting Tim Wagner.

To deploy this application to AWS, you need an Amazon S3 bucket to upload your package to. You can create one with the following command if needed:

aws s3 mb s3://YOUR_BUCKET --region YOUR_REGION

Now you can deploy your application with just one command:

mvn deploy \
    -DawsRegion=YOUR_REGION \
    -Ds3Bucket=YOUR_BUCKET \
    -DstackName=YOUR_STACK

[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-package) @ foo ---
Uploading to aws-serverless-java/com.riywo:foo:1.0/924732f1f8e4705c87e26ef77b080b47 11657 / 11657.0 (100.00%)
Successfully packaged artifacts and wrote output template to file target/sam.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file /private/tmp/foo/target/sam.yaml --stack-name <YOUR STACK NAME>
[INFO]
[INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ foo ---
[INFO] Skipping artifact deployment
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-deploy) @ foo ---
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - archetype
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.176 s
[INFO] Finished at: 2018-04-06T16:41:02+09:00
[INFO] ------------------------------------------------------------------------

Maven automatically creates a shaded JAR file, uploads it to your S3 bucket, rewrites template.yaml, and creates or updates the CloudFormation stack.

To customize the process, modify the pom.xml file. For example, to avoid typing values for awsRegion, s3Bucket or stackName every time, write them inside pom.xml and check it into your VCS. Afterward, you and the rest of your team can deploy the function by typing just the following command:

mvn deploy

## Options

The Lambda Java 8 runtime supports several types of handlers: POJO, simple type, and stream. The default option of this archetype is the POJO style, which requires you to create request and response classes, but these are generated by the archetype by default. If you want to use another type of handler, set the handlerType property as shown below:

## POJO type (default)
mvn archetype:generate \
 ...
 -DhandlerType=pojo

## Simple type - String
mvn archetype:generate \
 ...
 -DhandlerType=simple

## Stream type
mvn archetype:generate \
 ...
 -DhandlerType=stream

See the documentation for more details about handlers.
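
To make the POJO handler style concrete, here is a rough sketch of what such a handler looks like. The class and field names are illustrative rather than the exact code the archetype generates, but the shape is the same: a request POJO, a response POJO that Lambda serializes to JSON, and a handleRequest method that can log through the context's LambdaLogger.

package com.example;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class Example implements RequestHandler<Example.Request, Example.Response> {

    // Request POJO: Lambda deserializes the incoming JSON event into this class,
    // so it needs a no-arg constructor and setters for its fields.
    public static class Request {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    // Response POJO: Lambda serializes the returned object back to JSON using its getters.
    public static class Response {
        private final String greetings;
        public Response(String greetings) { this.greetings = greetings; }
        public String getGreetings() { return greetings; }
    }

    @Override
    public Response handleRequest(Request request, Context context) {
        // The LambdaLogger is available from the context object.
        context.getLogger().log("Greeting is 'Hello " + request.getName() + ".'");
        return new Response("Hello " + request.getName() + ".");
    }
}

Assuming event.json carries a name field such as "Tim Wagner", a handler of this shape produces output like the {"greetings":"Hello Tim Wagner."} shown in the local invocation above.
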
Also, the Lambda Java 8 runtime supports two types of logging class: Log4j 2 and LambdaLogger. This archetype creates a LambdaLogger implementation by default, but you can use Log4j 2 if you want:

## LambdaLogger (default)
mvn archetype:generate \
 ...
 -Dlogger=lambda

## Log4j 2
mvn archetype:generate \
 ...
 -Dlogger=log4j2

If you use LambdaLogger, you can delete ./src/main/resources/log4j2.xml. See the documentation for more details.

## Conclusion

So, what’s next? Develop your Lambda function locally and type the following command: mvn deploy !

With this archetype code example, available on the GitHub repo, you should be able to deploy Lambda functions for Java 8 in a snap. If you have any questions or comments, please submit them below or leave them on GitHub.

# Congratulations to Oracle on MySQL 8.0

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2018/04/congratulations-to-oracle-on-mysql-80.html

Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.

I decided to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0. Note that I did the following without reading the release notes, as I have done with every MySQL / MariaDB release to date; in this case it was not the right thing to do.

I pulled MySQL 8.0 from git@github.com:mysql/mysql-server.git

I was pleasantly surprised that 'cmake . ; make' worked without any compiler warnings! I even checked the compiler options used and noticed that MySQL was compiled with -Wall plus several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary, as Oracle had moved it to 'runtime_output_directory'; unexpected, but no big thing.

Now it was time to install MySQL 8.0. I knew that MySQL 8.0 has removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases (I have specified datadir=/my/data3 in the /tmp/my.cnf file):

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld --defaults-file=/tmp/my.cnf --install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look at the mysqld --help --verbose output showed that the right command option is --initialize. My bad, let's try again:

> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I used the right option, but it still didn't work. I took a quick look around:

> ls /my/data3/
binlog.index

So even though mysqld noticed that the data3 directory was wrong, it still wrote things into it, even though I didn't have --log-binlog enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable 'max-tmp-tables=100'
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you're executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed from my configuration files the variables that MySQL 8.0 doesn't support anymore. I couldn't find a list of the removed variables anywhere, so this was done by trial and error.

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can't open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-001146 - Table 'mysql.component' doesn't exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we're sending the information to the error-log instead: MY-003543 - The mysql.component table is missing or has an incorrect definition.
....
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: '8.0.11' socket: '/tmp/mysql.sock' port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld --initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn't try this on a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; it could easily have been something one could remember without decreasing security, it's temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many systems and thus doesn't need temporary installation passwords.)

Now let's start the MySQL server for real to do some testing:

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: '8.0.11' socket: '/tmp/mysql.sock' port: 3306 Source distribution.

And then let's start the client:

> ./client/mysql --socket=/tmp/mysql.sock --user=root --password="px)NaaSp?6um"
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn't work with old MySQL / MariaDB clients by default 🙁

I was testing this on a system with MariaDB installed, like all modern Linux systems today, and didn't want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld --skip-grant-tables
> ./client/mysql --socket=/tmp/mysql.sock --user=root
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

Apparently --skip-grant-tables is not good enough anymore. Let's try again with:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password
> ./client/mysql --socket=/tmp/mysql.sock --user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere. Now let's fix "root" to work with the old authentication:

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string=password("test") where user="root";
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '("test") where user="root"' at line 1

A quick look at the MySQL 8.0 release notes told me that the PASSWORD() function is removed in 8.0. Why????

I don't know how one is supposed to generate passwords in MySQL 8.0 that are compatible with old installations of MySQL. One could of course start an old MySQL or MariaDB version, execute the password() function and copy the result.

I decided to fix this the easy way and use an empty password.

(Update: I later discovered that the right way would have been to use:

FLUSH PRIVILEGES;
ALTER USER 'root'@'localhost' identified by 'test';

I however dislike this syntax as it has the password in clear text, which is easy to grab, and the command can't be used to easily update the mysql.user table. One must also disable the --skip-grant mode to use this.)

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string="" where user="root";
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0

I restarted mysqld:

> mysqld --default_authentication_plugin=mysql_native_password
> ./client/mysql --user=root --password="" mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, forgot that. Let's try again:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password
> ./client/mysql --user=root --password="" mysql
MySQL [mysql]> update mysql.user set password_expired="N" where user="root";

Now restarting and testing worked:

> ./mysqld --default_authentication_plugin=mysql_native_password
> ./client/mysql --user=root --password="" mysql

Finally I had a working account that I could use to create other users!

When looking at the mysqld --help --verbose output again, I noticed the option:

--initialize-insecure
Create the default database and exit. Create a super user with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure

2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.

Hm. I don't understand the warning, as --initialize-insecure is not an option that one would use more than once, and thus nothing one would 'switch off'.

> ./mysqld --defaults-file=/tmp/my.cnf
> ./client/mysql --user=root --password="" mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning 🙁

To get things to work with old clients, one has to initialize the database with:

> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure --default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it for a spin by running the "standard" MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single-threaded benchmark that measures the "raw" speed of some common operations. It gives you the 'maximum' performance for a single query. It's different from other benchmarks that measure the maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.

I first tried to be clever and create the "test" database that I needed for sql-bench with:

> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn't exist. MySQL 8.0 has gone away from the original concept of MySQL where the user can easily create directories and copy databases into the database directory. This may have serious implications for anyone doing backups of databases and/or trying to restore a backup with normal OS commands.

I created the 'test' database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests --user=root

The first run failed in test-ATIS:

Can't execute command 'create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))'
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_' at line 1

This happened because 'rank' is now a reserved word in MySQL 8.0. It is also reserved in ANSI SQL, but I don't know of any other database that has failed to run test-ATIS before. I have in the past run it against Oracle, PostgreSQL, Mimer, MSSQL etc. without any problems. MariaDB also has 'rank' as a keyword in 10.2 and 10.3, but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0. I ran the tests with both MySQL 8.0 and MariaDB 10.3 using the InnoDB storage engine, with identical values for all InnoDB variables, table-definition-cache and table-open-cache. I turned off the performance schema for both databases. All tests were run with a user with an empty password (to keep things comparable, and because it was too complex to generate a password in MySQL 8.0).

The results are as follows (per test, in seconds):

| Operation | MariaDB | MySQL-8 |
|---|---|---|
| ATIS | 153.00 | 228.00 |
| alter-table | 92.00 | 792.00 |
| big-tables | 990.00 | 2079.00 |
| connect | 186.00 | 227.00 |
| create | 575.00 | 4465.00 |
| insert | 4552.00 | 8458.00 |
| select | 333.00 | 412.00 |
| table-elimination | 1900.00 | 3916.00 |
| wisconsin | 272.00 | 590.00 |

This is of course just a first view of the performance of MySQL 8.0 in a single-user environment.

Some reflections about the results:

• The alter-table test is slower (as expected) in 8.0, as some of the alter tests benefit from the instant add column in MariaDB 10.3.
• The connect test is also better for MariaDB, as we put a lot of effort into speeding this up in MariaDB 10.2.
• table-elimination shows an optimization in MariaDB for the Anchor table model, which MySQL doesn't have.
• CREATE and DROP TABLE is almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of 'atomic DDL'. This may also cause performance problems for any thread using the data dictionary when another thread is creating/dropping tables.
• When looking at the individual test results, MySQL 8.0 was slower in almost every test, in many cases significantly slower.
• The only test where MySQL was faster was "update_with_key_prefix". I checked this and noticed that there was a bug in the test and the column was updated to its original value (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed and that we have not been aware of in the test or in MariaDB.
• While writing this, I noticed that MySQL 8.0 is now using utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much, as most tests work with numbers, and Oracle claims that utf8mb4 is only 20% slower than latin1), but this needs to be verified.
• Oracle claims that MySQL 8.0 is much faster on multi-user benchmarks. The above test indicates that they may have done this by sacrificing single-user performance.
• We need to do more and many different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:

• Using the new caching_sha2_password authentication as default for new installations is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL's client libraries. While working on this blog I saw MySQL users complain on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL's history that such an incompatible change has ever been done!
• Atomic DDL is a good thing (we plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary, as if this gets corrupted you will lose all your data. (Single point of failure)
• MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0 one has to check all one's databases and applications to ensure that there are no conflicts.
• As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in an inconsistent state. I did of course do my tests by installing into an empty data directory, but one can assume that some of the problems may also happen when upgrading an old installation.

Conclusions:

In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly atomic DDL and JSON TABLE functions), but at the same time MySQL has strayed away from some of the fundamental cornerstone principles of MySQL.

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:

• Ease of use
• Performance
• Stability

With MySQL 8.0, Oracle has sacrificed 2 of these 3.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following should hold:

• Upgrades should be trivial.
• Things should be kept compatible, if possible (don't remove features/options/functions that are used).
• Minimize reserved words, don't remove server variables.
• One should be able to use normal OS commands to create and drop databases, and to copy and move tables around within the same system or between different systems. With 8.0 and the data dictionary, taking backups of specific tables will be hard, even if the server is not running.
• mysqldump should always be usable for backups and for moving to new releases.
• Old clients and applications should be able to use 'any' MySQL server version unchanged. (Some Oracle client libraries, like C++, by default only support the new X protocol and can thus not be used with older MySQL or any MariaDB version.)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way that does not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It's about design philosophy, or strategic vision, if you will. This shows in two main ways: our respective views of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle's stored procedures) and AS OF (built-in system-versioned tables with point-in-time querying). For both of those features, MariaDB is the first open source database doing so. I don't expect Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem in extending the functionality. In 2017, MariaDB accepted more code contributions in one year than MySQL has done during its entire lifetime, and the rate is increasing! I am sure that the experience I had with testing MySQL 8.0 would have been significantly better if MySQL had an open development model where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.

Before upgrading to MySQL 8.0, please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don't expect that old installations or applications will work out of the box without testing, as a lot of features and options have been removed (query cache, partitioning of MyISAM tables, etc.)! You probably also have to revise your backup methods, especially if you ever want to restore just a few tables. (With 8.0, I don't know how this can easily be done.)

According to the MySQL 8.0 release notes, one can't use mysqldump to copy a database to MySQL 8.0. One first has to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with an in-place update. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seems to be a one-way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB, and one could always move back to MySQL with mysqldump. All MySQL client libraries have worked with MariaDB and all MariaDB client libraries have worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below, you have choices for your future; after MySQL 8.0 you have very little choice. But don't despair, as MariaDB will always be able to load a mysqldump file and it's very easy to upgrade your old MySQL installation to MariaDB 🙂

I wish you good luck trying MySQL 8.0 (and also the upcoming MariaDB 10.3)!

# GnuCash 3.0 released

Post Syndicated from corbet original https://lwn.net/Articles/750813/rss

The GnuCash 3.0 release is out. "The headline item for this release is that GnuCash now uses the Gtk+-3.0 Toolkit and the WebKit2Gtk API. This change was forced on us by some major Linux distributions dropping support for the WebKit1 API." This release also includes some new reports, a rewritten CSV importer, and more. LWN looked at GnuCash from a business-accounting point of view in August 2017.

# "Филтри и предразсъдъци" ("Filters and Prejudices")

Post Syndicated from Vasil Kolev original https://vasil.ludost.net/blog/?p=3382

A discussion on the topic of Article 13 will be organized this Friday, 23.03, at 15:00 at Soho (and it will be streamed live). It is called "Filters and Prejudices: Technical Aspects of Preliminary Content Filtering Online".

# Deezer Piles Pressure on Pirates, Deezloader Reborn Throws in the Towel

Post Syndicated from Andy original https://torrentfreak.com/deezer-piles-pressure-on-pirates-deezloader-reborn-throws-in-the-towel-180315/

Spotify might grab most of the headlines in the world of music streaming, but French firm Deezer is also growing in popularity. Focused more on non-English-speaking regions, the music service still has a massive selection of tens of millions of tracks.

More importantly for pirates, it also has a loophole or two that allows users to permanently download songs from the service, a huge 'selling' point for the compulsive archiver.

One of the most popular third-party tools for achieving this was Deezloader, but last year Deezer put pressure on its operators to cease and desist.

"On April 27, 2017 we received takedowns and threatened legal action from Deezer if we don't shut down by April 29. So we decided to shut down Deezloader permanently," the team announced.

Rather than kill the scene, the attack on Deezloader only seemed to spur things on.

Many other apps underwent development in the months that followed, but last December it became evident that Deezer (and probably the record labels supplying its content) were growing increasingly tired of these kinds of applications.

The company sent a wave of DMCA notices to developer platform GitHub, targeting several tools, claiming that they are "in total violation of our rights and of the rights of our music licensors."

GitHub responded quickly by removing access to repositories referencing Deezloader, DeezerDownload, Deeze, Deezerio, Deezit, Deedown, and their associated forks. Deezer also reportedly modified its API, in order to stop or hinder apps already in existence.

However, pirates are a determined bunch and behind the scenes many sought to breathe new life into their projects, to maintain the flow of free music from Deezer. One of those that gained traction was the obviously-titled 'Deezloader Reborn' which enjoyed a new lease of life on both Github and Reddit after taking over from DeezLoader V2.3.1.

But in January 2018, Deezer turned up the pressure again, hitting Github with a wave (1,2) of takedown notices targeting various projects. On January 23, Deezer hit Deezloader Reborn itself with the notice detailed below.

The following project, identified in the paragraph below, makes available a hacked version of our Deezer application by describing methods to bypass Deezer's security measures to unlawfully download its music catalogue, in total violation of our rights and of the rights of our music licensors (phonographic producers, performing artists, songwriters and composers): https://github.com/ExtendLord/DeezLoader-Reborn

I therefore ask that you immediately take down the project corresponding to the URL above and all of the related forks by others members who have had access or even contributed to such projects.

Not only did Github comply with Deezer's request, Reddit did too. According to a thread still listed on the site, Reddit removed a post about Deezloader Reborn following a copyright complaint from Deezer.

Two days later Deezer targeted similar projects on Github but by this time, Deezloader Reborn already had new plans. Speaking with TF, project developer ExtendLord said that he wouldn't be shutting down but would continue on code repository Gitlab instead. Now, however, those plans have also come to an abrupt end after Gitlab took the page down.

Deezloader Reborn – gone from Gitlab

A copy of the page available on Archive.org shows Deezloader Reborn at version 3.0.5 with the ability to download music ready-tagged and in FLAC quality. Links to newer versions are being shared on Reddit but it appears there is no longer a central trusted source for the application.

There's no official confirmation yet but it seems likely that Deezer was behind the Gitlab takedown. TorrentFreak has contacted ExtendLord who linked us to this page which states that "DeezLoader Reborn is no longer maintained due to DMCA. [Version] 3.1.0 is the last update, no more updates will be made."

So, at least for now, it appears that Deezloader Reborn will go the way of various other Deezer-reliant applications. That won't be the end of the story though, that's a certainty.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

# Message Filtering Operators for Numeric Matching, Prefix Matching, and Blacklisting in Amazon SNS

This blog was contributed by Otavio Ferreira, Software Development Manager for Amazon SNS.

Message filtering simplifies the overall pub/sub messaging architecture by offloading message filtering logic from subscribers, as well as message routing logic from publishers. The initial launch of message filtering provided a basic operator that was based on exact string comparison. For more information, see Simplify Your Pub/Sub Messaging with Amazon SNS Message Filtering.

Today, AWS is announcing an additional set of filtering operators that bring even more power and flexibility to your pub/sub messaging use cases.

## Message filtering operators

Amazon SNS now supports both numeric and string matching. Specifically, string matching operators allow for exact, prefix, and "anything-but" comparisons, while numeric matching operators allow for exact and range comparisons, as outlined below. Numeric matching operators work for values between -10e9 and +10e9 inclusive, with five digits of accuracy right of the decimal point.

• Exact matching on string values (Whitelisting): Subscription filter policy {"sport": ["rugby"]} matches message attribute {"sport": "rugby"} only.
• Anything-but matching on string values (Blacklisting): Subscription filter policy {"sport": [{"anything-but": "rugby"}]} matches message attributes such as {"sport": "baseball"} and {"sport": "basketball"} and {"sport": "football"} but not {"sport": "rugby"}.
• Prefix matching on string values: Subscription filter policy {"sport": [{"prefix": "bas"}]} matches message attributes such as {"sport": "baseball"} and {"sport": "basketball"}.
• Exact matching on numeric values: Subscription filter policy {"balance": [{"numeric": ["=", 301.5]}]} matches message attributes {"balance": 301.500} and {"balance": 3.015e2}.
• Range matching on numeric values: Subscription filter policy {"balance": [{"numeric": ["<", 0]}]} matches negative numbers only, and {"balance": [{"numeric": [">", 0, "<=", 150]}]} matches any positive number up to 150.

As usual, you may apply "AND" logic by appending multiple keys in the subscription filter policy, and "OR" logic by appending multiple values for the same key, as follows:

• AND logic: Subscription filter policy {"sport": ["rugby"], "language": ["English"]} matches only messages that carry both attributes {"sport": "rugby"} and {"language": "English"}.
• OR logic: Subscription filter policy {"sport": ["rugby", "football"]} matches messages that carry either the attribute {"sport": "rugby"} or {"sport": "football"}.

## Message filtering operators in action

Here's how this new set of filtering operators works. The following example is based on a pharmaceutical company that develops, produces, and markets a variety of prescription drugs, with research labs located in Asia Pacific and Europe. The company built an internal procurement system to manage the purchasing of lab supplies (for example, chemicals and utensils), office supplies (for example, paper, folders, and markers) and tech supplies (for example, laptops, monitors, and printers) from global suppliers.
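
Before walking through that procurement example, here is a minimal sketch of how one of the filter policies listed above is attached to an existing subscription with the AWS CLI. The subscription ARN is a placeholder, and the policy is the "anything-but" example from the operator list:

aws sns set-subscription-attributes \
    --subscription-arn arn:aws:sns:us-east-1:123456789012:Orders:EXAMPLE-SUBSCRIPTION-ID \
    --attribute-name FilterPolicy \
    --attribute-value '{"sport": [{"anything-but": "rugby"}]}'

Once set, SNS evaluates the policy against each published message's attributes and only delivers messages that match.
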
This distributed system is composed of the four following subsystems:

• A requisition system that presents the catalog of products from suppliers, and takes orders from buyers
• An approval system for orders targeted to Asia Pacific labs
• Another approval system for orders targeted to European labs
• A fulfillment system that integrates with shipping partners

As shown in the following diagram, the company leverages AWS messaging services to integrate these distributed systems.

• Firstly, an SNS topic named "Orders" was created to take all orders placed by buyers on the requisition system.
• Secondly, two Amazon SQS queues, named "Lab-Orders-AP" and "Lab-Orders-EU" (for Asia Pacific and Europe respectively), were created to backlog orders that are up for review on the approval systems.
• Lastly, an SQS queue named "Common-Orders" was created to backlog orders that aren't related to lab supplies, which can already be picked up by shipping partners on the fulfillment system.

The company also uses AWS Lambda functions to automatically process lab supply orders that don't require approval or which are invalid.

In this example, because different types of orders have been published to the SNS topic, the subscribing endpoints have had to set advanced filter policies on their SNS subscriptions, to have SNS automatically filter out orders they can't deal with.

As depicted in the above diagram, the following five filter policies have been created:

• The SNS subscription that points to the SQS queue "Lab-Orders-AP" sets a filter policy that matches lab supply orders, with a total value greater than $1,000, and that target Asia Pacific labs only. These more expensive transactions require an approver to review orders placed by buyers.
• The SNS subscription that points to the SQS queue "Lab-Orders-EU" sets a filter policy that matches lab supply orders, also with a total value greater than $1,000, but that target European labs instead.
• The SNS subscription that points to the Lambda function "Lab-Preapproved" sets a filter policy that only matches lab supply orders that aren't as expensive, up to $1,000, regardless of their target lab location. These orders simply don't require approval and can be automatically processed.
• The SNS subscription that points to the Lambda function "Lab-Cancelled" sets a filter policy that only matches lab supply orders with a total value of $0 (zero), regardless of their target lab location. These orders carry no actual items, obviously need neither approval nor fulfillment, and as such can be automatically canceled.
• The SNS subscription that points to the SQS queue "Common-Orders" sets a filter policy that blacklists lab supply orders. Hence, this policy matches only office and tech supply orders, which have a more streamlined fulfillment process, and require no approval, regardless of price or target location.

After the company finished building this advanced pub/sub architecture, they were then able to launch their internal procurement system and allow buyers to begin placing orders. The diagram above shows six example orders published to the SNS topic. Each order contains message attributes that describe the order, and cause them to be filtered in a different manner, as follows:

• Message #1 is a lab supply order, with a total value of $15,700 and targeting a research lab in Singapore. Because the value is greater than $1,000, and the location "Asia-Pacific-Southeast" matches the prefix "Asia-Pacific-", this message matches the first SNS subscription and is delivered to SQS queue "Lab-Orders-AP".
• Message #2 is a lab supply order, with a total value of $1,833 and targeting a research lab in Ireland. Because the value is greater than $1,000, and the location "Europe-West" matches the prefix "Europe-", this message matches the second SNS subscription and is delivered to SQS queue "Lab-Orders-EU".
• Message #3 is a lab supply order, with a total value of $415. Because the value is greater than $0 and less than $1,000, this message matches the third SNS subscription and is delivered to Lambda function "Lab-Preapproved".
• Message #4 is a lab supply order, but with a total value of $0. Therefore, it only matches the fourth SNS subscription, and is delivered to Lambda function "Lab-Cancelled".
• Messages #5 and #6 aren't actually lab supply orders; one is an office supply order, and the other is a tech supply order. Therefore, they only match the fifth SNS subscription, and are both delivered to SQS queue "Common-Orders".

Although each message only matched a single subscription, each was tested against the filter policy of every subscription in the topic. Hence, depending on which attributes are set on the incoming message, the message might actually match multiple subscriptions, and multiple deliveries will take place. Also, it is important to bear in mind that subscriptions with no filter policies catch every single message published to the topic, as a blank filter policy equates to a catch-all behavior.

## Summary

Amazon SNS allows for both string and numeric filtering operators. As explained in this post, string operators allow for exact, prefix, and "anything-but" comparisons, while numeric operators allow for exact and range comparisons. These advanced filtering operators bring even more power and flexibility to your pub/sub messaging functionality and also allow you to simplify your architecture further by removing even more logic from your subscribers.

Message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). SNS filtering operators for numeric matching, prefix matching, and blacklisting are available now in all AWS Regions, for no extra charge.

To experiment with these new filtering operators yourself, and continue learning, try the 10-minute Tutorial Filter Messages Published to Topics. For more information, see Filtering Messages with Amazon SNS in the SNS documentation.

# Jailed Streaming Site Operator Hit With Fresh $3m Damages Lawsuit

Post Syndicated from Andy original https://torrentfreak.com/jailed-streaming-site-operator-hit-with-fresh-3m-damages-lawsuit-180207/

After being founded more than half a decade ago, Swefilmer grew to become Sweden’s most popular movie and TV show streaming site. It was only a question of time before authorities stepped in to bring the show to an end.

In 2015, a Swedish operator of the site in his early twenties was raided by local police. A second man, Turkish and in his late twenties, was later arrested in Germany.

The pair, who hadn’t met in person, appeared before the Varberg District Court in January 2017, accused of making more than $1.5m from their activities between November 2013 and June 2015. The prosecutor described Swefilmer as “organized crime”, painting the then 26-year-old as the main brains behind the site and the 23-year-old as playing a much smaller role. The former was said to have led a luxury lifestyle after benefiting from$1.5m in advertising revenue.

The sentences eventually handed down matched the defendants’ alleged level of participation. While the younger man received probation and community service, the Turk was sentenced to serve three years in prison and ordered to forfeit $1.59m.

Very quickly it became clear there would be an appeal, with plaintiffs represented by anti-piracy outfit RightsAlliance complaining that their 10m krona ($1.25m) claim for damages over the unlawful distribution of local movie Johan Falk: Kodnamn: Lisa had been ruled out by the Court.

With the appeal hearing now just a couple of weeks away, Swedish outlet Breakit is reporting that media giant Bonnier Broadcasting has launched an action of its own against the now 27-year-old former operator of Swefilmer.

According to the publication, Bonnier’s pay-TV company C More, which distributes for Fox, MGM, Paramount, Universal, Sony and Warner, is set to demand around 24m krona ($3.01m) via anti-piracy outfit RightsAlliance.

"This is about organized crime and grossly criminal individuals who earned huge sums on our and others' content. We want to take every opportunity to take advantage of our rights," says Johan Gustafsson, Head of Corporate Communications at Bonnier Broadcasting.

C More reportedly filed its lawsuit at the Stockholm District Court on January 30, 2018. At its core are four local movies said to have been uploaded and made available via Swefilmer.

"C More would probably never even have granted a license to [the operator] to make or allow others to make the films available to the public in a similar way as [the operator] did, but if that had happened, the fee would not be less than 5,000,000 krona ($628,350) per film or a total of 20,000,000 krona ($2,513,400)," C More's claim reads.

Speaking with Breakit, lawyer Ansgar Firsching said he couldn't say much about C More's claims against his client.

"I am very surprised that two weeks before the main hearing [C More] comes in with this requirement. If you open another front, we have two trials that are partly about the same thing," he said.

Firsching said he couldn't elaborate at this stage but expects his client to deny the claim for damages. C More sees things differently.

"Many people live under the illusion that sites like Swefilmer are driven by idealistic teens in their parents' basements, which is completely wrong. This is about organized crime where our content is used to generate millions and millions in revenue," the company notes.

The appeal in the main case is set to go ahead February 20th.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons.

# Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit

By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF's "Success at Apache" monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in, and the new release that we tested out internally on our many-thousand-strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process, and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice, while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo's jobs from Apache MapReduce to Apache Tez, a then-new DAG-based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies, be it storage through HDFS, resource management through YARN, job execution frameworks with Tez, and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath's business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully, trying to keep up with all the questions I had, and recognized a few names from my academic readings of YARN ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us.

By the time I could truly call myself a contributor in the Hadoop community, nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – "Just needs an API fix," I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many. I, however, did not have to go far to get answers. The Tez community actively came to a newbie's rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between, the usual status updates and extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively, and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and, as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers.

Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project's inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were now tracked through an umbrella JIRA, TEZ-3334, and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun!

All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fix big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say "I did this!" In fact, "we did!" It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community, and cross Apache product community interaction made the work ever more interesting and challenging.

Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year, and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018, as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope － it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself, she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on "Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop". Prior to that she worked on Juniper Networks' router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

# Raspberry Crusoe: how a Pi got lost at sea

Post Syndicated from James Robinson original https://www.raspberrypi.org/blog/lost-high-altitude-balloon/

The tale of the little HAB that could and its three-month journey from Portslade Aldridge Community Academy in the UK to the coast of Denmark.

#### PACA Computing on Twitter

Where did it land ???? #skypaca #skycademy @pacauk #RaspberryPi

## High-altitude ballooning

Some of you may be familiar with Raspberry Pi being used as the flight computer, or tracker, of high-altitude balloon (HAB) payloads. For those who aren’t, high-altitude ballooning is a relatively simple activity (at least in principle) where a tracker is attached to a large weather balloon which is then released into the atmosphere. While the HAB ascends, the tracker takes pictures and data readings the whole time. Eventually (around 30km up) the balloon bursts, leaving the payload free to descend and be recovered. For a better explanation, I’m handing over to the students of UTC Oxfordshire:

#### Pi in the Sky | UTC Oxfordshire

On Tuesday 2nd May, students launched a Raspberry Pi computer 35,000 metres into the stratosphere as part of an Employer-Led project at UTC Oxfordshire, set by the Raspberry Pi Foundation. The project involved engineering, scientific and communication/publicity skills being developed to create the payload and code to interpret experiments set by the science team.

## Skycademy

Over the past few years, we’ve seen schools and their students explore the possibilities that high-altitude ballooning offers, and back in 2015 and 2016 we ran Skycademy. The programme was simple enough: get a bunch of educators together in the same space, show them how to launch a balloon flight, and then send them back to their students to try and repeat what they’ve learned. Since the first Skycademy event, a number of participants have carried out launches, and we are extremely proud of each and every one of them.

## The case of the vanishing PACA HAB

Not every launch has been a 100% success though. There are many things that can and do go wrong during HAB flights, and watching each launch from the comfort of our office can be a nerve-wracking experience. We had such an experience back in July 2017, during the launch performed by Skycademy graduate and Raspberry Pi Certified Educator Dave Hartley and his students from Portslade Aldridge Community Academy (PACA). Dave and his team had been working on their payload for some time, and were awaiting suitable weather conditions. Early one Wednesday in July, everything aligned: they had a narrow window of good weather and so set their launch plan in motion. Soon they had assembled the payload in the school grounds and all was ready for the launch.

#### Dave Hartley on Twitter

Launch day! @pacauk #skycademy #skypaca #raspberrypi

Just before 11:00, they’d completed their final checks and released their payload into the atmosphere. Over the course of 64 minutes, the HAB steadily rose to an altitude of 25647m, where it captured some amazing pictures before the balloon burst and a rapid descent began. Soon after the payload began to descend, the team noticed something worrying: their predicted descent path took the payload dangerously far south — it was threatening to land in the sea. As the payload continued to lose altitude, their calculated results kept shifting, alternately predicting a landing on the ground or out to sea.
Eventually it became clear that the payload would narrowly overshoot the land, and it finally landed about 2 km out to sea.

The path of the balloon

It’s not uncommon for a HAB payload to get lost. There are many ways this can happen, particularly in a narrow country with a prevailing easterly wind like the UK. Payloads can get lost at sea, land somewhere inaccessible, or simply run out of power before they are located and retrieved. So normally, this would be the end of the story for the PACA students — even if the team had had a speedboat to hand, their payload was surely lost for good.

## A message from Denmark

However, this is not the end of our story! A couple of months later, I arrived at work and saw this tweet from a colleague:

#### Raspberry Pi on Twitter

Anyone lost a Raspberry Pi HAB? Someone found this one on a beach in south western Denmark yesterday #UKHAS https://t.co/7lBzFiemgr

Good Samaritan Henning Hansen had found a Raspberry Pi washed up on a remote beach in Denmark! While walking a stretch of coast to collect plastic debris for an environmental monitoring project, he came across something unusual near the shore at 55°04’53.0″N and 8°38’46.9″E. This of course piqued my interest, and we began to investigate the image he had shared on Facebook. Inspecting the photo closely, we noticed a small asset label — the kind of label that, over a year earlier, we’d stuck to each and every bit of Skycademy field kit. We excitedly claimed the kit on behalf of Dave and his students, and contacted Henning to arrange the recovery of the payload. He told us it must have been carried ashore with the tide some time between 21 and 27 September, and probably on 21 September, since that day had the highest tide over the period. This meant the payload must have spent over two months at sea! From the photo we could tell that the Raspberry Pi had suffered significant corrosion, having been exposed to salt water for so long, and so we felt pessimistic about the chances that there would be any recoverable data on it. However, Henning said that he’d been able to read some files from the FAT partition of the SD card, so all hope was not lost. After a few weeks and a number of complications around dispatch and delivery (thank you, Henning, for your infinite patience!), Helen collected the HAB from a local Post Office. SUCCESS! We set about trying to read the data from the SD card, and eventually became disheartened: despite several attempts, we were unable to read its contents. In a last-ditch effort, we gave the SD card to Jonathan, one of our engineers, who initially laughed at the prospect of recovering any data from it. But ten minutes later, he returned with news of success! Since then, we’ve been able to reunite the payload with the PACA launch team, and the students sent us the perfect message to end this story.

The post Raspberry Crusoe: how a Pi got lost at sea appeared first on Raspberry Pi.

# Wine 3.0 released

Post Syndicated from corbet original https://lwn.net/Articles/744741/rss

Version 3.0 of the Wine Windows emulation layer has been released. “This release represents a year of development effort and over 6,000 individual changes.” Most of the improvements seem to be around Direct3D graphics, but it is also now possible to package up Wine as an Android app; see the release notes for details.

# MusE 3.0.0 released

Post Syndicated from ris original https://lwn.net/Articles/743598/rss

Three years after the last stable release, version 3.0 of the MusE MIDI/Audio sequencer is now available.
As you might expect, there are many changes since the last release, including a switch to Qt5, a new Plugin Path editor in Global Settings, a mixer makeover with lots of fixes, a system-wide move to double precision of all audio paths, and much more.

# Modding Legends Team-Xecuter Announce “Future-Proof” Nintendo Switch Hack

Post Syndicated from Andy original https://torrentfreak.com/modding-legends-team-xecuter-announce-future-proof-nintendo-switch-hack-180104/

Since the advent of the first truly mass-market videogame consoles, people have dreamed about removing the protection mechanisms that prevent users from tinkering with their machines. These modifications — which are software, hardware, or a combination of the two — facilitate the running of third-party or “homebrew” code. On this front, a notable mention must go to XBMC (now known as Kodi), which ran on the original Xbox after its copy protection mechanisms had been removed. However, these same modifications regularly open the door to mass-market piracy too, with mod-chips (hardware devices) or soft-mods (software solutions) opening up machines so that consumers can run games obtained from the Internet or elsewhere. For the Nintendo Switch, that prospect edged closer at the end of December when Wololo reported that hackers Plutoo, Derrek, and Naehrwert had given a long presentation (video) at the 34C3 hacking conference in Germany, revealing their kernel hack for the Nintendo Switch. While this in itself is an exciting development, fresh news from a veteran hacking group suggests that Nintendo could be in big trouble on the piracy front in the not-too-distant future. “In the light of a recent presentation at the Chaos Communication Congress in Germany we’ve decided to come out of the woodwork and tease you all a bit with our latest upcoming product,” the legendary Team-Xecuter just announced. While the hack announced in December requires Switch firmware 3.0 (and a copy of Pokken Tournament DX), Team-Xecuter say that their product will be universal, something which tends to suggest a fundamental flaw in the Switch system. “This solution will work on ANY Nintendo Switch console regardless of the currently installed firmware, and will be completely future proof,” the team explain. Xecuter say that their solution opens up the possibility of custom firmware (CFW) on Nintendo’s console. In layman’s terms, this means that those with the technical ability will be able to dictate, at least to a point, how the console functions. “We want to move the community forward and provide a persistent, stable and fast method of running your own code and custom firmware patches on Nintendo’s latest flagship product. And we think we’ve succeeded!” the team add. The console-modding community thrives on rumors, with various parties claiming to have made progress here and there, on this console and that, so it’s natural for people to greet this kind of announcement with a degree of skepticism. That being said, Team-Xecuter is no regular group. With a long history of console-based meddling, Team-Xecuter’s efforts include hardware solutions for the original PlayStation and PlayStation 2, an array of hacks for the original Xbox (Enigmah and various Xecuter-branded solutions), plus close involvement in prominent Xbox 360 mods. Their pedigree is definitely not up for debate. For now, the team isn’t releasing any more details on the nature of the hack, but they have revealed when the public can expect to get their hands on it. “Spring 2018 or there around,” they conclude.
Team-Xecuter demo

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

# Random with care

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/02/random-with-care/

Hi! Here are a few loose thoughts about picking random numbers.

## A word about crypto

DON’T ROLL YOUR OWN CRYPTO

This is all aimed at frivolous pursuits like video games. Hell, even video games where money is at stake should be deferring to someone who knows way more than I do. Otherwise you might find out that your deck shuffles in your poker game are woefully inadequate and some smartass is cheating you out of millions. (If your random number generator has fewer than 226 bits of state, it can’t even generate every possible shuffling of a deck of cards!)

## Use the right distribution

Most languages have a random number primitive that spits out a number uniformly in the range [0, 1), and you can go pretty far with just that. But beware a few traps!

### Random pitches

Say you want to pitch up a sound by a random amount, perhaps up to an octave. Your audio API probably has a way to do this that takes a pitch multiplier, where I say “probably” because that’s how the only audio API I’ve used works. Easy peasy. If 1 is unchanged and 2 is pitched up by an octave, then all you need is rand() + 1. Right? No! Pitch is exponential — within the same octave, the “gap” between C and C♯ is about half as big as the gap between B and the following C. If you pick a pitch multiplier uniformly, you’ll have a noticeable bias towards the higher pitches. One octave corresponds to a doubling of pitch, so if you want to pick a random note, you want 2 ** rand().

### Random directions

For two dimensions, you can just pick a random angle with rand() * TAU. If you want a vector rather than an angle, or if you want a random direction in three dimensions, it’s a little trickier. You might be tempted to just pick a random point where each component is rand() * 2 - 1 (ranging from −1 to 1), but that’s not quite right. A direction is a point on the surface (or, equivalently, within the volume) of a sphere, and picking each component independently produces a point within the volume of a cube; the result will be a bias towards the corners of the cube, where there’s much more extra volume beyond the sphere. No? Well, just trust me. I don’t know how to make a diagram for this. Anyway, you could use the Pythagorean theorem a few times and make a huge mess of things, or it turns out there’s a really easy way that even works for two or four or any number of dimensions. You pick each coordinate from a Gaussian (normal) distribution, then normalize the resulting vector. In other words, using Python’s random module:

```python
import math
import random

def random_direction():
    x = random.gauss(0, 1)
    y = random.gauss(0, 1)
    z = random.gauss(0, 1)
    r = math.sqrt(x*x + y*y + z*z)
    return x/r, y/r, z/r
```

Why does this work? I have no idea! Note that it is possible to get zero (or close to it) for every component, in which case the result is nonsense. You can re-roll all the components if necessary; just check that the magnitude (or its square) is less than some epsilon, which is equivalent to throwing away a tiny sphere at the center and shouldn’t affect the distribution.

### Beware Gauss

Since I brought it up: the Gaussian distribution is a pretty nice one for choosing things in some range, where the middle is the common case and should appear more frequently.
That said, I never use it, because it has one annoying drawback: the Gaussian distribution has no minimum or maximum value, so you can’t really scale it down to the range you want. In theory, you might get any value out of it, with no limit on scale. In practice, it’s astronomically rare to actually get such a value out. I did a hundred million trials just to see what would happen, and the largest value produced was 5.8. But, still, I’d rather not knowingly put extremely rare corner cases in my code if I can at all avoid it. I could clamp the ends, but that would cause unnatural bunching at the endpoints. I could reroll if I got a value outside some desired range, but I prefer to avoid rerolling when I can, too; after all, it’s still (astronomically) possible to have to reroll for an indefinite amount of time. (Okay, it’s really not, since you’ll eventually hit the period of your PRNG. Still, though.) I don’t bend over backwards here — I did just say to reroll when picking a random direction, after all — but when there’s a nicer alternative I’ll gladly use it.

And lo, there is a nicer alternative! Enter the beta distribution. It always spits out a number in [0, 1], so you can easily swap it in for the standard normal function, but it takes two “shape” parameters α and β that alter its behavior fairly dramatically. With α = β = 1, the beta distribution is uniform, i.e. no different from rand(). As α increases, the distribution skews towards the right, and as β increases, the distribution skews towards the left. If α = β, the whole thing is symmetric with a hump in the middle. The higher either one gets, the more extreme the hump (meaning that value is far more common than any other). With a little fiddling, you can get a number of interesting curves. Screenshots don’t really do it justice, so here’s a little Wolfram widget that lets you play with α and β live. Note that if α = 1, then 1 is a possible value; if β = 1, then 0 is a possible value. You probably want them both greater than 1, which clamps the endpoints to zero. Also, it’s possible to have either α or β or both be less than 1, but this creates very different behavior: the corresponding endpoints become poles. Anyway, something like α = β = 3 is probably close enough to normal for most purposes but already clamped for you. And you could easily replicate something like, say, NetHack’s incredibly bizarre rnz function.

### Random frequency

Say you want some event to have an 80% chance to happen every second. You (who am I kidding, I) might be tempted to do something like this:

```python
if random() < 0.8 * dt:
    do_thing()
```

In an ideal world, dt is always the same and is equal to 1 / f, where f is the framerate. Replace that 80% with a variable, say P, and every tic you have a P / f chance to do the… whatever it is. Each second, f tics pass, so you’ll make this check f times. The chance that any check succeeds is the inverse of the chance that every check fails, which is $$1 - \left(1 - \frac{P}{f}\right)^f$$. For P of 80% and a framerate of 60, that’s a total probability of 55.3%. Wait, what? Consider what happens if the framerate is 2. On the first tic, you roll 0.4 twice — but probabilities are combined by multiplying, and splitting work up by dt only works for additive quantities. You lose some accuracy along the way. If you’re dealing with something that multiplies, you need an exponent somewhere. But in this case, maybe you don’t want that at all.
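(Not from the original post, but for the record, here is a rough sketch of what that exponent fix could look like. P, dt, and do_thing are placeholders; the per-tick chance is chosen so that the chance of *not* firing over a full second multiplies out to 1 - P.)

```python
import random

P = 0.8  # desired chance per second

def do_thing():
    print("thing!")

def tick(dt):
    # Chance of firing during a tick of length dt, picked so that
    # (1 - p_tick) ** (1 / dt) == 1 - P, i.e. the misses multiply out
    # to the right per-second value no matter the framerate.
    p_tick = 1 - (1 - P) ** dt
    if random.random() < p_tick:
        do_thing()
```

With this, a million ticks at 60 fps and a million ticks at 2 fps both come out to roughly an 80% chance per second, instead of 55.3%.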
Each separate roll you make might independently succeed, so it’s possible (but very unlikely) that the event will happen 60 times within a single second! Or 200 times, if that’s someone’s framerate. If you explicitly want something to have a chance to happen on a specific interval, you have to check on that interval. If you don’t have a gizmo handy to run code on an interval, it’s easy to do yourself with a time buffer:

```python
timer += dt
# here, 1 is the "every 1 seconds"
while timer > 1:
    timer -= 1
    if random() < 0.8:
        do_thing()
```

Using while means rolls still happen even if you somehow skipped over an entire second. (For the curious, and the nerds who already noticed: the expression $$1 - \left(1 - \frac{P}{f}\right)^f$$ converges to a specific value! As the framerate increases, it becomes a better and better approximation for $$1 - e^{-P}$$, which for the example above is 0.551. Hey, 60 fps is pretty accurate — it’s just accurately representing something nowhere near what I wanted. Er, you wanted.)

### Rolling your own

Of course, you can fuss with the classic [0, 1] uniform value however you want. If I want a bias towards zero, I’ll often just square it, or multiply two of them together. If I want a bias towards one, I’ll take a square root. If I want something like a Gaussian/normal distribution, but with clearly-defined endpoints, I might add together n rolls and divide by n. (The normal distribution is just what you get if you roll infinite dice and divide by infinity!) It’d be nice to be able to understand exactly what this will do to the distribution. Unfortunately, that requires some calculus, which this post is too small to contain, and which I didn’t even know much about myself until I went down a deep rabbit hole while writing, and which in many cases is straight up impossible to express directly. Here’s the non-calculus bit. A source of randomness is often graphed as a PDF — a probability density function. You’ve almost certainly seen a bell curve graphed, and that’s a PDF. They’re pretty nice, since they do exactly what they look like: they show the relative chance that any given value will pop out. On a bog standard bell curve, there’s a peak at zero, and of course zero is the most common result from a normal distribution. (Okay, actually, since the results are continuous, it’s vanishingly unlikely that you’ll get exactly zero — but you’re much more likely to get a value near zero than near any other number.) For the uniform distribution, which is what a classic rand() gives you, the PDF is just a straight horizontal line — every result is equally likely.

If there were a calculus bit, it would go here! Instead, we can cheat. Sometimes. Mathematica knows how to work with probability distributions in the abstract, and there’s a free web version you can use. For the example of squaring a uniform variable, try this out:

```
PDF[TransformedDistribution[u^2, u \[Distributed] UniformDistribution[{0, 1}]], u]
```

(The \[Distributed] is a funny tilde that doesn’t exist in Unicode, but which Mathematica uses as a first-class operator. Also, press Shift+Enter to evaluate the line.) This will tell you that the distribution is… $$\frac{1}{2\sqrt{u}}$$. Weird! You can plot it:

```
Plot[%, {u, 0, 1}]
```

(The % refers to the result of the last thing you did, so if you want to try several of these, you can just do Plot[PDF[…], u] directly.) The resulting graph shows that numbers around zero are, in fact, vastly — infinitely — more likely than anything else. What about multiplying two together?
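(A quick aside that isn’t in the original post: you can also skip Mathematica entirely and brute-force the histogram with nothing but the standard library. A rough sketch:)

```python
import random
from collections import Counter

# Histogram the product of two uniform rolls into 20 buckets of width 0.05.
TRIALS = 1_000_000
buckets = Counter()
for _ in range(TRIALS):
    value = random.random() * random.random()
    buckets[int(value * 20)] += 1

for b in range(20):
    share = buckets[b] / TRIALS
    print(f"{b / 20:.2f} | {'#' * round(share * 100)}")
```

The bars pile up hard near zero, which agrees with the answer coming up next.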
I can’t figure out how to get Mathematica to understand this, but a great amount of digging revealed that the answer is -ln x, and from there you can plot them both on Wolfram Alpha. They’re similar, though squaring has a much better chance of giving you high numbers than multiplying two separate rolls — which makes some sense, since if either of two rolls is a low number, the product will be even lower. What if you know the graph you want, and you want to figure out how to play with a uniform roll to get it? Good news! That’s a whole thing called inverse transform sampling. All you have to do is take an integral. Good luck!

This is all extremely ridiculous. New tactic: Just Simulate The Damn Thing. You already have the code; run it a million times, make a histogram, and tada, there’s your PDF. That’s one of the great things about computers! Brute-force numerical answers are easy to come by, so there’s no excuse for producing something like rnz. (Though, be sure your histogram has sufficiently narrow buckets — I tried plotting one for rnz once and the weird stuff on the left side didn’t show up at all!) By the way, I learned something from futzing with Mathematica here! Taking the square root (to bias towards 1) gives a PDF that’s a straight diagonal line, nothing like the hyperbola you get from squaring (to bias towards 0). How do you get a straight line the other way? Surprise: $$1 - \sqrt{1 - u}$$.

### Okay, okay, here’s the actual math

I don’t claim to have a very firm grasp on this, but I had a hell of a time finding it written out clearly, so I might as well write it down as best I can. This was a great excuse to finally set up MathJax, too. Say $$u(x)$$ is the PDF of the original distribution and $$u$$ is a representative number you plucked from that distribution. For the uniform distribution, $$u(x) = 1$$. Or, more accurately, $$u(x) = \begin{cases} 1 & \text{ if } 0 \le x \lt 1 \\ 0 & \text{ otherwise } \end{cases}$$ Remember that $$x$$ here is a possible outcome you want to know about, and the PDF tells you the relative probability that a roll will be near it. This PDF spits out 1 for every $$x$$, meaning every number between 0 and 1 is equally likely to appear. We want to do something to that PDF, which creates a new distribution, whose PDF we want to know. I’ll use my original example of $$f(u) = u^2$$, which creates a new PDF $$v(x)$$. The trick is that we need to work in terms of the cumulative distribution function for $$u$$. Where the PDF gives the relative chance that a roll will be (“near”) a specific value, the CDF gives the relative chance that a roll will be less than a specific value. The conventions for this seem to be a bit fuzzy, and nobody bothers to explain which ones they’re using, which makes this all the more confusing to read about… but let’s write the CDF with a capital letter, so we have $$U(x)$$. In this case, $$U(x) = x$$, a straight 45° line (at least between 0 and 1). With the definition I gave, this should make sense. At some arbitrary point like 0.4, the value of the PDF is 1 (0.4 is just as likely as anything else), and the value of the CDF is 0.4 (you have a 40% chance of getting a number from 0 to 0.4). Calculus ahoy: the PDF is the derivative of the CDF, which means it measures the slope of the CDF at any point. For $$U(x) = x$$, the slope is always 1, and indeed $$u(x) = 1$$. See, calculus is easy. Okay, so, now we’re getting somewhere. What we want is the CDF of our new distribution, $$V(x)$$.
The CDF is defined as the probability that a roll $$v$$ will be less than $$x$$, so we can literally write: $$V(x) = P(v \le x)$$ (This is why we have to work with CDFs, rather than PDFs — a PDF gives the chance that a roll will be “nearby,” whatever that means. A CDF is much more concrete.) What is $$v$$, exactly? We defined it ourselves; it’s the do something applied to a roll from the original distribution, or $$f(u)$$. $$V(x) = P\!\left(f(u) \le x\right)$$ Now the first tricky part: we have to solve that inequality for $$u$$, which means we have to do something, backwards to $$x$$. $$V(x) = P\!\left(u \le f^{-1}(x)\right)$$ Almost there! We now have a probability that $$u$$ is less than some value, and that’s the definition of a CDF! $$V(x) = U\!\left(f^{-1}(x)\right)$$ Hooray! Now to turn these CDFs back into PDFs, all we need to do is differentiate both sides and use the chain rule. If you never took calculus, don’t worry too much about what that means! $$v(x) = u\!\left(f^{-1}(x)\right)\left|\frac{d}{dx}f^{-1}(x)\right|$$ Wait! Where did that absolute value come from? It takes care of whether $$f(x)$$ increases or decreases. It’s the least interesting part here by far, so, whatever. There’s one more magical part here when using the uniform distribution — $$u(\dots)$$ is always equal to 1, so that entire term disappears! (Note that this only works for a uniform distribution with a width of 1; PDFs are scaled so the entire area under them sums to 1, so if you had a rand() that could spit out a number between 0 and 2, the PDF would be $$u(x) = \frac{1}{2}$$.) $$v(x) = \left|\frac{d}{dx}f^{-1}(x)\right|$$ So for the specific case of modifying the output of rand(), all we have to do is invert, then differentiate. The inverse of $$f(u) = u^2$$ is $$f^{-1}(x) = \sqrt{x}$$ (no need for a ± since we’re only dealing with positive numbers), and differentiating that gives $$v(x) = \frac{1}{2\sqrt{x}}$$. Done! This is also why square root comes out nicer; inverting it gives $$x^2$$, and differentiating that gives $$2x$$, a straight line.

Incidentally, that method for turning a uniform distribution into any distribution — inverse transform sampling — is pretty much the same thing in reverse: integrate, then invert. For example, when I saw that taking the square root gave $$v(x) = 2x$$, I naturally wondered how to get a straight line going the other way, $$v(x) = 2 - 2x$$. Integrating that gives $$2x - x^2$$, and then you can use the quadratic formula (or just ask Wolfram Alpha) to solve $$2x - x^2 = u$$ for $$x$$ and get $$f(u) = 1 - \sqrt{1 - u}$$. Multiplying two rolls is a bit more complicated; you have to write out the CDF as an integral and you end up doing a double integral and wow it’s a mess. The only thing I’ve retained is that you do a division somewhere, which then gets integrated, and that’s why it ends up as $$-\ln x$$. And that’s quite enough of that! (Okay but having math in my blog is pretty cool and I will definitely be doing more of this, sorry, not sorry.)

## Random vs varied

Sometimes, random isn’t actually what you want. We tend to use the word “random” casually to mean something more like chaotic, i.e., with no discernible pattern. But that’s not really random. In fact, given how good humans can be at finding incidental patterns, they aren’t all that unlikely! Consider that when you roll two dice, they’ll come up either the same or only one apart almost half the time. Coincidence? Well, yes.
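(If you don’t believe the dice claim, it’s another excuse to Just Simulate The Damn Thing. This little check is my addition, not the original post’s; the exact answer is 16/36.)

```python
import random

# How often do two d6 rolls come up equal or only one apart?
TRIALS = 1_000_000
close = sum(
    abs(random.randint(1, 6) - random.randint(1, 6)) <= 1
    for _ in range(TRIALS)
)
print(close / TRIALS)  # prints roughly 0.44
```

Almost half, as promised.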
If you ask for randomness, you’re saying that any outcome — or series of outcomes — is acceptable, including five heads in a row or five tails in a row. Most of the time, that’s fine. Some of the time, it’s less fine, and what you really want is variety. Here are a couple examples and some fairly easy workarounds.

### NPC quips

The nature of games is such that NPCs will eventually run out of things to say, at which point further conversation will give the player a short brush-off quip — a slight nod from the designer to the player that, hey, you hit the end of the script. Some NPCs have multiple possible quips and will give one at random. The trouble with this is that it’s very possible for an NPC to repeat the same quip several times in a row before abruptly switching to another one. With only a few options to choose from, getting the same option twice or thrice (especially across an entire game, which may have numerous NPCs) isn’t all that unlikely. The notion of an NPC quip isn’t very realistic to start with, but having someone repeat themselves and then abruptly switch to something else is especially jarring. The easy fix is to show the quips in order! Paradoxically, this is more consistently varied than choosing at random — the original “order” is likely to be meaningless anyway, and it already has the property that the same quip can never appear twice in a row. If you like, you can shuffle the list of quips every time you reach the end, but take care here — it’s possible that the last quip in the old order will be the same as the first quip in the new order, so you may still get a repeat. (Of course, you can just check for this case and swap the first quip somewhere else if it bothers you.) That last behavior is, in fact, the canonical way that Tetris chooses pieces — the game simply shuffles a list of all 7 pieces, gives those to you in shuffled order, then shuffles them again to make a new list once it’s exhausted. There’s no avoidance of duplicates, though, so you can still get two S blocks in a row, or even two S and two Z all clumped together, but no more than that. Some Tetris variants take other approaches, such as actively avoiding repeats even several pieces apart or deliberately giving you the worst piece possible.

### Random drops

Random drops are often implemented as a flat chance each time. Maybe enemies have a 5% chance to drop health when they die. Legally speaking, over the long term, a player will see health drops for about 5% of enemy kills. Over the short term, they may be desperate for health and not survive to see the long term. So you may want to put a thumb on the scale sometimes. Games in the Metroid series, for example, have a somewhat infamous bias towards whatever kind of drop they think you need — health if your health is low, missiles if your missiles are low. I can’t give you an exact approach to use, since it depends on the game and the feeling you’re going for and the variables at your disposal. In extreme cases, you might want to guarantee a health drop from a tough enemy when the player is critically low on health. (Or if you’re feeling particularly evil, you could go the other way and deny the player health when they most need it…) The problem becomes a little different, and worse, when the event that triggers the drop is relatively rare.
The pathological case here would be something like a raid boss in World of Warcraft, which requires hours of effort from a coordinated group of people to defeat, and which has some tiny chance of dropping a good item that will go to only one of those people. This is why I stopped playing World of Warcraft at 60.

Dialing it back a little bit gives us Enter the Gungeon, a roguelike where each room is a set of encounters and each floor only has a dozen or so rooms. Initially, you have a 1% chance of getting a reward after completing a room — but every time you complete a room and don’t get a reward, the chance increases by 9%, up to a cap of 80%. Once you get a reward, the chance resets to 1%. The natural question is: how frequently, exactly, can a player expect to get a reward? We could do math, or we could Just Simulate The Damn Thing.

```python
from collections import Counter
import random

histogram = Counter()

TRIALS = 1000000
chance = 1
rooms_cleared = 0
rewards_found = 0
while rewards_found < TRIALS:
    rooms_cleared += 1
    if random.random() * 100 < chance:
        # Reward!
        rewards_found += 1
        histogram[rooms_cleared] += 1
        rooms_cleared = 0
        chance = 1
    else:
        chance = min(80, chance + 9)

for gaps, count in sorted(histogram.items()):
    print(f"{gaps:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))
```

```
  1 |   0.98%
  2 |   9.91% #########
  3 |  17.00% ################
  4 |  20.23% ####################
  5 |  19.21% ###################
  6 |  15.05% ###############
  7 |   9.69% #########
  8 |   5.07% #####
  9 |   2.09% ##
 10 |   0.63%
 11 |   0.12%
 12 |   0.03%
 13 |   0.00%
 14 |   0.00%
 15 |   0.00%
```

We’ve got kind of a hilly distribution, skewed to the left, which is up in this histogram. Most of the time, a player should see a reward every three to six rooms, which is maybe twice per floor. It’s vanishingly unlikely to go through a dozen rooms without ever seeing a reward, so a player should see at least one per floor. Of course, this simulated a single continuous playthrough; when starting the game from scratch, your chance at a reward always starts fresh at 1%, the worst it can be. If you want to know about how many rewards a player will get on the first floor, hey, Just Simulate The Damn Thing.

```
  0 |   0.01%
  1 |  13.01% #############
  2 |  56.28% ########################################################
  3 |  27.49% ###########################
  4 |   3.10% ###
  5 |   0.11%
  6 |   0.00%
```

Cool. Though, that’s assuming exactly 12 rooms; it might be worth changing that to pick at random in a way that matches the level generator. (Enter the Gungeon does some other things to skew probability, which is very nice in a roguelike where blind luck can make or break you. For example, if you kill a boss without having gotten a new gun anywhere else on the floor, the boss is guaranteed to drop a gun.)

### Critical hits

I suppose this is the same problem as random drops, but backwards. Say you have a battle sim where every attack has a 6% chance to land a devastating critical hit. Presumably the same rules apply to both the player and the AI opponents. Consider, then, that the AI opponents have exactly the same 6% chance to ruin the player’s day. Consider also that this gives them an 0.4% chance to critical hit twice in a row. 0.4% doesn’t sound like much, but across an entire playthrough, it’s not unlikely that a player might see it happen and find it incredibly annoying.
Perhaps it would be worthwhile to explicitly forbid AI opponents from getting consecutive critical hits.

## In conclusion

An emerging theme here has been to Just Simulate The Damn Thing. So consider Just Simulating The Damn Thing. Even a simple change to a random value can do surprising things to the resulting distribution, so unless you feel like differentiating the inverse function of your code, maybe test out any non-trivial behavior and make sure it’s what you wanted. Probability is hard to reason about.

# OWASP Dependency Check Maven Plugin – a Must-Have

Post Syndicated from Bozho original https://techblog.bozho.net/owasp-dependency-check-maven-plugin-must/

I have to admit with a high degree of shame that I didn’t know about the OWASP dependency check maven plugin. And it seems to have been around since 2013. And apparently a thousand projects on GitHub are using it already. In the past I’ve gone manually through dependencies to check them against vulnerability databases, or in many cases I was just blissfully ignorant about any vulnerabilities that my dependencies had. The purpose of this post is just that – to recommend the OWASP dependency check maven plugin as a must-have in practically every maven project. (There are dependency-check tools for other build systems as well.) When you add the plugin it generates a report. Initially you can go and manually upgrade the problematic dependencies (I upgraded two of those in my current project), or suppress the false positives (e.g. the cassandra library is marked as vulnerable, whereas the actual vulnerability is that Cassandra binds an unauthenticated RMI endpoint, which I’ve addressed via my stack setup, so the library isn’t an issue). Then you can configure a threshold for vulnerabilities and fail the build if new ones appear – either because you added a vulnerable dependency, or because a vulnerability was discovered in an existing dependency. All of that is shown in the examples page and is pretty straightforward. I’d suggest adding the plugin immediately; it’s a must-have:

```xml
<plugin>
    <groupId>org.owasp</groupId>
    <artifactId>dependency-check-maven</artifactId>
    <version>3.0.2</version>
    <executions>
        <execution>
            <goals>
                <goal>check</goal>
            </goals>
        </execution>
    </executions>
</plugin>
```

Now, checking dependencies for vulnerabilities is just one small aspect of having your software secure and it shouldn’t give you a false sense of security (a sort-of “I have my dependencies checked, therefore my system is secure” fallacy). But it’s an important aspect. And having that check automated is a huge gain.

The post OWASP Dependency Check Maven Plugin – a Must-Have appeared first on Bozho's tech blog.

# Managing AWS Lambda Function Concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/

One of the key benefits of serverless applications is the ease with which they can scale to meet traffic demands or requests, with little to no need for capacity planning. In AWS Lambda, which is the core of the serverless platform at AWS, the unit of scale is a concurrent execution. This refers to the number of executions of your function code that are happening at any given time. Thinking about concurrent executions as a unit of scale is a fairly unique concept. In this post, I dive deeper into this and talk about how you can make use of per-function concurrency limits in Lambda.
## Understanding concurrency in Lambda

Instead of diving right into the guts of how Lambda works, here’s an appetizing analogy: a magical pizza. Yes, a magical pizza! This magical pizza has some unique properties:

• It has a fixed maximum number of slices, such as 8.
• Slices automatically re-appear after they are consumed.
• When you take a slice from the pizza, it does not re-appear until it has been completely consumed.
• One person can take multiple slices at a time.
• You can easily ask to have the number of slices increased, but they remain fixed at any point in time otherwise.

Now that the magical pizza’s properties are defined, here’s a hypothetical situation of some friends sharing this pizza. Shawn, Kate, Daniela, Chuck, Ian and Avleen get together every Friday to share a pizza and catch up on their week. As there are just six of them, they can easily all enjoy a slice of pizza at a time. As they finish each slice, it re-appears in the pizza pan and they can take another slice again. Given the magical properties of their pizza, they can continue to eat all they want, but with two very important constraints:

• If any of them take too many slices at once, the others may not get as much as they want.
• If they take too many slices, they might also eat too much and get sick.

One particular week, some of the friends are hungrier than the rest, taking two slices at a time instead of just one. If more than two of them try to take two pieces at a time, this can cause contention for pizza slices. Some of them would wait hungry for the slices to re-appear. They could ask for a pizza with more slices, but then run the same risk again later if more hungry friends join than planned for. What can they do? If the friends agree to accept a limit on the maximum number of slices they each eat concurrently, both of these issues are avoided. Some could have a maximum of 2 of the 8 slices, or other limits that are higher or lower, just so long as they keep it at or under eight total slices to be eaten at one time. This would keep anyone from going hungry or eating too much. The six friends can happily enjoy their magical pizza without worry!

## Concurrency in Lambda

Concurrency in Lambda actually works similarly to the magical pizza model. Each AWS Account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed, just like the count of slices in the pizza. As of May 2017, the default limit is 1000 “slices” of concurrency per AWS Region. Also like the magical pizza, each concurrency “slice” can only be consumed individually one at a time. After consumption, it becomes available to be consumed again. Services invoking Lambda functions can consume multiple slices of concurrency at the same time, just like the group of friends can take multiple slices of the pizza. Let’s take our example of the six friends and bring it back to AWS services that commonly invoke Lambda:

• Amazon S3
• Amazon Kinesis
• Amazon DynamoDB
• Amazon Cognito

In a single account with the default concurrency limit of 1000 concurrent executions, any of these four services could invoke enough functions to consume the entire limit or some part of it. Just like with the pizza example, there is the possibility for two issues to pop up:

• One or more of these services could invoke enough functions to consume a majority of the available concurrency capacity. This could cause others to be starved for it, causing failed invocations.
• A service could consume too much concurrent capacity and cause a downstream service or database to be overwhelmed, which could cause failed executions.

For Lambda functions that are launched in a VPC, you have the potential to consume the available IP addresses in a subnet or the maximum number of elastic network interfaces to which your account has access. For more information, see Configuring a Lambda Function to Access Resources in an Amazon VPC. For information about elastic network interface limits, see the Network Interfaces section in the Amazon VPC Limits topic. One way to solve both of these problems is applying a concurrency limit to the Lambda functions in an account.

## Configuring per-function concurrency limits

You can now set a concurrency limit on individual Lambda functions in an account. The concurrency limit that you set reserves a portion of your account-level concurrency for a given function. All of your functions’ concurrent executions count against this account-level limit by default. If you set a concurrency limit for a specific function, then that function’s concurrency limit allocation is deducted from the shared pool and assigned to that specific function. AWS also reserves 100 units of concurrency for all functions that don’t have a specified concurrency limit set. This helps to make sure that future functions have capacity to be consumed. Going back to the example of the consuming services, you could set throttles for the functions as follows:

• Amazon S3 function = 350
• Amazon Kinesis function = 200
• Amazon DynamoDB function = 200
• Amazon Cognito function = 150
• Total = 900

With the 100 reserved for all functions without a configured concurrency limit, this totals the account limit of 1000.

Here’s how this works. To start, create a basic Lambda function that is invoked via Amazon API Gateway. This Lambda function returns a single “Hello World” statement with an added sleep time between 2 and 5 seconds. The sleep time simulates an API providing some sort of capability that can take a varied amount of time. The goal here is to show how an API that is underloaded can reach its concurrency limit, and what happens when it does.

To create the example function:

1. Open the Lambda console.
2. Choose Create Function.
3. For Author from scratch, enter the following values:
   1. For Name, enter a value (such as concurrencyBlog01).
   2. For Runtime, choose Python 3.6.
   3. For Role, choose Create new role from template and enter a name aligned with this function, such as concurrencyBlogRole.
4. Choose Create function.
5. The function is created with some basic example code. Replace that code with the following:

```python
import time
from random import randint

seconds = randint(2, 5)

def lambda_handler(event, context):
    time.sleep(seconds)
    return {
        "statusCode": 200,
        "body": ("Hello world, slept " + str(seconds) + " seconds"),
        "headers": {
            "Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
            "Access-Control-Allow-Methods": "GET,OPTIONS",
        },
    }
```

6. Under Basic settings, set Timeout to 10 seconds. While this function should only ever take up to 5-6 seconds (with the 5-second max sleep), this gives you a little bit of room if it takes longer.
7. Choose Save at the top right.

At this point, your function is configured for this example. Test it and confirm this in the console:

1. Choose Test.
2. Enter a name (it doesn’t matter for this example).
3. Choose Create.
4. In the console, choose Test again.
5. You should see output similar to the following:

Now configure API Gateway so that you have an HTTPS endpoint to test against.

1. In the Lambda console, choose Configuration.
2. Under Triggers, choose API Gateway.
3. Open the API Gateway icon now shown as attached to your Lambda function.
4. Under Configure triggers, leave the default values for API Name and Deployment stage. For Security, choose Open.
5. Choose Add, Save.

API Gateway is now configured to invoke Lambda at the Invoke URL shown under its configuration. You can take this URL and test it in any browser or command line, using tools such as “curl”:

```
$ curl https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Hello world, slept 2 seconds
```


### Throwing load at the function

Now start throwing some load against your API Gateway + Lambda function combo. Right now, your function is only limited by the total amount of concurrency available in an account. For this example account, you might have 850 unreserved concurrency out of a full account limit of 1000 due to having configured a few concurrency limits already (also the 100 concurrency saved for all functions without configured limits). You can find all of this information on the main Dashboard page of the Lambda console:

For generating load in this example, use an open source tool called “hey” (https://github.com/rakyll/hey), which works similarly to ApacheBench (ab). You test from an Amazon EC2 instance running the default Amazon Linux AMI from the EC2 console. For more help with configuring an EC2 instance, follow the steps in the Launch Instance Wizard.

After the EC2 instance is running, SSH into the host and run the following:


```
sudo yum install go
go get -u github.com/rakyll/hey
```


“hey” is easy to use. For these tests, specify a total number of tests (5,000) and a concurrency of 50 against the API Gateway URL as follows (replace the URL here with your own):


```
$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
```

The output from “hey” tells you interesting bits of information:

```
$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 381.9978 secs
Slowest: 9.4765 secs
Fastest: 0.0438 secs
Average: 3.2153 secs
Requests/sec: 13.0891
Total data: 140024 bytes
Size/request: 28 bytes

Response time histogram:
0.044 [1] |
0.987 [2] |
1.930 [0] |
2.874 [1803] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.817 [1518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.760 [719] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.703 [917] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.647 [13] |
7.590 [14] |
8.533 [9] |
9.477 [4] |

Latency distribution:
10% in 2.0224 secs
25% in 2.0267 secs
50% in 3.0251 secs
75% in 4.0269 secs
90% in 5.0279 secs
95% in 5.0414 secs
99% in 5.1871 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0000 secs, 0.0332 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0046 secs
req write: 0.0000 secs, 0.0000 secs, 0.0005 secs
resp wait: 3.2149 secs, 0.0438 secs, 9.4472 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
[200] 4997 responses
[502] 3 responses
```

You can see a helpful histogram and latency distribution. Remember that this Lambda function has a random sleep period in it and so isn’t entirely representative of a real-life workload. Those three 502s warrant digging deeper, but could be due to Lambda cold-start timing and the “seconds” variable being the maximum of 5, causing the Lambda functions to time out. AWS X-Ray and the Amazon CloudWatch logs generated by both API Gateway and Lambda could help you troubleshoot this.

### Configuring a concurrency reservation

Now that you’ve established that you can generate this load against the function, I show you how to limit it and protect a backend resource from being overloaded by all of these requests.

1. In the console, choose Configure.
2. Under Concurrency, for Reserve concurrency, enter 25.

3. Choose Save in the top right corner.

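You can also make the same reservation from the AWS CLI with the put-function-concurrency command; here’s a sketch using the function name from this walkthrough (this exact invocation isn’t from the original post):

```
$ aws lambda put-function-concurrency --function-name concurrencyBlog01 --reserved-concurrent-executions 25
```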
You can confirm your current concurrency configuration at any time via the Lambda get-function command. Here’s an example:


```
$ aws lambda get-function --function-name concurrencyBlog01 --output json --query Concurrency
{
    "ReservedConcurrentExecutions": 25
}
```

Either way, you’ve set the Concurrency Reservation to 25 for this function. This acts as both a limit and a reservation in terms of making sure that you can execute 25 concurrent functions at all times. Going above this results in the throttling of the Lambda function. Depending on the invoking service, throttling can result in a number of different outcomes, as shown in the documentation on Throttling Behavior. This change has also reduced your unreserved account concurrency for other functions by 25.

Rerun the same load generation as before and see what happens. Previously, you tested at 50 concurrency, which worked just fine. By limiting the Lambda function to 25 concurrency, you should see rate limiting kick in. Run the same test again:

```
$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
```


While this test runs, refresh the Monitoring tab on your function detail page. You see a warning message telling you that invocations are being throttled.

This is great! It means that your throttle is working as configured and you are now protecting your downstream resources from too much load from your Lambda function.

Here is the output from a new “hey” command:


```
$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 379.9922 secs
Slowest: 7.1486 secs
Fastest: 0.0102 secs
Average: 1.1897 secs
Requests/sec: 13.1582
Total data: 164608 bytes
Size/request: 32 bytes

Response time histogram:
0.010 [1] |
0.724 [3075] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.438 [0] |
2.152 [811] |∎∎∎∎∎∎∎∎∎∎∎
2.866 [11] |
3.579 [566] |∎∎∎∎∎∎∎
4.293 [214] |∎∎∎
5.007 [1] |
5.721 [315] |∎∎∎∎
6.435 [4] |
7.149 [2] |

Latency distribution:
10% in 0.0130 secs
25% in 0.0147 secs
50% in 0.0205 secs
75% in 2.0344 secs
90% in 4.0229 secs
95% in 5.0248 secs
99% in 5.0629 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0000 secs, 0.0537 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0184 secs
req write: 0.0000 secs, 0.0000 secs, 0.0016 secs
resp wait: 1.1892 secs, 0.0101 secs, 7.1038 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
[502] 3076 responses
[200] 1924 responses
```

This looks fairly different from the last load test run. A large percentage of these requests failed fast due to the concurrency throttle failing them (those with the 0.724 seconds line). The timing shown here in the histogram represents the entire time it took to get a response between the EC2 instance and API Gateway calling Lambda and being rejected. It’s also important to note that this example was configured with an edge-optimized endpoint in API Gateway. You see under Status code distribution that 3076 of the 5000 requests failed with a 502, showing that the backend service from API Gateway and Lambda failed the request.

## Other uses

Managing function concurrency can be useful in a few other ways beyond just limiting the impact on downstream services and providing a reservation of concurrency capacity. Here are two other uses:

• Emergency kill switch
• Cost controls

### Emergency kill switch

On occasion, due to issues with applications I’ve managed in the past, I’ve had a need to disable a certain function or capability of an application. By setting the concurrency reservation and limit of a Lambda function to zero, you can do just that. With the reservation set to zero, every invocation of the Lambda function is throttled. You could then work on the related parts of the infrastructure or application that aren’t working, and then reconfigure the concurrency limit to allow invocations again.

### Cost controls

While I mentioned how you might want to use concurrency limits to control the downstream impact to services or databases that your Lambda function might call, another resource that you might be cautious about is money. Setting the concurrency throttle is another way to help control costs during development and testing of your application. You might want to prevent a function from performing a recursive action too quickly, or keep a development workload from generating too high a concurrency. You might also want to protect development resources connected to this function from generating too much cost, such as APIs that your Lambda function calls.

## Conclusion

Concurrent executions as a unit of scale are a fairly unique characteristic of Lambda functions. Placing limits on how many concurrency “slices” your function can consume can prevent a single function from consuming all of the available concurrency in an account. Limits can also prevent a function from overwhelming a backend resource that isn’t as scalable.
Unlike monolithic applications or even microservices where there are mixed capabilities in a single service, Lambda functions encourage a sort of “nano-service” of small business logic directly related to the integration model connected to the function. I hope you’ve enjoyed this post and configure your concurrency limits today! # Looking Forward to 2018 Post Syndicated from Let's Encrypt - Free SSL/TLS Certificates original https://letsencrypt.org//2017/12/07/looking-forward-to-2018.html Let’s Encrypt had a great year in 2017. We more than doubled the number of active (unexpired) certificates we service to 46 million, we just about tripled the number of unique domains we service to 61 million, and we did it all while maintaining a stellar security and compliance track record. Most importantly though, the Web went from 46% encrypted page loads to 67% according to statistics from Mozilla – a gain of 21% in a single year – incredible. We’re proud to have contributed to that, and we’d like to thank all of the other people and organizations who also worked hard to create a more secure and privacy-respecting Web. While we’re proud of what we accomplished in 2017, we are spending most of the final quarter of the year looking forward rather than back. As we wrap up our own planning process for 2018, I’d like to share some of our plans with you, including both the things we’re excited about and the challenges we’ll face. We’ll cover service growth, new features, infrastructure, and finances. # Service Growth We are planning to double the number of active certificates and unique domains we service in 2018, to 90 million and 120 million, respectively. This anticipated growth is due to continuing high expectations for HTTPS growth in general in 2018. Let’s Encrypt helps to drive HTTPS adoption by offering a free, easy to use, and globally available option for obtaining the certificates required to enable HTTPS. HTTPS adoption on the Web took off at an unprecedented rate from the day Let’s Encrypt launched to the public. One of the reasons Let’s Encrypt is so easy to use is that our community has done great work making client software that works well for a wide variety of platforms. We’d like to thank everyone involved in the development of over 60 client software options for Let’s Encrypt. We’re particularly excited that support for the ACME protocol and Let’s Encrypt is being added to the Apache httpd server. Other organizations and communities are also doing great work to promote HTTPS adoption, and thus stimulate demand for our services. For example, browsers are starting to make their users more aware of the risks associated with unencrypted HTTP (e.g. Firefox, Chrome). Many hosting providers and CDNs are making it easier than ever for all of their customers to use HTTPS. Government agencies are waking up to the need for stronger security to protect constituents. The media community is working to Secure the News. # New Features We’ve got some exciting features planned for 2018. First, we’re planning to introduce an ACME v2 protocol API endpoint and support for wildcard certificates along with it. Wildcard certificates will be free and available globally just like our other certificates. We are planning to have a public test API endpoint up by January 4, and we’ve set a date for the full launch: Tuesday, February 27. Later in 2018 we plan to introduce ECDSA root and intermediate certificates. 
Later in 2018 we plan to introduce ECDSA root and intermediate certificates. ECDSA is generally considered to be the future of digital signature algorithms on the Web because it is more efficient than RSA. Let's Encrypt will currently sign ECDSA keys from subscribers, but we sign with the RSA key from one of our intermediate certificates. Once we have an ECDSA root and intermediates, our subscribers will be able to deploy certificate chains which are entirely ECDSA.

# Infrastructure

Our CA infrastructure is capable of issuing millions of certificates per day with multiple redundancy for stability and a wide variety of security safeguards, both physical and logical. Our infrastructure also generates and signs nearly 20 million OCSP responses daily, and serves those responses nearly 2 billion times per day. We expect issuance and OCSP numbers to double in 2018.

Our physical CA infrastructure currently occupies approximately 70 units of rack space, split between two datacenters, consisting primarily of compute servers, storage, HSMs, switches, and firewalls.

When we issue more certificates, it puts the most stress on storage for our databases. We regularly invest in more and faster storage for our database servers, and that will continue in 2018.

We'll need to add a few additional compute servers in 2018, and we'll also start aging out hardware in 2018 for the first time since we launched. We'll age out about ten 2U compute servers and replace them with new 1U servers, which will save space and be more energy efficient while providing better reliability and performance.

We'll also add another infrastructure operations staff member, bringing that team to a total of six people. This is necessary in order to make sure we can keep up with demand while maintaining a high standard for security and compliance. Infrastructure operations staff are systems administrators responsible for building and maintaining all physical and logical CA infrastructure. The team also manages a 24/7/365 on-call schedule, and they are primary participants in both security and compliance audits.

# Finances

We pride ourselves on being an efficient organization. In 2018 Let's Encrypt will secure a large portion of the Web with a budget of only $3.0M. For an overall increase in our budget of only 13%, we will be able to issue and service twice as many certificates as we did in 2017. We believe this represents an incredible value and that contributing to Let's Encrypt is one of the most effective ways to help create a more secure and privacy-respecting Web.

Our 2018 fundraising efforts are off to a strong start with Platinum sponsorships from Mozilla, Akamai, OVH, Cisco, Google Chrome, and the Electronic Frontier Foundation. The Ford Foundation has renewed their grant to Let's Encrypt as well. We are seeking additional sponsorship and grant assistance to meet our full needs for 2018.

We had originally budgeted $2.91M for 2017, but we'll likely come in under budget for the year at around $2.65M. The difference between our 2017 expenses of $2.65M and the 2018 budget of $3.0M consists primarily of the additional infrastructure operations costs previously mentioned.

# Support Let’s Encrypt

We depend on contributions from our community of users and supporters in order to provide our services. If your company or organization would like to sponsor Let's Encrypt, please email us at [email protected]. We ask that you make an individual contribution if it is within your means.

We’re grateful for the industry and community support that we receive, and we look forward to continuing to create a more secure and privacy-respecting Web!