Tag Archives: Useful

How to Manage Amazon GuardDuty Security Findings Across Multiple Accounts

Post Syndicated from Tom Stickle original https://aws.amazon.com/blogs/security/how-to-manage-amazon-guardduty-security-findings-across-multiple-accounts/

Introduced at AWS re:Invent 2017, Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. In an AWS Blog post, Jeff Barr shows you how to enable GuardDuty to monitor your AWS resources continuously. That blog post shows how to get started with a single GuardDuty account and provides an overview of the features of the service. Your security team, though, will probably want to use GuardDuty to monitor a group of AWS accounts continuously.

In this post, I demonstrate how to use GuardDuty to monitor a group of AWS accounts and have their findings routed to another AWS account—the master account—that is owned by a security team. The method I demonstrate in this post is especially useful if your security team is responsible for monitoring a group of AWS accounts over which it does not have direct access—known as member accounts. In this solution, I simplify the work needed to enable GuardDuty in member accounts and configure findings by simplifying the process, which I do by enabling GuardDuty in the master account and inviting member accounts.

Enable GuardDuty in a master account and invite member accounts

To get started, you must enable GuardDuty in the master account, which will receive GuardDuty findings. The master account should be managed by your security team, and it will display the findings from all member accounts. The master account can be reverted later by removing any member accounts you add to it. Adding member accounts is a two-way handshake mechanism to ensure that administrators from both the master and member accounts formally agree to establish the relationship.

To enable GuardDuty in the master account and add member accounts:

  1. Navigate to the GuardDuty console.
  2. In the navigation pane, choose Accounts.
    Screenshot of the Accounts choice in the navigation pane
  1. To designate this account as the GuardDuty master account, start adding member accounts:
    • You can add individual accounts by choosing Add Account, or you can add a list of accounts by choosing Upload List (.csv).
  1. Now, add the account ID and email address of the member account, and choose Add. (If you are uploading a list of accounts, choose Browse, choose the .csv file with the member accounts [one email address and account ID per line], and choose Add accounts.)
    Screenshot of adding an account

For security reasons, AWS checks to make sure each account ID is valid and that you’ve entered each member account’s email address that was used to create the account. If a member account’s account ID and email address do not match, GuardDuty does not send an invitation.
Screenshot showing the Status of Invite

  1. After you add all the member accounts you want to add, you will see them listed in the Member accounts table with a Status of Invite. You don’t have to individually invite each account—you can choose a group of accounts and when you choose to invite one account in the group, all accounts are invited.
  2. When you choose Invite for each member account:
    1. AWS checks to make sure the account ID is valid and the email address provided is the email address of the member account.
    2. AWS sends an email to the member account email address with a link to the GuardDuty console, where the member account owner can accept the invitation. You can add a customized message from your security team. Account owners who receive the invitation must sign in to their AWS account to accept the invitation. The service also sends an invitation through the AWS Personal Health Dashboard in case the member email address is not monitored. This invitation appears in the member account under the AWS Personal Health Dashboard alert bell on the AWS Management Console.
    3. A pending-invitation indicator is shown on the GuardDuty console of the member account, as shown in the following screenshot.
      Screenshot showing the pending-invitation indicator

When the invitation is sent by email, it is sent to the account owner of the GuardDuty member account.
Screenshot of the invitation sent by email

The account owner can click the link in the email invitation or the AWS Personal Health Dashboard message, or the account owner can sign in to their account and navigate to the GuardDuty console. In all cases, the member account displays the pending invitation in the member account’s GuardDuty console with instructions for accepting the invitation. The GuardDuty console walks the account owner through accepting the invitation, including enabling GuardDuty if it is not already enabled.

If you prefer to work in the AWS CLI, you can enable GuardDuty and accept the invitation. To do this, call CreateDetector to enable GuardDuty, and then call AcceptInvitation, which serves the same purpose as accepting the invitation in the GuardDuty console.

  1. After the member account owner accepts the invitation, the Status in the master account is changed to Monitored. The status helps you track the status of each AWS account that you invite.
    Screenshot showing the Status change to Monitored

You have enabled GuardDuty on the member account, and all findings will be forwarded to the master account. You can now monitor the findings about GuardDuty member accounts from the GuardDuty console in the master account.

The member account owner can see GuardDuty findings by default and can control all aspects of the experience in the member account with AWS Identity and Access Management (IAM) permissions. Users with the appropriate permissions can end the multi-account relationship at any time by toggling the Accept button on the Accounts page. Note that ending the relationship changes the Status of the account to Resigned and also triggers a security finding on the side of the master account so that the security team knows the member account is no longer linked to the master account.

Working with GuardDuty findings

Most security teams have ticketing systems, chat operations, security information event management (SIEM) systems, or other security automation systems to which they would like to push GuardDuty findings. For this purpose, GuardDuty sends all findings as JSON-based messages through Amazon CloudWatch Events, a scalable service to which you can subscribe and to which AWS services can stream system events. To access these events, navigate to the CloudWatch Events console and create a rule that subscribes to the GuardDuty-related findings. You then can assign a target such as Amazon Kinesis Data Firehose that can place the findings in a number of services such as Amazon S3. The following screenshot is of the CloudWatch Events console, where I have a rule that pulls all events from GuardDuty and pushes them to a preconfigured AWS Lambda function.

Screenshot of a CloudWatch Events rule

The following example is a subset of GuardDuty findings that includes relevant context and information about the nature of a threat that was detected. In this example, the instanceId, i-00bb62b69b7004a4c, is performing Secure Shell (SSH) brute-force attacks against IP address 172.16.0.28. From a Lambda function, you can access any of the following fields such as the title of the finding and its description, and send those directly to your ticketing system.

Example GuardDuty findings

You can use other AWS services to build custom analytics and visualizations of your security findings. For example, you can connect Kinesis Data Firehose to CloudWatch Events and write events to an S3 bucket in a standard format, which can be encrypted with AWS Key Management Service and then compressed. You also can use Amazon QuickSight to build ad hoc dashboards by using AWS Glue and Amazon Athena. Similarly, you can place the data from Kinesis Data Firehose in Amazon Elasticsearch Service, with which you can use tools such as Kibana to build your own visualizations and dashboards.

Like most other AWS services, GuardDuty is a regional service. This means that when you enable GuardDuty in an AWS Region, all findings are generated and delivered in that region. If you are regulated by a compliance regime, this is often an important requirement to ensure that security findings remain in a specific jurisdiction. Because customers have let us know they would prefer to be able to enable GuardDuty globally and have all findings aggregated in one place, we intend to give the choice of regional or global isolation as we evolve this new service.

Summary

In this blog post, I have demonstrated how to use GuardDuty to monitor a group of GuardDuty member accounts and aggregate security findings in a central master GuardDuty account. You can use this solution whether or not you have direct control over the member accounts.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about using GuardDuty, start a thread in the GuardDuty forum or contact AWS Support.

-Tom

Managing AWS Lambda Function Concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/

One of the key benefits of serverless applications is the ease in which they can scale to meet traffic demands or requests, with little to no need for capacity planning. In AWS Lambda, which is the core of the serverless platform at AWS, the unit of scale is a concurrent execution. This refers to the number of executions of your function code that are happening at any given time.

Thinking about concurrent executions as a unit of scale is a fairly unique concept. In this post, I dive deeper into this and talk about how you can make use of per function concurrency limits in Lambda.

Understanding concurrency in Lambda

Instead of diving right into the guts of how Lambda works, here’s an appetizing analogy: a magical pizza.
Yes, a magical pizza!

This magical pizza has some unique properties:

  • It has a fixed maximum number of slices, such as 8.
  • Slices automatically re-appear after they are consumed.
  • When you take a slice from the pizza, it does not re-appear until it has been completely consumed.
  • One person can take multiple slices at a time.
  • You can easily ask to have the number of slices increased, but they remain fixed at any point in time otherwise.

Now that the magical pizza’s properties are defined, here’s a hypothetical situation of some friends sharing this pizza.

Shawn, Kate, Daniela, Chuck, Ian and Avleen get together every Friday to share a pizza and catch up on their week. As there is just six of them, they can easily all enjoy a slice of pizza at a time. As they finish each slice, it re-appears in the pizza pan and they can take another slice again. Given the magical properties of their pizza, they can continue to eat all they want, but with two very important constraints:

  • If any of them take too many slices at once, the others may not get as much as they want.
  • If they take too many slices, they might also eat too much and get sick.

One particular week, some of the friends are hungrier than the rest, taking two slices at a time instead of just one. If more than two of them try to take two pieces at a time, this can cause contention for pizza slices. Some of them would wait hungry for the slices to re-appear. They could ask for a pizza with more slices, but then run the same risk again later if more hungry friends join than planned for.

What can they do?

If the friends agreed to accept a limit for the maximum number of slices they each eat concurrently, both of these issues are avoided. Some could have a maximum of 2 of the 8 slices, or other concurrency limits that were more or less. Just so long as they kept it at or under eight total slices to be eaten at one time. This would keep any from going hungry or eating too much. The six friends can happily enjoy their magical pizza without worry!

Concurrency in Lambda

Concurrency in Lambda actually works similarly to the magical pizza model. Each AWS Account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed, just like the count of slices in the pizza. As of May 2017, the default limit is 1000 “slices” of concurrency per AWS Region.

Also like the magical pizza, each concurrency “slice” can only be consumed individually one at a time. After consumption, it becomes available to be consumed again. Services invoking Lambda functions can consume multiple slices of concurrency at the same time, just like the group of friends can take multiple slices of the pizza.

Let’s take our example of the six friends and bring it back to AWS services that commonly invoke Lambda:

  • Amazon S3
  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon Cognito

In a single account with the default concurrency limit of 1000 concurrent executions, any of these four services could invoke enough functions to consume the entire limit or some part of it. Just like with the pizza example, there is the possibility for two issues to pop up:

  • One or more of these services could invoke enough functions to consume a majority of the available concurrency capacity. This could cause others to be starved for it, causing failed invocations.
  • A service could consume too much concurrent capacity and cause a downstream service or database to be overwhelmed, which could cause failed executions.

For Lambda functions that are launched in a VPC, you have the potential to consume the available IP addresses in a subnet or the maximum number of elastic network interfaces to which your account has access. For more information, see Configuring a Lambda Function to Access Resources in an Amazon VPC. For information about elastic network interface limits, see Network Interfaces section in the Amazon VPC Limits topic.

One way to solve both of these problems is applying a concurrency limit to the Lambda functions in an account.

Configuring per function concurrency limits

You can now set a concurrency limit on individual Lambda functions in an account. The concurrency limit that you set reserves a portion of your account level concurrency for a given function. All of your functions’ concurrent executions count against this account-level limit by default.

If you set a concurrency limit for a specific function, then that function’s concurrency limit allocation is deducted from the shared pool and assigned to that specific function. AWS also reserves 100 units of concurrency for all functions that don’t have a specified concurrency limit set. This helps to make sure that future functions have capacity to be consumed.

Going back to the example of the consuming services, you could set throttles for the functions as follows:

Amazon S3 function = 350
Amazon Kinesis function = 200
Amazon DynamoDB function = 200
Amazon Cognito function = 150
Total = 900

With the 100 reserved for all non-concurrency reserved functions, this totals the account limit of 1000.

Here’s how this works. To start, create a basic Lambda function that is invoked via Amazon API Gateway. This Lambda function returns a single “Hello World” statement with an added sleep time between 2 and 5 seconds. The sleep time simulates an API providing some sort of capability that can take a varied amount of time. The goal here is to show how an API that is underloaded can reach its concurrency limit, and what happens when it does.
To create the example function

  1. Open the Lambda console.
  2. Choose Create Function.
  3. For Author from scratch, enter the following values:
    1. For Name, enter a value (such as concurrencyBlog01).
    2. For Runtime, choose Python 3.6.
    3. For Role, choose Create new role from template and enter a name aligned with this function, such as concurrencyBlogRole.
  4. Choose Create function.
  5. The function is created with some basic example code. Replace that code with the following:

import time
from random import randint
seconds = randint(2, 5)

def lambda_handler(event, context):
time.sleep(seconds)
return {"statusCode": 200,
"body": ("Hello world, slept " + str(seconds) + " seconds"),
"headers":
{
"Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
"Access-Control-Allow-Methods": "GET,OPTIONS",
}}

  1. Under Basic settings, set Timeout to 10 seconds. While this function should only ever take up to 5-6 seconds (with the 5-second max sleep), this gives you a little bit of room if it takes longer.

  1. Choose Save at the top right.

At this point, your function is configured for this example. Test it and confirm this in the console:

  1. Choose Test.
  2. Enter a name (it doesn’t matter for this example).
  3. Choose Create.
  4. In the console, choose Test again.
  5. You should see output similar to the following:

Now configure API Gateway so that you have an HTTPS endpoint to test against.

  1. In the Lambda console, choose Configuration.
  2. Under Triggers, choose API Gateway.
  3. Open the API Gateway icon now shown as attached to your Lambda function:

  1. Under Configure triggers, leave the default values for API Name and Deployment stage. For Security, choose Open.
  2. Choose Add, Save.

API Gateway is now configured to invoke Lambda at the Invoke URL shown under its configuration. You can take this URL and test it in any browser or command line, using tools such as “curl”:


$ curl https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Hello world, slept 2 seconds

Throwing load at the function

Now start throwing some load against your API Gateway + Lambda function combo. Right now, your function is only limited by the total amount of concurrency available in an account. For this example account, you might have 850 unreserved concurrency out of a full account limit of 1000 due to having configured a few concurrency limits already (also the 100 concurrency saved for all functions without configured limits). You can find all of this information on the main Dashboard page of the Lambda console:

For generating load in this example, use an open source tool called “hey” (https://github.com/rakyll/hey), which works similarly to ApacheBench (ab). You test from an Amazon EC2 instance running the default Amazon Linux AMI from the EC2 console. For more help with configuring an EC2 instance, follow the steps in the Launch Instance Wizard.

After the EC2 instance is running, SSH into the host and run the following:


sudo yum install go
go get -u github.com/rakyll/hey

“hey” is easy to use. For these tests, specify a total number of tests (5,000) and a concurrency of 50 against the API Gateway URL as follows(replace the URL here with your own):


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

The output from “hey” tells you interesting bits of information:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 381.9978 secs
Slowest: 9.4765 secs
Fastest: 0.0438 secs
Average: 3.2153 secs
Requests/sec: 13.0891
Total data: 140024 bytes
Size/request: 28 bytes

Response time histogram:
0.044 [1] |
0.987 [2] |
1.930 [0] |
2.874 [1803] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.817 [1518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.760 [719] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.703 [917] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.647 [13] |
7.590 [14] |
8.533 [9] |
9.477 [4] |

Latency distribution:
10% in 2.0224 secs
25% in 2.0267 secs
50% in 3.0251 secs
75% in 4.0269 secs
90% in 5.0279 secs
95% in 5.0414 secs
99% in 5.1871 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0000 secs, 0.0332 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0046 secs
req write: 0.0000 secs, 0.0000 secs, 0.0005 secs
resp wait: 3.2149 secs, 0.0438 secs, 9.4472 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
[200] 4997 responses
[502] 3 responses

You can see a helpful histogram and latency distribution. Remember that this Lambda function has a random sleep period in it and so isn’t entirely representational of a real-life workload. Those three 502s warrant digging deeper, but could be due to Lambda cold-start timing and the “second” variable being the maximum of 5, causing the Lambda functions to time out. AWS X-Ray and the Amazon CloudWatch logs generated by both API Gateway and Lambda could help you troubleshoot this.

Configuring a concurrency reservation

Now that you’ve established that you can generate this load against the function, I show you how to limit it and protect a backend resource from being overloaded by all of these requests.

  1. In the console, choose Configure.
  2. Under Concurrency, for Reserve concurrency, enter 25.

  1. Click on Save in the top right corner.

You could also set this with the AWS CLI using the Lambda put-function-concurrency command or see your current concurrency configuration via Lambda get-function. Here’s an example command:


$ aws lambda get-function --function-name concurrencyBlog01 --output json --query Concurrency
{
"ReservedConcurrentExecutions": 25
}

Either way, you’ve set the Concurrency Reservation to 25 for this function. This acts as both a limit and a reservation in terms of making sure that you can execute 25 concurrent functions at all times. Going above this results in the throttling of the Lambda function. Depending on the invoking service, throttling can result in a number of different outcomes, as shown in the documentation on Throttling Behavior. This change has also reduced your unreserved account concurrency for other functions by 25.

Rerun the same load generation as before and see what happens. Previously, you tested at 50 concurrency, which worked just fine. By limiting the Lambda functions to 25 concurrency, you should see rate limiting kick in. Run the same test again:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

While this test runs, refresh the Monitoring tab on your function detail page. You see the following warning message:

This is great! It means that your throttle is working as configured and you are now protecting your downstream resources from too much load from your Lambda function.

Here is the output from a new “hey” command:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Summary:
Total: 379.9922 secs
Slowest: 7.1486 secs
Fastest: 0.0102 secs
Average: 1.1897 secs
Requests/sec: 13.1582
Total data: 164608 bytes
Size/request: 32 bytes

Response time histogram:
0.010 [1] |
0.724 [3075] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.438 [0] |
2.152 [811] |∎∎∎∎∎∎∎∎∎∎∎
2.866 [11] |
3.579 [566] |∎∎∎∎∎∎∎
4.293 [214] |∎∎∎
5.007 [1] |
5.721 [315] |∎∎∎∎
6.435 [4] |
7.149 [2] |

Latency distribution:
10% in 0.0130 secs
25% in 0.0147 secs
50% in 0.0205 secs
75% in 2.0344 secs
90% in 4.0229 secs
95% in 5.0248 secs
99% in 5.0629 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0000 secs, 0.0537 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0184 secs
req write: 0.0000 secs, 0.0000 secs, 0.0016 secs
resp wait: 1.1892 secs, 0.0101 secs, 7.1038 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
[502] 3076 responses
[200] 1924 responses

This looks fairly different from the last load test run. A large percentage of these requests failed fast due to the concurrency throttle failing them (those with the 0.724 seconds line). The timing shown here in the histogram represents the entire time it took to get a response between the EC2 instance and API Gateway calling Lambda and being rejected. It’s also important to note that this example was configured with an edge-optimized endpoint in API Gateway. You see under Status code distribution that 3076 of the 5000 requests failed with a 502, showing that the backend service from API Gateway and Lambda failed the request.

Other uses

Managing function concurrency can be useful in a few other ways beyond just limiting the impact on downstream services and providing a reservation of concurrency capacity. Here are two other uses:

  • Emergency kill switch
  • Cost controls

Emergency kill switch

On occasion, due to issues with applications I’ve managed in the past, I’ve had a need to disable a certain function or capability of an application. By setting the concurrency reservation and limit of a Lambda function to zero, you can do just that.

With the reservation set to zero every invocation of a Lambda function results in being throttled. You could then work on the related parts of the infrastructure or application that aren’t working, and then reconfigure the concurrency limit to allow invocations again.

Cost controls

While I mentioned how you might want to use concurrency limits to control the downstream impact to services or databases that your Lambda function might call, another resource that you might be cautious about is money. Setting the concurrency throttle is another way to help control costs during development and testing of your application.

You might want to prevent against a function performing a recursive action too quickly or a development workload generating too high of a concurrency. You might also want to protect development resources connected to this function from generating too much cost, such as APIs that your Lambda function calls.

Conclusion

Concurrent executions as a unit of scale are a fairly unique characteristic about Lambda functions. Placing limits on how many concurrency “slices” that your function can consume can prevent a single function from consuming all of the available concurrency in an account. Limits can also prevent a function from overwhelming a backend resource that isn’t as scalable.

Unlike monolithic applications or even microservices where there are mixed capabilities in a single service, Lambda functions encourage a sort of “nano-service” of small business logic directly related to the integration model connected to the function. I hope you’ve enjoyed this post and configure your concurrency limits today!

MQTT 5: Introduction to MQTT 5

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/mqtt-5-introduction-to-mqtt-5/

MQTT 5 Introduction

Introduction to MQTT 5

Welcome to our brand new blog post series MQTT 5 – Features and Hidden Gems. Without doubt, the MQTT protocol is the most popular and best received Internet of Things protocol as of today (see the Google Trends Chart below), supporting large scale use cases ranging from Connected Cars, Manufacturing Systems, Logistics, Military Use Cases to Enterprise Chat Applications, Mobile Apps and connecting constrained IoT devices. Of course, with huge amounts of production deployments, the wish list for future versions of the MQTT protocol grew bigger and bigger.

MQTT 5 is by far the most extensive and most feature-rich update to the MQTT protocol specification ever. We are going to explore all hidden gems and protocol features with use case discussion and useful background information – one blog post at a time.

Be sure to read the MQTT Essentials Blog Post series first before diving into our new MQTT 5 series. To get the most out of the new blog posts, it’s important to have a basic understanding of the MQTT 3.1.1 protocol as we are going to highlight key changes as well as all improvements.

The Raspberry Pi Christmas shopping list 2017

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/christmas-shopping-list-2017/

Looking for the perfect Christmas gift for a beloved maker in your life? Maybe you’d like to give a relative or friend a taste of the world of coding and Raspberry Pi? Whatever you’re looking for, the Raspberry Pi Christmas shopping list will point you in the right direction.

An ice-skating Raspberry Pi - The Raspberry Pi Christmas Shopping List 2017

For those getting started

Thinking about introducing someone special to the wonders of Raspberry Pi during the holidays? Although you can set up your Pi with peripherals from around your home, such as a mobile phone charger, your PC’s keyboard, and the old mouse dwelling in an office drawer, a starter kit is a nice all-in-one package for the budding coder.



Check out the starter kits from Raspberry Pi Approved Resellers such as Pimoroni, The Pi Hut, ModMyPi, Adafruit, CanaKit…the list is pretty long. Our products page will direct you to your closest reseller, or you can head to element14 to pick up the official Raspberry Pi Starter Kit.



You can also buy the Raspberry Pi Press’s brand-new Raspberry Pi Beginners Book, which includes a Raspberry Pi Zero W, a case, a ready-made SD card, and adapter cables.

Once you’ve presented a lucky person with their first Raspberry Pi, it’s time for them to spread their maker wings and learn some new skills.

MagPi Essentials books - The Raspberry Pi Christmas Shopping List 2017

To help them along, you could pick your favourite from among the Official Projects Book volume 3, The MagPi Essentials guides, and the brand-new third edition of Carrie Anne Philbin’s Adventures in Raspberry Pi. (She is super excited about this new edition!)

And you can always add a link to our free resources on the gift tag.

For the maker in your life

If you’re looking for something for a confident digital maker, you can’t go wrong with adding to their arsenal of electric and electronic bits and bobs that are no doubt cluttering drawers and boxes throughout their house.



Components such as servomotors, displays, and sensors are staples of the maker world. And when it comes to jumper wires, buttons, and LEDs, one can never have enough.



You could also consider getting your person a soldering iron, some helpings hands, or small tools such as a Dremel or screwdriver set.

And to make their life a little less messy, pop it all inside a Really Useful Box…because they’re really useful.



For kit makers

While some people like to dive into making head-first and to build whatever comes to mind, others enjoy working with kits.



The Naturebytes kit allows you to record the animal visitors of your garden with the help of a camera and a motion sensor. Footage of your local badgers, birds, deer, and more will be saved to an SD card, or tweeted or emailed to you if it’s in range of WiFi.

Cortec Tiny 4WD - The Raspberry Pi Christmas Shopping List 2017

Coretec’s Tiny 4WD is a kit for assembling a Pi Zero–powered remote-controlled robot at home. Not only is the robot adorable, building it also a great introduction to motors and wireless control.



Bare Conductive’s Touch Board Pro Kit offers everything you need to create interactive electronics projects using conductive paint.

Pi Hut Arcade Kit - The Raspberry Pi Christmas Shopping List 2017

Finally, why not help your favourite maker create their own gaming arcade using the Arcade Building Kit from The Pi Hut?

For the reader

For those who like to curl up with a good read, or spend too much of their day on public transport, a book or magazine subscription is the perfect treat.

For makers, hackers, and those interested in new technologies, our brand-new HackSpace magazine and the ever popular community magazine The MagPi are ideal. Both are available via a physical or digital subscription, and new subscribers to The MagPi also receive a free Raspberry Pi Zero W plus case.

Cover of CoderDojo Nano Make your own game

Marc Scott Beginner's Guide to Coding Book

You can also check out other publications from the Raspberry Pi family, including CoderDojo’s new CoderDojo Nano: Make Your Own Game, Eben Upton and Gareth Halfacree’s Raspberry Pi User Guide, and Marc Scott’s A Beginner’s Guide to Coding. And have I mentioned Carrie Anne’s Adventures in Raspberry Pi yet?

Stocking fillers for everyone

Looking for something small to keep your loved ones occupied on Christmas morning? Or do you have to buy a Secret Santa gift for the office tech? Here are some wonderful stocking fillers to fill your boots with this season.

Pi Hut 3D Christmas Tree - The Raspberry Pi Christmas Shopping List 2017

The Pi Hut 3D Xmas Tree: available as both a pre-soldered and a DIY version, this gadget will work with any 40-pin Raspberry Pi and allows you to create your own mini light show.



Google AIY Voice kit: build your own home assistant using a Raspberry Pi, the MagPi Essentials guide, and this brand-new kit. “Google, play Mariah Carey again…”



Pimoroni’s Raspberry Pi Zero W Project Kits offer everything you need, including the Pi, to make your own time-lapse cameras, music players, and more.



The official Raspberry Pi Sense HAT, Camera Module, and cases for the Pi 3 and Pi Zero will complete the collection of any Raspberry Pi owner, while also opening up exciting project opportunities.

STEAM gifts that everyone will love

Awesome Astronauts | Building LEGO’s Women of NASA!

LEGO Idea’s bought out this amazing ‘Women of NASA’ set, and I thought it would be fun to build, play and learn from these inspiring women! First up, let’s discover a little more about Sally Ride and Mae Jemison, two AWESOME ASTRONAUTS!

Treat the kids, and big kids, in your life to the newest LEGO Ideas set, the Women of NASA — starring Nancy Grace Roman, Margaret Hamilton, Sally Ride, and Mae Jemison!



Explore the world of wearables with Pimoroni’s sewable, hackable, wearable, adorable Bearables kits.



Add lights and motors to paper creations with the Activating Origami Kit, available from The Pi Hut.




We all loved Hidden Figures, and the STEAM enthusiast you know will do too. The film’s available on DVD, and you can also buy the original book, along with other fascinating non-fiction such as Rebecca Skloot’s The Immortal Life of Henrietta Lacks, Rachel Ignotofsky’s Women in Science, and Sydney Padua’s (mostly true) The Thrilling Adventures of Lovelace and Babbage.

Have we missed anything?

With so many amazing kits, HATs, and books available from members of the Raspberry Pi community, it’s hard to only pick a few. Have you found something splendid for the maker in your life? Maybe you’ve created your own kit that uses the Raspberry Pi? Share your favourites with us in the comments below or via our social media accounts.

The post The Raspberry Pi Christmas shopping list 2017 appeared first on Raspberry Pi.

Newly Updated Whitepaper: FERPA Compliance on AWS

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/newly-updated-whitepaper-ferpa-compliance-on-aws/

One of the main tenets of the Family Educational Rights and Privacy Act (FERPA) is the protection of student education records, including personally identifiable information (PII) and directory information. We recently updated our FERPA Compliance on AWS whitepaper to include AWS service-specific guidance for 24 AWS services. The whitepaper describes how these services can be used to help secure protected data. In conjunction with more detailed service-specific documentation, this updated information helps make it easier for you to plan, deploy, and operate secure environments to meet your compliance requirements in the AWS Cloud.

The updated whitepaper is especially useful for educational institutions and their vendors who need to understand:

  • AWS’s Shared Responsibility Model.
  • How AWS services can be used to help deploy educational and PII workloads securely in the AWS Cloud.
  • Key security disciplines in a security program to help you run a FERPA-compliant program (such as auditing, data destruction, and backup and disaster recovery).

In a related effort to help you secure PII, we also added to the whitepaper a mapping of NIST SP 800-122, which provides guidance for protecting PII, as well as a link to our NIST SP 800-53 Quick Start, a CloudFormation template that automatically configures AWS resources and deploys a multi-tier, Linux-based web application. To learn how this Quick Start works, see the Automate NIST Compliance in AWS GovCloud (US) with AWS Quick Start Tools video. The template helps you streamline and automate secure baselines in AWS—from initial design to operational security readiness—by incorporating the expertise of AWS security and compliance subject matter experts.

For more information about AWS Compliance and FERPA or to request support for your organization, contact your AWS account manager.

– Chris Gile, Senior Manager, AWS Security Assurance

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

Cr3dOv3r – Credential Reuse Attack Tool

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/12/cr3dov3r-credential-reuse-attack-tool/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

Cr3dOv3r – Credential Reuse Attack Tool

Cr3dOv3r is a fairly simple Python-based set of functions that carry out the prelimary work as a credential reuse attack tool.

You just give the tool your target email address then it does two fairly straightforward (but useful) jobs:

  • Search for public leaks for the email and if it any, it returns with all available details about the leak (Using hacked-emails site API).
  • Then you give it this email’s old or leaked password then it checks this credentials against 16 websites (ex: facebook, twitter, google…) and notifies of any successful logins.

Read the rest of Cr3dOv3r – Credential Reuse Attack Tool now! Only available at Darknet.

Мария Габриел в София: #AVMSD

Post Syndicated from nellyo original https://nellyo.wordpress.com/2017/12/02/avmsd-16/

Мария Габриел, член на Европейската комисия с ресор Цифрова икономика и цифрово общество, участва в кръгла маса на тема: “Предизвикателства пред медийния сектор в Европа“,  организирана от Представителството на Европейската комисия в България и Съветът за електронни медии. Бяха представени две теми:

  • актуалните въпроси по Директивата за аудиовизуалните медийни услуги и 
  • обявената обществена консултация по въпросите на фалшивите новини и дезинформацията.

I

Първата тема – ход на ревизията на Директивата за аудиовизуални медийни услуги.

Първоначалният проект и развитията по 2016/0151(COD) могат да се следят тук.

Според Стратегическата 18-месечна програма на Съвета (председателство Естония, България, Австрия) трите председателства ще приключат работата по ключови инициативи, свързани с цифровия единен пазар, включително

улесняване на свързаността и постигане на напредък в развитието на конкурентоспособен и справедлив цифров единен пазар чрез насърчаване на трансграничната електронна търговия (онлайн продажба на стоки, предоставяне на цифрово съдържание, реформа на авторското право, аудиовизуални медийни услуги, доставяне на колетни пратки) и чрез преход към интелигентна икономика (свободно движение на данни, преглед на регулаторната рамка в областта на далекосъобщенията, инициативи в областта на дружественото право) и укрепване на доверието и сигурността в сферата на цифровите услуги (нов пакет за защита на данните)

Работата по  медийната директива (впрочем и по директивата за авторското право) продължават по време на Българското председателство – в този дух е съобщението на посланика на Естония в ЕС:

https://platform.twitter.com/widgets.js

 

От изложението на г-жа Габриел стана ясно, че е тази седмица е проведен  пети триалог, в който тя участва лично. Работата продължава (“финализиране се очаква”) по време на Българското председателство. Като открити бяха посочени три тематични области:

(1) разширяване на обхвата на директивата, включване на стрийминга в услугите,  нови адресати – социалните медии и платформите за видеосподеляне;

(2) квотата на произведенията, създадени от европейски продуценти – и празмерът на квотата (предложения: 20 на сто –  ЕК, 30 на сто –  ЕП), и дефинициите са дискусионни;

(3) търговските съобщения – и правилата (почасова продължителност и продължителност за денонощие), и разполагането (предложение: двоен лимит за светло и тъмно време), и прекъсването (предложение: 20-минутно правило) са още предмет на обсъждане.

По повод откритите въпроси:

  • Директивата разширява обхвата си с всяка ревизия. Все пак, когато се обсъжда отговорност на платформите, да бъде в контекста на Директивата за електронната търговия – защото има риск да се стигне  до мълчаливо преуреждане или отмяна на нейните принципи.  В допълнение практиката на ЕСПЧ (решението Делфи за отговорността за съобщения във форумите) илюстрира проблемите, възникващи при нееднозначно разбиране на отговорността в правото на ЕС и в международното право.  По подобни съображения е важна обсъжданата Директива за авторското право, в частност чл.13 – който (споделям позицията на EDRI) трябва да бъде заличен, а нови права не следва да бъдат създавани (чл.11 проекта).
  • Квотата на европейските произведения съществува реално на по-високи нива от предвиденото в директивата. В тази област проблем  е отношението европейска – национална  квота, защото местните индустрии  настояват именно за национална квота – а това би било мярка със съвсем различни цели от   европейската квота (единен пазар).
  • Уредбата на електронните търговски съобщения се либерализира (в България  – и дерегулира) непрекъснато. Все някъде е добре този процес да спре, за да не се превърне телевизията в поредица търговски съобщения, прекъсвани с основно съдържание. Слушаме уверенията на доставчиците, че потребителите  са най-голямата им ценност и телевизиите не биха програмирали против интересите на зрителите, но все пак нека останат и правни гаранции срещу океаните от търговски съобщения.

Ключови за регулацията остават балансите:

  • индустрии / аудитории – както обикновено, натискът на индустриите е мощен и организиран, докато аудиториите са слабо организирани и слабо представени в диалога – дано ЕП  защити интересите на гражданите;
  • гъвкавост / недопускане на фрагментиране – съвсем точно наблюдение: неслучайно при предната ревизия ЕК обърна внимание, че гъвкавост и свобода е добре, но не и свобода при  дефинициите – защото различните дефиниции в националните мерки компрометират ефективното прилагане;
  • регулиране / саморегулиране – никой не е против саморегулирането, но самата ЕК преди време беше заявила, че има области като интелектуалната собственост и конкурентното право, които не могат по естеството си да бъдат оставени само на саморегулиране.

II

По втората тема – фалшивите новини –   г-жа Габриел поясни, че ЕК работи по следната логика:  дефиниция – обзор на добри практики – мерки, като на този етап не се мисли за законодателство. Още за тази част от срещата –  в медиите.

През 2018 г. се очаква:

  • създаване на работна група на високо равнище и доклад в тримесечен срок;
  • Евробарометър по въпроси, свързани с фалшивите новини;
  • Съобщение на ЕК към средата на 2018 г.

Съвсем наскоро (9 ноември 2017 ) Асоциацията на европейските журналисти проведе международна конференция с последващо обучение по въпросите на дезинформацията и фалшивите новини. Имах възможност да участвам (вж последната част от  записа на конференцията):

  • Смятам, че трябва да се започне с  дефиниране, но на следващо място да се продължи с категоризиране на фалшивите новини.
  • Реакцията към различните категории фалшиви новини  трябва да е различна. Правото вече предвижда санкции за някои категории лъжа – напр. клеветата или подвеждащата реклама. Не съм   сигурна, че правото няма да се намеси с нови мерки в най-сериозните случаи, когато става дума за дезинформационни кампании, които могат да заплашат националната сигурност.
  • Няма единно мнение по въпроса кой идентифицира фалшивите новини или – с други думи: кой владее истината.  Италия предлага това да са нов тип конкурентни регулатори (аналогия с регулирането на подвеждащата реклама), Чехия – звено в МВР (при риск за националната сигурност). ЕК казва, че няма нужда от Министерство на истината – но по-добре ли е това да е частна компания? Виждали сме как Facebook различава морално/неморално – предстои ли да се заеме и с преценката вярно/невярно?  Не всички са съгласни. Но точно това се случва, неслучайно се обсъжда – научаваме – контранотификация срещу свръхпремахване на съдържание.

 

Filed under: EU Law, Media Law Tagged: давму, fake

timeShift(GrafanaBuzz, 1w) Issue 24

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/12/01/timeshiftgrafanabuzz-1w-issue-24/

Welcome to TimeShift

It’s hard to believe it’s already December. Here at Grafana Labs we’ve been spending a lot of time working on new features and enhancements for Grafana v5, and finalizing our selections for GrafanaCon EU. This week we have some interesting articles to share and a number of plugin updates. Enjoy!


Latest Release

Grafana 4.6.2 is now available and includes some bug fixes:

  • Prometheus: Fixes bug with new Prometheus alerts in Grafana. Make sure to download this version if you’re using Prometheus for alerting. More details in the issue. #9777
  • Color picker: Bug after using textbox input field to change/paste color string #9769
  • Cloudwatch: build using golang 1.9.2 #9667, thanks @mtanda
  • Heatmap: Fixed tooltip for “time series buckets” mode #9332
  • InfluxDB: Fixed query editor issue when using > or < operators in WHERE clause #9871

Download Grafana 4.6.2 Now


From the Blogosphere

Monitoring Camel with Prometheus in Red Hat OpenShift: This in-depth walk-through will show you how to build an Apache Camel application from scratch, deploy it in a Kubernetes environment, gather metrics using Prometheus and display them in Grafana.

How to run Grafana with DeviceHive: We see more and more examples of people using Grafana in IoT. This article discusses how to gather data from the IoT platform, DeviceHive, and build useful dashboards.

How to Install Grafana on Linux Servers: Pretty self-explanatory, but this tutorial walks you installing Grafana on Ubuntu 16.04 and CentOS 7. After installation, it covers configuration and plugin installation. This is the first article in an upcoming series about Grafana.

Monitoring your AKS cluster with Grafana: It’s important to know how your application is performing regardless of where it lives; the same applies to Kubernetes. This article focuses on aggregating data from Kubernetes with Heapster and feeding it to a backend for Grafana to visualize.

CoinStatistics: With the price of Bitcoin skyrocketing, more and more people are interested in cryptocurrencies. This is a cool dashboard that has a lot of stats about popular cryptocurrencies, and has a calculator to let you know when you can buy that lambo.

Using OpenNTI As A Collector For Streaming Telemetry From Juniper Devices: Part 1: This series will serve as a quick start guide for getting up and running with streaming real-time telemetry data from Juniper devices. This first article covers some high-level concepts and installation, while part 2 covers configuration options.

How to Get Metrics for Advance Alerting to Prevent Trouble: What good is performance monitoring if you’re never told when something has gone wrong? This article suggests ways to be more proactive to prevent issues and avoid the scramble to troubleshoot issues.

Thoughtworks: Technology Radar: We got a shout-out in the latest Technology Radar in the Tools section, as the dashboard visualization tool of choice for Prometheus!


GrafanaCon Tickets are Going Fast

Tickets are going fast for GrafanaCon EU, but we still have a seat reserved for you. Join us March 1-2, 2018 in Amsterdam for 2 days of talks centered around Grafana and the surrounding monitoring ecosystem including Graphite, Prometheus, InfluxData, Elasticsearch, Kubernetes, and more.

Get Your Ticket Now


Grafana Plugins

We have a number of plugin updates to highlight this week. Authors improve plugins regularly to fix bugs and improve performance, so it’s important to keep your plugins up to date. We’ve made updating easy; for on-prem Grafana, use the Grafana-cli tool, or update with 1 click if you’re using Hosted Grafana.

UPDATED PLUGIN

Clickhouse Data Source – The Clickhouse Data Source received a substantial update this week. It now has support for Ace Editor, which has a reformatting function for the query editor that automatically formats your sql. If you’re using Clickhouse then you should also have a look at CHProxy – see the plugin readme for more details.


Update

UPDATED PLUGIN

Influx Admin Panel – This panel received a number of small fixes. A new version will be coming soon with some new features.

Some of the changes (see the release notes) for more details):

  • Fix issue always showing query results
  • When there is only one row, swap rows/cols (ie: SHOW DIAGNOSTICS)
  • Improve auto-refresh behavior
  • Show ‘message’ response. (ie: please use POST)
  • Fix query time sorting
  • Show ‘status’ field (killed, etc)

Update

UPDATED PLUGIN

Gnocchi Data Source – The latest version of the Gnocchi Data Source adds support for dynamic aggregations.


Update

UPDATED PLUGINS

BT Plugins – All of the BT panel plugins received updates this week.


Upcoming Events:

In between code pushes we like to speak at, sponsor and attend all kinds of conferences and meetups. We have some awesome talks and events coming soon. Hope to see you at one of these!

KubeCon | Austin, TX – Dec. 6-8, 2017: We’re sponsoring KubeCon 2017! This is the must-attend conference for cloud native computing professionals. KubeCon + CloudNativeCon brings together leading contributors in:

  • Cloud native applications and computing
  • Containers
  • Microservices
  • Central orchestration processing
  • And more

Buy Tickets

FOSDEM | Brussels, Belgium – Feb 3-4, 2018: FOSDEM is a free developer conference where thousands of developers of free and open source software gather to share ideas and technology. Carl Bergquist is managing the Cloud and Monitoring Devroom, and we’ve heard there were some great talks submitted. There is no need to register; all are welcome.


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

YIKES! Glad it’s not – there’s good attention and bad attention.


Grafana Labs is Hiring!

We are passionate about open source software and thrive on tackling complex challenges to build the future. We ship code from every corner of the globe and love working with the community. If this sounds exciting, you’re in luck – WE’RE HIRING!

Check out our Open Positions


How are we doing?

Let us know if you’re finding these weekly roundups valuable. Submit a comment on this article below, or post something at our community forum. Find an article I haven’t included? Send it my way. Help us make timeShift better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

AWS Cloud9 – Cloud Developer Environments

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-cloud9-cloud-developer-environments/

One of the first things you learn when you start programming is that, just like any craftsperson, your tools matter. Notepad.exe isn’t going to cut it. A powerful editor and testing pipeline supercharge your productivity. I still remember learning to use Vim for the first time and being able to zip around systems and complex programs. Do you remember how hard it was to setup all your compilers and dependencies on a new machine? How many cycles have you wasted matching versions, tinkering with configs, and then writing documentation to onboard a new developer to a project?

Today we’re launching AWS Cloud9, an Integrated Development Environment (IDE) for writing, running, and debugging code, all from your web browser. Cloud9 comes prepackaged with essential tools for many popular programming languages (Javascript, Python, PHP, etc.) so you don’t have to tinker with installing various compilers and toolchains. Cloud9 also provides a seamless experience for working with serverless applications allowing you to quickly switch between local and remote testing or debugging. Based on the popular open source Ace Editor and c9.io IDE (which we acquired last year), AWS Cloud9 is designed to make collaborative cloud development easy with extremely powerful pair programming features. There are more features than I could ever cover in this post but to give a quick breakdown I’ll break the IDE into 3 components: The editor, the AWS integrations, and the collaboration.

Editing


The Ace Editor at the core of Cloud9 is what lets you write code quickly, easily, and beautifully. It follows a UNIX philosophy of doing one thing and doing it well: writing code.

It has all the typical IDE features you would expect: live syntax checking, auto-indent, auto-completion, code folding, split panes, version control integration, multiple cursors and selections, and it also has a few unique features I want to highlight. First of all, it’s fast, even for large (100000+ line) files. There’s no lag or other issues while typing. It has over two dozen themes built-in (solarized!) and you can bring all of your favorite themes from Sublime Text or TextMate as well. It has built-in support for 40+ language modes and customizable run configurations for your projects. Most importantly though, it has Vim mode (or emacs if your fingers work that way). It also has a keybinding editor that allows you to bend the editor to your will.

The editor supports powerful keyboard navigation and commands (similar to Sublime Text or vim plugins like ctrlp). On a Mac, with ⌘+P you can open any file in your environment with fuzzy search. With ⌘+. you can open up the command pane which allows you to do invoke any of the editor commands by typing the name. It also helpfully displays the keybindings for a command in the pane, for instance to open to a terminal you can press ⌥+T. Oh, did I mention there’s a terminal? It ships with the AWS CLI preconfigured for access to your resources.

The environment also comes with pre-installed debugging tools for many popular languages – but you’re not limited to what’s already installed. It’s easy to add in new programs and define new run configurations.

The editor is just one, admittedly important, component in an IDE though. I want to show you some other compelling features.

AWS Integrations

The AWS Cloud9 IDE is the first IDE I’ve used that is truly “cloud native”. The service is provided at no additional charge, and you only charged for the underlying compute and storage resources. When you create an environment you’re prompted for either: an instance type and an auto-hibernate time, or SSH access to a machine of your choice.

If you’re running in AWS the auto-hibernate feature will stop your instance shortly after you stop using your IDE. This can be a huge cost savings over running a more permanent developer desktop. You can also launch it within a VPC to give it secure access to your development resources. If you want to run Cloud9 outside of AWS, or on an existing instance, you can provide SSH access to the service which it will use to create an environment on the external machine. Your environment is provisioned with automatic and secure access to your AWS account so you don’t have to worry about copying credentials around. Let me say that again: you can run this anywhere.

Serverless Development with AWS Cloud9

I spend a lot of time on Twitch developing serverless applications. I have hundreds of lambda functions and APIs deployed. Cloud9 makes working with every single one of these functions delightful. Let me show you how it works.


If you look in the top right side of the editor you’ll see an AWS Resources tab. Opening this you can see all of the lambda functions in your region (you can see functions in other regions by adjusting your region preferences in the AWS preference pane).

You can import these remote functions to your local workspace just by double-clicking them. This allows you to edit, test, and debug your serverless applications all locally. You can create new applications and functions easily as well. If you click the Lambda icon in the top right of the pane you’ll be prompted to create a new lambda function and Cloud9 will automatically create a Serverless Application Model template for you as well. The IDE ships with support for the popular SAM local tool pre-installed. This is what I use in most of my local testing and serverless development. Since you have a terminal, it’s easy to install additional tools and use other serverless frameworks.

 

Launching an Environment from AWS CodeStar

With AWS CodeStar you can easily provision an end-to-end continuous delivery toolchain for development on AWS. Codestar provides a unified experience for building, testing, deploying, and managing applications using AWS CodeCommit, CodeBuild, CodePipeline, and CodeDeploy suite of services. Now, with a few simple clicks you can provision a Cloud9 environment to develop your application. Your environment will be pre-configured with the code for your CodeStar application already checked out and git credentials already configured.

You can easily share this environment with your coworkers which leads me to another extremely useful set of features.

Collaboration

One of the many things that sets AWS Cloud9 apart from other editors are the rich collaboration tools. You can invite an IAM user to your environment with a few clicks.

You can see what files they’re working on, where their cursors are, and even share a terminal. The chat features is useful as well.

Things to Know

  • There are no additional charges for this service beyond the underlying compute and storage.
  • c9.io continues to run for existing users. You can continue to use all the features of c9.io and add new team members if you have a team account. In the future, we will provide tools for easy migration of your c9.io workspaces to AWS Cloud9.
  • AWS Cloud9 is available in the US West (Oregon), US East (Ohio), US East (N.Virginia), EU (Ireland), and Asia Pacific (Singapore) regions.

I can’t wait to see what you build with AWS Cloud9!

Randall

AWS Systems Manager – A Unified Interface for Managing Your Cloud and Hybrid Resources

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-systems-manager/

AWS Systems Manager is a new way to manage your cloud and hybrid IT environments. AWS Systems Manager provides a unified user interface that simplifies resource and application management, shortens the time to detect and resolve operational problems, and makes it easy to operate and manage your infrastructure securely at scale. This service is absolutely packed full of features. It defines a new experience around grouping, visualizing, and reacting to problems using features from products like Amazon EC2 Systems Manager (SSM) to enable rich operations across your resources.

As I said above, there are a lot of powerful features in this service and we won’t be able to dive deep on all of them but it’s easy to go to the console and get started with any of the tools.

Resource Groupings

Resource Groups allow you to create logical groupings of most resources that support tagging like: Amazon Elastic Compute Cloud (EC2) instances, Amazon Simple Storage Service (S3) buckets, Elastic Load Balancing balancers, Amazon Relational Database Service (RDS) instances, Amazon Virtual Private Cloud, Amazon Kinesis streams, Amazon Route 53 zones, and more. Previously, you could use the AWS Console to define resource groupings but AWS Systems Manager provides this new resource group experience via a new console and API. These groupings are a fundamental building block of Systems Manager in that they are frequently the target of various operations you may want to perform like: compliance management, software inventories, patching, and other automations.

You start by defining a group based on tag filters. From there you can view all of the resources in a centralized console. You would typically use these groupings to differentiate between applications, application layers, and environments like production or dev – but you can make your own rules about how to use them as well. If you imagine a typical 3 tier web-app you might have a few EC2 instances, an ELB, a few S3 buckets, and an RDS instance. You can define a grouping for that application and with all of those different resources simultaneously.

Insights

AWS Systems Manager automatically aggregates and displays operational data for each resource group through a dashboard. You no longer need to navigate through multiple AWS consoles to view all of your operational data. You can easily integrate your exiting Amazon CloudWatch dashboards, AWS Config rules, AWS CloudTrail trails, AWS Trusted Advisor notifications, and AWS Personal Health Dashboard performance and availability alerts. You can also easily view your software inventories across your fleet. AWS Systems Manager also provides a compliance dashboard allowing you to see the state of various security controls and patching operations across your fleets.

Acting on Insights

Building on the success of EC2 Systems Manager (SSM), AWS Systems Manager takes all of the features of SSM and provides a central place to access them. These are all the same experiences you would have through SSM with a more accesible console and centralized interface. You can use the resource groups you’ve defined in Systems Manager to visualize and act on groups of resources.

Automation


Automations allow you to define common IT tasks as a JSON document that specify a list of tasks. You can also use community published documents. These documents can be executed through the Console, CLIs, SDKs, scheduled maintenance windows, or triggered based on changes in your infrastructure through CloudWatch events. You can track and log the execution of each step in the documents and prompt for additional approvals. It also allows you to incrementally roll out changes and automatically halt when errors occur. You can start executing an automation directly on a resource group and it will be able to apply itself to the resources that it understands within the group.

Run Command

Run Command is a superior alternative to enabling SSH on your instances. It provides safe, secure remote management of your instances at scale without logging into your servers, replacing the need for SSH bastions or remote powershell. It has granular IAM permissions that allow you to restrict which roles or users can run certain commands.

Patch Manager, Maintenance Windows, and State Manager

I’ve written about Patch Manager before and if you manage fleets of Windows and Linux instances it’s a great way to maintain a common baseline of security across your fleet.

Maintenance windows allow you to schedule instance maintenance and other disruptive tasks for a specific time window.

State Manager allows you to control various server configuration details like anti-virus definitions, firewall settings, and more. You can define policies in the console or run existing scripts, PowerShell modules, or even Ansible playbooks directly from S3 or GitHub. You can query State Manager at any time to view the status of your instance configurations.

Things To Know

There’s some interesting terminology here. We haven’t done the best job of naming things in the past so let’s take a moment to clarify. EC2 Systems Manager (sometimes called SSM) is what you used before today. You can still invoke aws ssm commands. However, AWS Systems Manager builds on and enhances many of the tools provided by EC2 Systems Manager and allows those same tools to be applied to more than just EC2. When you see the phrase “Systems Manager” in the future you should think of AWS Systems Manager and not EC2 Systems Manager.

AWS Systems Manager with all of this useful functionality is provided at no additional charge. It is immediately available in all public AWS regions.

The best part about these services is that even with their tight integrations each one is designed to be used in isolation as well. If you only need one component of these services it’s simple to get started with only that component.

There’s a lot more than I could ever document in this post so I encourage you all to jump into the console and documentation to figure out where you can start using AWS Systems Manager.

Randall

EU Court: Cloud-Based TV Recorder Requires Rightsholder Permission

Post Syndicated from Andy original https://torrentfreak.com/eu-court-cloud-based-tv-recorder-requires-rightsholder-permission-171130/

Over the years, many useful devices have come along which enable the public to make copies of copyright works, the VCR (video cassette recorder) being a prime example.

But while many such devices have been consumed by history, their modern equivalents still pose tricky questions for copyright law. One such service is VCAST, which markets itself as a Video Cloud Recorder. It functions in a notionally similar way to its older cousin but substitutes cassette storage for that in the cloud.

VCAST targets the Italian market, allowing users to sign up in order to gain access to more than 50 digital terrestrial TV channels. However, rather than simply watching live, the user can tell VCAST to receive TV shows (via its own antenna system) while recording them to private cloud storage (such as Google Drive) for subsequent viewing over the Internet.

VCAST attracted the negative interests of rightsholders, including Mediaset-owned RTI, who doubted the legality of the service. So, in response, VCAST sued RTI at the Turin Court of First Instance, seeking a judgment confirming the legality of its operations. The company believed that since the recordings are placed in users’ own cloud storage, the Italian private copying exception would apply and rightsholders would be compensated.

Perhaps unsurprisingly given the complexity of the case, the Turin Court decided to refer questions to the European Court of Justice. It essentially asked whether the private copying exception is applicable when the copying requires a service like VCAST and whether such a service is allowed to operate without permission from copyright holders.

In September, Advocate General Szpunar published his opinion, concluding that EU law prohibits this kind of service when copyright holders haven’t given their permission. Today, the ECJ handed down its decision, broadly agreeing with Szpunar’s conclusion.

“By today’s judgment, the Court finds that the service provided by VCAST has a dual functionality, consisting in ensuring both the reproduction and the making available of protected works. To the extent that the service offered by VCAST consists in the making available of protected works, it falls within communication to the public,” the ECJ announced.

“In that regard, the Court recalls that, according to the directive, any communication to the public, including the making available of a protected work or subject-matter, requires the rightholder’s consent, given that the right of communication of works to the public should be understood, in a broad sense, as covering any transmission or retransmission of a work to the public by wire or wireless means, including broadcasting.”

The ECJ notes that the original transmission made by RTI was intended for one audience. In turn, the transmission by VCAST was intended for another. In this respect, the subsequent VCAST transmission was made to a “new public”, which means that copyright holder permission is required under EU law.

“Accordingly, such a remote recording service cannot fall within the private copying exception,” the ECJ concludes.

The full text of the judgment can be found here.

The key ruling reads as follows:

Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society, in particular Article 5(2)(b) thereof, must be interpreted as precluding national legislation which permits a commercial undertaking to provide private individuals with a cloud service for the remote recording of private copies of works protected by copyright, by means of a computer system, by actively involving itself in the recording, without the rightholder’s consent.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

GDPR – A Practical Guide For Developers

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-practical-guide-developers/

You’ve probably heard about GDPR. The new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for gettign your systems in compliance with the regulation.

The regulation is basically a law that must be followed in all European countries (but also applies to non-EU companies that have users in the EU). In this particular case, it applies to companies that are not registered in Europe, but are having European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” posts/whitepapers, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of a EU country, and because of that I’ve been both exposed and myself wrote some legislation. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them), the right to data portability (the user should be able to get a machine-readable dump of their data).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s.

  • “Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case they have been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to cleanup after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests his data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blcokchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
  • Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
  • Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clasues here and there.
  • Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly is that data – depends on the particular usecase. Usually it’s at least the data that you delete with the “forget me” functionality, but may include additional data (e.g. the orders the user has made may not be delete, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when his data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
  • Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.
  • Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
  • Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
  • “See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
  • Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow, where the child should specify the email of a parent, who can then confirm. Obviosuly, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).

Now some “do’s”, which are mostly about the technical measures needed to protect personal data. They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post.

  • Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections. Some databases need gossiping among the nodes – that should also be configured to use encryption
  • Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
  • Encrypt your backups – kind of obvious
  • Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person
  • Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
  • Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
  • Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose
  • Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)

Finally, some “don’t’s”.

  • Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
  • Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old logs files are cleaned up, just in case
  • Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
  • Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
  • Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow , API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

The post GDPR – A Practical Guide For Developers appeared first on Bozho's tech blog.

Object models

Post Syndicated from Eevee original https://eev.ee/blog/2017/11/28/object-models/

Anonymous asks, with dollars:

More about programming languages!

Well then!

I’ve written before about what I think objects are: state and behavior, which in practice mostly means method calls.

I suspect that the popular impression of what objects are, and also how they should work, comes from whatever C++ and Java happen to do. From that point of view, the whole post above is probably nonsense. If the baseline notion of “object” is a rigid definition woven tightly into the design of two massively popular languages, then it doesn’t even make sense to talk about what “object” should mean — it does mean the features of those languages, and cannot possibly mean anything else.

I think that’s a shame! It piles a lot of baggage onto a fairly simple idea. Polymorphism, for example, has nothing to do with objects — it’s an escape hatch for static type systems. Inheritance isn’t the only way to reuse code between objects, but it’s the easiest and fastest one, so it’s what we get. Frankly, it’s much closer to a speed tradeoff than a fundamental part of the concept.

We could do with more experimentation around how objects work, but that’s impossible in the languages most commonly thought of as object-oriented.

Here, then, is a (very) brief run through the inner workings of objects in four very dynamic languages. I don’t think I really appreciated objects until I’d spent some time with Python, and I hope this can help someone else whet their own appetite.

Python 3

Of the four languages I’m going to touch on, Python will look the most familiar to the Java and C++ crowd. For starters, it actually has a class construct.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __neg__(self):
        return Vector(-self.x, -self.y)

    def __div__(self, denom):
        return Vector(self.x / denom, self.y / denom)

    @property
    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def normalized(self):
        return self / self.magnitude

The __init__ method is an initializer, which is like a constructor but named differently (because the object already exists in a usable form by the time the initializer is called). Operator overloading is done by implementing methods with other special __dunder__ names. Properties can be created with @property, where the @ is syntax for applying a wrapper function to a function as it’s defined. You can do inheritance, even multiply:

1
2
3
4
class Foo(A, B, C):
    def bar(self, x, y, z):
        # do some stuff
        super().bar(x, y, z)

Cool, a very traditional object model.

Except… for some details.

Some details

For one, Python objects don’t have a fixed layout. Code both inside and outside the class can add or remove whatever attributes they want from whatever object they want. The underlying storage is just a dict, Python’s mapping type. (Or, rather, something like one. Also, it’s possible to change, which will probably be the case for everything I say here.)

If you create some attributes at the class level, you’ll start to get a peek behind the curtains:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Foo:
    values = []

    def add_value(self, value):
        self.values.append(value)

a = Foo()
b = Foo()
a.add_value('a')
print(a.values)  # ['a']
b.add_value('b')
print(b.values)  # ['a', 'b']

The [] assigned to values isn’t a default assigned to each object. In fact, the individual objects don’t know about it at all! You can use vars(a) to get at the underlying storage dict, and you won’t see a values entry in there anywhere.

Instead, values lives on the class, which is a value (and thus an object) in its own right. When Python is asked for self.values, it checks to see if self has a values attribute; in this case, it doesn’t, so Python keeps going and asks the class for one.

Python’s object model is secretly prototypical — a class acts as a prototype, as a shared set of fallback values, for its objects.

In fact, this is also how method calls work! They aren’t syntactically special at all, which you can see by separating the attribute lookup from the call.

1
2
3
print("abc".startswith("a"))  # True
meth = "abc".startswith
print(meth("a"))  # True

Reading obj.method looks for a method attribute; if there isn’t one on obj, Python checks the class. Here, it finds one: it’s a function from the class body.

Ah, but wait! In the code I just showed, meth seems to “know” the object it came from, so it can’t just be a plain function. If you inspect the resulting value, it claims to be a “bound method” or “built-in method” rather than a function, too. Something funny is going on here, and that funny something is the descriptor protocol.

Descriptors

Python allows attributes to implement their own custom behavior when read from or written to. Such an attribute is called a descriptor. I’ve written about them before, but here’s a quick overview.

If Python looks up an attribute, finds it in a class, and the value it gets has a __get__ method… then instead of using that value, Python will use the return value of its __get__ method.

The @property decorator works this way. The magnitude property in my original example was shorthand for doing this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class MagnitudeDescriptor:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return (instance.x ** 2 + instance.y ** 2) ** 0.5

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    magnitude = MagnitudeDescriptor()

When you ask for somevec.magnitude, Python checks somevec but doesn’t find magnitude, so it consults the class instead. The class does have a magnitude, and it’s a value with a __get__ method, so Python calls that method and somevec.magnitude evaluates to its return value. (The instance is None check is because __get__ is called even if you get the descriptor directly from the class via Vector.magnitude. A descriptor intended to work on instances can’t do anything useful in that case, so the convention is to return the descriptor itself.)

You can also intercept attempts to write to or delete an attribute, and do absolutely whatever you want instead. But note that, similar to operating overloading in Python, the descriptor must be on a class; you can’t just slap one on an arbitrary object and have it work.

This brings me right around to how “bound methods” actually work. Functions are descriptors! The function type implements __get__, and when a function is retrieved from a class via an instance, that __get__ bundles the function and the instance together into a tiny bound method object. It’s essentially:

1
2
3
4
5
class FunctionType:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return functools.partial(self, instance)

The self passed as the first argument to methods is not special or magical in any way. It’s built out of a few simple pieces that are also readily accessible to Python code.

Note also that because obj.method() is just an attribute lookup and a call, Python doesn’t actually care whether method is a method on the class or just some callable thing on the object. You won’t get the auto-self behavior if it’s on the object, but otherwise there’s no difference.

More attribute access, and the interesting part

Descriptors are one of several ways to customize attribute access. Classes can implement __getattr__ to intervene when an attribute isn’t found on an object; __setattr__ and __delattr__ to intervene when any attribute is set or deleted; and __getattribute__ to implement unconditional attribute access. (That last one is a fantastic way to create accidental recursion, since any attribute access you do within __getattribute__ will of course call __getattribute__ again.)

Here’s what I really love about Python. It might seem like a magical special case that descriptors only work on classes, but it really isn’t. You could implement exactly the same behavior yourself, in pure Python, using only the things I’ve just told you about. Classes are themselves objects, remember, and they are instances of type, so the reason descriptors only work on classes is that type effectively does this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class type:
    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        # like all op overloads, __get__ must be on the type, not the instance
        ty = type(value)
        if hasattr(ty, '__get__'):
            # it's a descriptor!  this is a class access so there is no instance
            return ty.__get__(value, None, self)
        else:
            return value

You can even trivially prove to yourself that this is what’s going on by skipping over types behavior:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class Descriptor:
    def __get__(self, instance, owner):
        print('called!')

class Foo:
    bar = Descriptor()

Foo.bar  # called!
type.__getattribute__(Foo, 'bar')  # called!
object.__getattribute__(Foo, 'bar')  # ...

And that’s not all! The mysterious super function, used to exhaustively traverse superclass method calls even in the face of diamond inheritance, can also be expressed in pure Python using these primitives. You could write your own superclass calling convention and use it exactly the same way as super.

This is one of the things I really like about Python. Very little of it is truly magical; virtually everything about the object model exists in the types rather than the language, which means virtually everything can be customized in pure Python.

Class creation and metaclasses

A very brief word on all of this stuff, since I could talk forever about Python and I have three other languages to get to.

The class block itself is fairly interesting. It looks like this:

1
2
class Name(*bases, **kwargs):
    # code

I’ve said several times that classes are objects, and in fact the class block is one big pile of syntactic sugar for calling type(...) with some arguments to create a new type object.

The Python documentation has a remarkably detailed description of this process, but the gist is:

  • Python determines the type of the new class — the metaclass — by looking for a metaclass keyword argument. If there isn’t one, Python uses the “lowest” type among the provided base classes. (If you’re not doing anything special, that’ll just be type, since every class inherits from object and object is an instance of type.)

  • Python executes the class body. It gets its own local scope, and any assignments or method definitions go into that scope.

  • Python now calls type(name, bases, attrs, **kwargs). The name is whatever was right after class; the bases are position arguments; and attrs is the class body’s local scope. (This is how methods and other class attributes end up on the class.) The brand new type is then assigned to Name.

Of course, you can mess with most of this. You can implement __prepare__ on a metaclass, for example, to use a custom mapping as storage for the local scope — including any reads, which allows for some interesting shenanigans. The only part you can’t really implement in pure Python is the scoping bit, which has a couple extra rules that make sense for classes. (In particular, functions defined within a class block don’t close over the class body; that would be nonsense.)

Object creation

Finally, there’s what actually happens when you create an object — including a class, which remember is just an invocation of type(...).

Calling Foo(...) is implemented as, well, a call. Any type can implement calls with the __call__ special method, and you’ll find that type itself does so. It looks something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# oh, a fun wrinkle that's hard to express in pure python: type is a class, so
# it's an instance of itself
class type:
    def __call__(self, *args, **kwargs):
        # remember, here 'self' is a CLASS, an instance of type.
        # __new__ is a true constructor: object.__new__ allocates storage
        # for a new blank object
        instance = self.__new__(self, *args, **kwargs)
        # you can return whatever you want from __new__ (!), and __init__
        # is only called on it if it's of the right type
        if isinstance(instance, self):
            instance.__init__(*args, **kwargs)
        return instance

Again, you can trivially confirm this by asking any type for its __call__ method. Assuming that type doesn’t implement __call__ itself, you’ll get back a bound version of types implementation.

1
2
>>> list.__call__
<method-wrapper '__call__' of type object at 0x7fafb831a400>

You can thus implement __call__ in your own metaclass to completely change how subclasses are created — including skipping the creation altogether, if you like.

And… there’s a bunch of stuff I haven’t even touched on.

The Python philosophy

Python offers something that, on the surface, looks like a “traditional” class/object model. Under the hood, it acts more like a prototypical system, where failed attribute lookups simply defer to a superclass or metaclass.

The language also goes to almost superhuman lengths to expose all of its moving parts. Even the prototypical behavior is an implementation of __getattribute__ somewhere, which you are free to completely replace in your own types. Proxying and delegation are easy.

Also very nice is that these features “bundle” well, by which I mean a library author can do all manner of convoluted hijinks, and a consumer of that library doesn’t have to see any of it or understand how it works. You only need to inherit from a particular class (which has a metaclass), or use some descriptor as a decorator, or even learn any new syntax.

This meshes well with Python culture, which is pretty big on the principle of least surprise. These super-advanced features tend to be tightly confined to single simple features (like “makes a weak attribute“) or cordoned with DSLs (e.g., defining a form/struct/database table with a class body). In particular, I’ve never seen a metaclass in the wild implement its own __call__.

I have mixed feelings about that. It’s probably a good thing overall that the Python world shows such restraint, but I wonder if there are some very interesting possibilities we’re missing out on. I implemented a metaclass __call__ myself, just once, in an entity/component system that strove to minimize fuss when communicating between components. It never saw the light of day, but I enjoyed seeing some new things Python could do with the same relatively simple syntax. I wouldn’t mind seeing, say, an object model based on composition (with no inheritance) built atop Python’s primitives.

Lua

Lua doesn’t have an object model. Instead, it gives you a handful of very small primitives for building your own object model. This is pretty typical of Lua — it’s a very powerful language, but has been carefully constructed to be very small at the same time. I’ve never encountered anything else quite like it, and “but it starts indexing at 1!” really doesn’t do it justice.

The best way to demonstrate how objects work in Lua is to build some from scratch. We need two key features. The first is metatables, which bear a passing resemblance to Python’s metaclasses.

Tables and metatables

The table is Lua’s mapping type and its primary data structure. Keys can be any value other than nil. Lists are implemented as tables whose keys are consecutive integers starting from 1. Nothing terribly surprising. The dot operator is sugar for indexing with a string key.

1
2
3
4
5
local t = { a = 1, b = 2 }
print(t['a'])  -- 1
print(t.b)  -- 2
t.c = 3
print(t['c'])  -- 3

A metatable is a table that can be associated with another value (usually another table) to change its behavior. For example, operator overloading is implemented by assigning a function to a special key in a metatable.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local t = { a = 1, b = 2 }
--print(t + 0)  -- error: attempt to perform arithmetic on a table value

local mt = {
    __add = function(left, right)
        return 12
    end,
}
setmetatable(t, mt)
print(t + 0)  -- 12

Now, the interesting part: one of the special keys is __index, which is consulted when the base table is indexed by a key it doesn’t contain. Here’s a table that claims every key maps to itself.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local t = {}
local mt = {
    __index = function(table, key)
        return key
    end,
}
setmetatable(t, mt)
print(t.foo)  -- foo
print(t.bar)  -- bar
print(t[3])  -- 3

__index doesn’t have to be a function, either. It can be yet another table, in which case that table is simply indexed with the key. If the key still doesn’t exist and that table has a metatable with an __index, the process repeats.

With this, it’s easy to have several unrelated tables that act as a single table. Call the base table an object, fill the __index table with functions and call it a class, and you have half of an object system. You can even get prototypical inheritance by chaining __indexes together.

At this point things are a little confusing, since we have at least three tables going on, so here’s a diagram. Keep in mind that Lua doesn’t actually have anything called an “object”, “class”, or “method” — those are just convenient nicknames for a particular structure we might build with Lua’s primitives.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
                    ╔═══════════╗        ...
                    ║ metatable ║         ║
                    ╟───────────╢   ┌─────╨───────────────────────┐
                    ║ __index   ╫───┤ lookup table ("superclass") │
                    ╚═══╦═══════╝   ├─────────────────────────────┤
  ╔═══════════╗         ║           │ some other method           ┼─── function() ... end
  ║ metatable ║         ║           └─────────────────────────────┘
  ╟───────────╢   ┌─────╨──────────────────┐
  ║ __index   ╫───┤ lookup table ("class") │
  ╚═══╦═══════╝   ├────────────────────────┤
      ║           │ some method            ┼─── function() ... end
      ║           └────────────────────────┘
┌─────╨─────────────────┐
│ base table ("object") │
└───────────────────────┘

Note that a metatable is not the same as a class; it defines behavior, not methods. Conversely, if you try to use a class directly as a metatable, it will probably not do much. (This is pretty different from e.g. Python, where operator overloads are just methods with funny names. One nice thing about the Lua approach is that you can keep interface-like functionality separate from methods, and avoid clogging up arbitrary objects’ namespaces. You could even use a dummy table as a key and completely avoid name collisions.)

Anyway, code!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
local class = {
    foo = function(a)
        print("foo got", a)
    end,
}
local mt = { __index = class }
-- setmetatable returns its first argument, so this is nice shorthand
local obj1 = setmetatable({}, mt)
local obj2 = setmetatable({}, mt)
obj1.foo(7)  -- foo got 7
obj2.foo(9)  -- foo got 9

Wait, wait, hang on. Didn’t I call these methods? How do they get at the object? Maybe Lua has a magical this variable?

Methods, sort of

Not quite, but this is where the other key feature comes in: method-call syntax. It’s the lightest touch of sugar, just enough to have method invocation.

1
2
3
4
5
6
7
8
9
-- note the colon!
a:b(c, d, ...)

-- exactly equivalent to this
-- (except that `a` is only evaluated once)
a.b(a, c, d, ...)

-- which of course is really this
a["b"](a, c, d, ...)

Now we can write methods that actually do something.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local class = {
    bar = function(self)
        print("our score is", self.score)
    end,
}
local mt = { __index = class }
local obj1 = setmetatable({ score = 13 }, mt)
local obj2 = setmetatable({ score = 25 }, mt)
obj1:bar()  -- our score is 13
obj2:bar()  -- our score is 25

And that’s all you need. Much like Python, methods and data live in the same namespace, and Lua doesn’t care whether obj:method() finds a function on obj or gets one from the metatable’s __index. Unlike Python, the function will be passed self either way, because self comes from the use of : rather than from the lookup behavior.

(Aside: strictly speaking, any Lua value can have a metatable — and if you try to index a non-table, Lua will always consult the metatable’s __index. Strings all have the string library as a metatable, so you can call methods on them: try ("%s %s"):format(1, 2). I don’t think Lua lets user code set the metatable for non-tables, so this isn’t that interesting, but if you’re writing Lua bindings from C then you can wrap your pointers in metatables to give them methods implemented in C.)

Bringing it all together

Of course, writing all this stuff every time is a little tedious and error-prone, so instead you might want to wrap it all up inside a little function. No problem.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
local function make_object(body)
    -- create a metatable
    local mt = { __index = body }
    -- create a base table to serve as the object itself
    local obj = setmetatable({}, mt)
    -- and, done
    return obj
end

-- you can leave off parens if you're only passing in 
local Dog = {
    -- this acts as a "default" value; if obj.barks is missing, __index will
    -- kick in and find this value on the class.  but if obj.barks is assigned
    -- to, it'll go in the object and shadow the value here.
    barks = 0,

    bark = function(self)
        self.barks = self.barks + 1
        print("woof!")
    end,
}

local mydog = make_object(Dog)
mydog:bark()  -- woof!
mydog:bark()  -- woof!
mydog:bark()  -- woof!
print(mydog.barks)  -- 3
print(Dog.barks)  -- 0

It works, but it’s fairly barebones. The nice thing is that you can extend it pretty much however you want. I won’t reproduce an entire serious object system here — lord knows there are enough of them floating around — but the implementation I have for my LÖVE games lets me do this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
local Animal = Object:extend{
    cries = 0,
}

-- called automatically by Object
function Animal:init()
    print("whoops i couldn't think of anything interesting to put here")
end

-- this is just nice syntax for adding a first argument called 'self', then
-- assigning this function to Animal.cry
function Animal:cry()
    self.cries = self.cries + 1
end

local Cat = Animal:extend{}

function Cat:cry()
    print("meow!")
    Cat.__super.cry(self)
end

local cat = Cat()
cat:cry()  -- meow!
cat:cry()  -- meow!
print(cat.cries)  -- 2

When I say you can extend it however you want, I mean that. I could’ve implemented Python (2)-style super(Cat, self):cry() syntax; I just never got around to it. I could even make it work with multiple inheritance if I really wanted to — or I could go the complete opposite direction and only implement composition. I could implement descriptors, customizing the behavior of individual table keys. I could add pretty decent syntax for composition/proxying. I am trying very hard to end this section now.

The Lua philosophy

Lua’s philosophy is to… not have a philosophy? It gives you the bare minimum to make objects work, and you can do absolutely whatever you want from there. Lua does have something resembling prototypical inheritance, but it’s not so much a first-class feature as an emergent property of some very simple tools. And since you can make __index be a function, you could avoid the prototypical behavior and do something different entirely.

The very severe downside, of course, is that you have to find or build your own object system — which can get pretty confusing very quickly, what with the multiple small moving parts. Third-party code may also have its own object system with subtly different behavior. (Though, in my experience, third-party code tries very hard to avoid needing an object system at all.)

It’s hard to say what the Lua “culture” is like, since Lua is an embedded language that’s often a little different in each environment. I imagine it has a thousand millicultures, instead. I can say that the tedium of building my own object model has led me into something very “traditional”, with prototypical inheritance and whatnot. It’s partly what I’m used to, but it’s also just really dang easy to get working.

Likewise, while I love properties in Python and use them all the dang time, I’ve yet to use a single one in Lua. They wouldn’t be particularly hard to add to my object model, but having to add them myself (or shop around for an object model with them and also port all my code to use it) adds a huge amount of friction. I’ve thought about designing an interesting ECS with custom object behavior, too, but… is it really worth the effort? For all the power and flexibility Lua offers, the cost is that by the time I have something working at all, I’m too exhausted to actually use any of it.

JavaScript

JavaScript is notable for being preposterously heavily used, yet not having a class block.

Well. Okay. Yes. It has one now. It didn’t for a very long time, and even the one it has now is sugar.

Here’s a vector class again:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

In “classic” JavaScript, this would be written as:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
function Vector(x, y) {
    this.x = x;
    this.y = y;
}

Object.defineProperty(Vector.prototype, 'magnitude', {
    configurable: true,
    enumerable: true,
    get: function() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
});


Vector.prototype.dot = function(other) {
    return this.x * other.x + this.y * other.y;
};

Hm, yes. I can see why they added class.

The JavaScript model

In JavaScript, a new type is defined in terms of a function, which is its constructor.

Right away we get into trouble here. There is a very big difference between these two invocations, which I actually completely forgot about just now after spending four hours writing about Python and Lua:

1
2
let vec = Vector(3, 4);
let vec = new Vector(3, 4);

The first calls the function Vector. It assigns some properties to this, which here is going to be window, so now you have a global x and y. It then returns nothing, so vec is undefined.

The second calls Vector with this set to a new empty object, then evaluates to that object. The result is what you’d actually expect.

(You can detect this situation with the strange new.target expression, but I have never once remembered to do so.)

From here, we have true, honest-to-god, first-class prototypical inheritance. The word “prototype” is even right there. When you write this:

1
vec.dot(vec2)

JavaScript will look for dot on vec and (presumably) not find it. It then consults vecs prototype, an object you can see for yourself by using Object.getPrototypeOf(). Since vec is a Vector, its prototype is Vector.prototype.

I stress that Vector.prototype is not the prototype for Vector. It’s the prototype for instances of Vector.

(I say “instance”, but the true type of vec here is still just object. If you want to find Vector, it’s automatically assigned to the constructor property of its own prototype, so it’s available as vec.constructor.)

Of course, Vector.prototype can itself have a prototype, in which case the process would continue if dot were not found. A common (and, arguably, very bad) way to simulate single inheritance is to set Class.prototype to an instance of a superclass to get the prototype right, then tack on the methods for Class. Nowadays we can do Object.create(Superclass.prototype).

Now that I’ve been through Python and Lua, though, this isn’t particularly surprising. I kinda spoiled it.

I suppose one difference in JavaScript is that you can tack arbitrary attributes directly onto Vector all you like, and they will remain invisible to instances since they aren’t in the prototype chain. This is kind of backwards from Lua, where you can squirrel stuff away in the metatable.

Another difference is that every single object in JavaScript has a bunch of properties already tacked on — the ones in Object.prototype. Every object (and by “object” I mean any mapping) has a prototype, and that prototype defaults to Object.prototype, and it has a bunch of ancient junk like isPrototypeOf.

(Nit: it’s possible to explicitly create an object with no prototype via Object.create(null).)

Like Lua, and unlike Python, JavaScript doesn’t distinguish between keys found on an object and keys found via a prototype. Properties can be defined on prototypes with Object.defineProperty(), but that works just as well directly on an object, too. JavaScript doesn’t have a lot of operator overloading, but some things like Symbol.iterator also work on both objects and prototypes.

About this

You may, at this point, be wondering what this is. Unlike Lua and Python (and the last language below), this is a special built-in value — a context value, invisibly passed for every function call.

It’s determined by where the function came from. If the function was the result of an attribute lookup, then this is set to the object containing that attribute. Otherwise, this is set to the global object, window. (You can also set this to whatever you want via the call method on functions.)

This decision is made lexically, i.e. from the literal source code as written. There are no Python-style bound methods. In other words:

1
2
3
4
5
// this = obj
obj.method()
// this = window
let meth = obj.method
meth()

Also, because this is reassigned on every function call, it cannot be meaningfully closed over, which makes using closures within methods incredibly annoying. The old approach was to assign this to some other regular name like self (which got syntax highlighting since it’s also a built-in name in browsers); then we got Function.bind, which produced a callable thing with a fixed context value, which was kind of nice; and now finally we have arrow functions, which explicitly close over the current this when they’re defined and don’t change it when called. Phew.

Class syntax

I already showed class syntax, and it’s really just one big macro for doing all the prototype stuff The Right Way. It even prevents you from calling the type without new. The underlying model is exactly the same, and you can inspect all the parts.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Vector { ... }

console.log(Vector.prototype);  // { dot: ..., magnitude: ..., ... }
let vec = new Vector(3, 4);
console.log(Object.getPrototypeOf(vec));  // same as Vector.prototype

// i don't know why you would subclass vector but let's roll with it
class Vectest extends Vector { ... }

console.log(Vectest.prototype);  // { ... }
console.log(Object.getPrototypeOf(Vectest.prototype))  // same as Vector.prototype

Alas, class syntax has a couple shortcomings. You can’t use the class block to assign arbitrary data to either the type object or the prototype — apparently it was deemed too confusing that mutations would be shared among instances. Which… is… how prototypes work. How Python works. How JavaScript itself, one of the most popular languages of all time, has worked for twenty-two years. Argh.

You can still do whatever assignment you want outside of the class block, of course. It’s just a little ugly, and not something I’d think to look for with a sugary class.

A more subtle result of this behavior is that a class block isn’t quite the same syntax as an object literal. The check for data isn’t a runtime thing; class Foo { x: 3 } fails to parse. So JavaScript now has two largely but not entirely identical styles of key/value block.

Attribute access

Here’s where things start to come apart at the seams, just a little bit.

JavaScript doesn’t really have an attribute protocol. Instead, it has two… extension points, I suppose.

One is Object.defineProperty, seen above. For common cases, there’s also the get syntax inside a property literal, which does the same thing. But unlike Python’s @property, these aren’t wrappers around some simple primitives; they are the primitives. JavaScript is the only language of these four to have “property that runs code on access” as a completely separate first-class concept.

If you want to intercept arbitrary attribute access (and some kinds of operators), there’s a completely different primitive: the Proxy type. It doesn’t let you intercept attribute access or operators; instead, it produces a wrapper object that supports interception and defers to the wrapped object by default.

It’s cool to see composition used in this way, but also, extremely weird. If you want to make your own type that overloads in or calling, you have to return a Proxy that wraps your own type, rather than actually returning your own type. And (unlike the other three languages in this post) you can’t return a different type from a constructor, so you have to throw that away and produce objects only from a factory. And instanceof would be broken, but you can at least fix that with Symbol.hasInstance — which is really operator overloading, implement yet another completely different way.

I know the design here is a result of legacy and speed — if any object could intercept all attribute access, then all attribute access would be slowed down everywhere. Fair enough. It still leaves the surface area of the language a bit… bumpy?

The JavaScript philosophy

It’s a little hard to tell. The original idea of prototypes was interesting, but it was hidden behind some very awkward syntax. Since then, we’ve gotten a bunch of extra features awkwardly bolted on to reflect the wildly varied things the built-in types and DOM API were already doing. We have class syntax, but it’s been explicitly designed to avoid exposing the prototype parts of the model.

I admit I don’t do a lot of heavy JavaScript, so I might just be overlooking it, but I’ve seen virtually no code that makes use of any of the recent advances in object capabilities. Forget about custom iterators or overloading call; I can’t remember seeing any JavaScript in the wild that even uses properties yet. I don’t know if everyone’s waiting for sufficient browser support, nobody knows about them, or nobody cares.

The model has advanced recently, but I suspect JavaScript is still shackled to its legacy of “something about prototypes, I don’t really get it, just copy the other code that’s there” as an object model. Alas! Prototypes are so good. Hopefully class syntax will make it a bit more accessible, as it has in Python.

Perl 5

Perl 5 also doesn’t have an object system and expects you to build your own. But where Lua gives you two simple, powerful tools for building one, Perl 5 feels more like a puzzle with half the pieces missing. Clearly they were going for something, but they only gave you half of it.

In brief, a Perl object is a reference that has been blessed with a package.

I need to explain a few things. Honestly, one of the biggest problems with the original Perl object setup was how many strange corners and unique jargon you had to understand just to get off the ground.

(If you want to try running any of this code, you should stick a use v5.26; as the first line. Perl is very big on backwards compatibility, so you need to opt into breaking changes, and even the mundane say builtin is behind a feature gate.)

References

A reference in Perl is sort of like a pointer, but its main use is very different. See, Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays — @foo is an array variable, distinct from the single scalar variable $foo — it’s actually impossible to nest arrays.

1
2
3
my @foo = (1, 2, 3, 4);
my @bar = (@foo, @foo);
# @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4

The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values. They add containment around arrays, and in general they make working with most of Perl’s primitive types much more sensible. A reference to a variable can be taken with the \ operator, or you can use [ ... ] and { ... } to directly create references to anonymous arrays or hashes.

1
2
3
my @foo = (1, 2, 3, 4);
my @bar = (\@foo, \@foo);
# @bar is now a nested list of two items: [1, 2, 3, 4], [1, 2, 3, 4]

(Incidentally, this is the sole reason I initially abandoned Perl for Python. Non-trivial software kinda requires nesting a lot of data structures, so you end up with references everywhere, and the syntax for going back and forth between a reference and its contents is tedious and ugly.)

A Perl object must be a reference. Perl doesn’t care what kind of reference — it’s usually a hash reference, since hashes are a convenient place to store arbitrary properties, but it could just as well be a reference to an array, a scalar, or even a sub (i.e. function) or filehandle.

I’m getting a little ahead of myself. First, the other half: blessing and packages.

Packages and blessing

Perl packages are just namespaces. A package looks like this:

1
2
3
4
5
6
7
package Foo::Bar;

sub quux {
    say "hi from quux!";
}

# now Foo::Bar::quux() can be called from anywhere

Nothing shocking, right? It’s just a named container. A lot of the details are kind of weird, like how a package exists in some liminal quasi-value space, but the basic idea is a Bag Of Stuff.

The final piece is “blessing,” which is Perl’s funny name for binding a package to a reference. A very basic class might look like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
package Vector;

# the name 'new' is convention, not special
sub new {
    # perl argument passing is weird, don't ask
    my ($class, $x, $y) = @_;

    # create the object itself -- here, unusually, an array reference makes sense
    my $self = [ $x, $y ];

    # associate the package with that reference
    # note that $class here is just the regular string, 'Vector'
    bless $self, $class;

    return $self;
}

sub x {
    my ($self) = @_;
    return $self->[0];
}

sub y {
    my ($self) = @_;
    return $self->[1];
}

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

# switch back to the "default" package
package main;

# -> is method call syntax, which passes the invocant as the first argument;
# for a package, that's just the package name
my $vec = Vector->new(3, 4);
say $vec->magnitude;  # 5

A few things of note here. First, $self->[0] has nothing to do with objects; it’s normal syntax for getting the value of a index 0 out of an array reference called $self. (Most classes are based on hashrefs and would use $self->{value} instead.) A blessed reference is still a reference and can be treated like one.

In general, -> is Perl’s dereferencey operator, but its exact behavior depends on what follows. If it’s followed by brackets, then it’ll apply the brackets to the thing in the reference: ->{} to index a hash reference, ->[] to index an array reference, and ->() to call a function reference.

But if -> is followed by an identifier, then it’s a method call. For packages, that means calling a function in the package and passing the package name as the first argument. For objects — blessed references — that means calling a function in the associated package and passing the object as the first argument.

This is a little weird! A blessed reference is a superposition of two things: its normal reference behavior, and some completely orthogonal object behavior. Also, object behavior has no notion of methods vs data; it only knows about methods. Perl lets you omit parentheses in a lot of places, including when calling a method with no arguments, so $vec->magnitude is really $vec->magnitude().

Perl’s blessing bears some similarities to Lua’s metatables, but ultimately Perl is much closer to Ruby’s “message passing” approach than the above three languages’ approaches of “get me something and maybe it’ll be callable”. (But this is no surprise — Ruby is a spiritual successor to Perl 5.)

All of this leads to one little wrinkle: how do you actually expose data? Above, I had to write x and y methods. Am I supposed to do that for every single attribute on my type?

Yes! But don’t worry, there are third-party modules to help with this incredibly fundamental task. Take Class::Accessor::Fast, so named because it’s faster than Class::Accessor:

1
2
3
package Foo;
use base qw(Class::Accessor::Fast);
__PACKAGE__->mk_accessors(qw(fred wilma barney));

(__PACKAGE__ is the lexical name of the current package; qw(...) is a list literal that splits its contents on whitespace.)

This assumes you’re using a hashref with keys of the same names as the attributes. $obj->fred will return the fred key from your hashref, and $obj->fred(4) will change it to 4.

You also, somewhat bizarrely, have to inherit from Class::Accessor::Fast. Speaking of which,

Inheritance

Inheritance is done by populating the package-global @ISA array with some number of (string) names of parent packages. Most code instead opts to write use base ...;, which does the same thing. Or, more commonly, use parent ...;, which… also… does the same thing.

Every package implicitly inherits from UNIVERSAL, which can be freely modified by Perl code.

A method can call its superclass method with the SUPER:: pseudo-package:

1
2
3
4
sub foo {
    my ($self) = @_;
    $self->SUPER::foo;
}

However, this does a depth-first search, which means it almost certainly does the wrong thing when faced with multiple inheritance. For a while the accepted solution involved a third-party module, but Perl eventually grew an alternative you have to opt into: C3, which may be more familiar to you as the order Python uses.

1
2
3
4
5
6
use mro 'c3';

sub foo {
    my ($self) = @_;
    $self->next::method;
}

Offhand, I’m not actually sure how next::method works, seeing as it was originally implemented in pure Perl code. I suspect it involves peeking at the caller’s stack frame. If so, then this is a very different style of customizability from e.g. Python — the MRO was never intended to be pluggable, and the use of a special pseudo-package means it isn’t really, but someone was determined enough to make it happen anyway.

Operator overloading and whatnot

Operator overloading looks a little weird, though really it’s pretty standard Perl.

1
2
3
4
5
6
7
8
package MyClass;

use overload '+' => \&_add;

sub _add {
    my ($self, $other, $swap) = @_;
    ...
}

use overload here is a pragma, where “pragma” means “regular-ass module that does some wizardry when imported”.

\&_add is how you get a reference to the _add sub so you can pass it to the overload module. If you just said &_add or _add, that would call it.

And that’s it; you just pass a map of operators to functions to this built-in module. No worry about name clashes or pollution, which is pretty nice. You don’t even have to give references to functions that live in the package, if you don’t want them to clog your namespace; you could put them in another package, or even inline them anonymously.

One especially interesting thing is that Perl lets you overload every operator. Perl has a lot of operators. It considers some math builtins like sqrt and trig functions to be operators, or at least operator-y enough that you can overload them. You can also overload the “file text” operators, such as -e $path to test whether a file exists. You can overload conversions, including implicit conversion to a regex. And most fascinating to me, you can overload dereferencing — that is, the thing Perl does when you say $hashref->{key} to get at the underlying hash. So a single object could pretend to be references of multiple different types, including a subref to implement callability. Neat.

Somewhat related: you can overload basic operators (indexing, etc.) on basic types (not references!) with the tie function, which is designed completely differently and looks for methods with fixed names. Go figure.

You can intercept calls to nonexistent methods by implementing a function called AUTOLOAD, within which the $AUTOLOAD global will contain the name of the method being called. Originally this feature was, I think, intended for loading binary components or large libraries on-the-fly only when needed, hence the name. Offhand I’m not sure I ever saw it used the way __getattr__ is used in Python.

Is there a way to intercept all method calls? I don’t think so, but it is Perl, so I must be forgetting something.

Actually no one does this any more

Like a decade ago, a council of elder sages sat down and put together a whole whizbang system that covers all of it: Moose.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
package Vector;
use Moose;

has x => (is => 'rw', isa => 'Int');
has y => (is => 'rw', isa => 'Int');

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

Moose has its own way to do pretty much everything, and it’s all built on the same primitives. Moose also adds metaclasses, somehow, despite that the underlying model doesn’t actually support them? I’m not entirely sure how they managed that, but I do remember doing some class introspection with Moose and it was much nicer than the built-in way.

(If you’re wondering, the built-in way begins with looking at the hash called %Vector::. No, that’s not a typo.)

I really cannot stress enough just how much stuff Moose does, but I don’t want to delve into it here since Moose itself is not actually the language model.

The Perl philosophy

I hope you can see what I meant with what I first said about Perl, now. It has multiple inheritance with an MRO, but uses the wrong one by default. It has extensive operator overloading, which looks nothing like how inheritance works, and also some of it uses a totally different mechanism with special method names instead. It only understands methods, not data, leaving you to figure out accessors by hand.

There’s 70% of an object system here with a clear general design it was gunning for, but none of the pieces really look anything like each other. It’s weird, in a distinctly Perl way.

The result is certainly flexible, at least! It’s especially cool that you can use whatever kind of reference you want for storage, though even as I say that, I acknowledge it’s no different from simply subclassing list or something in Python. It feels different in Perl, but maybe only because it looks so different.

I haven’t written much Perl in a long time, so I don’t know what the community is like any more. Moose was already ubiquitous when I left, which you’d think would let me say “the community mostly focuses on the stuff Moose can do” — but even a decade ago, Moose could already do far more than I had ever seen done by hand in Perl. It’s always made a big deal out of roles (read: interfaces), for instance, despite that I’d never seen anyone care about them in Perl before Moose came along. Maybe their presence in Moose has made them more popular? Who knows.

Also, I wrote Perl seriously, but in the intervening years I’ve only encountered people who only ever used Perl for one-offs. Maybe it’ll come as a surprise to a lot of readers that Perl has an object model at all.

End

Well, that was fun! I hope any of that made sense.

Special mention goes to Rust, which doesn’t have an object model you can fiddle with at runtime, but does do things a little differently.

It’s been really interesting thinking about how tiny differences make a huge impact on what people do in practice. Take the choice of storage in Perl versus Python. Perl’s massively common URI class uses a string as the storage, nothing else; I haven’t seen anything like that in Python aside from markupsafe, which is specifically designed as a string type. I would guess this is partly because Perl makes you choose — using a hashref is an obvious default, but you have to make that choice one way or the other. In Python (especially 3), inheriting from object and getting dict-based storage is the obvious thing to do; the ability to use another type isn’t quite so obvious, and doing it “right” involves a tiny bit of extra work.

Or, consider that Lua could have descriptors, but the extra bit of work (especially design work) has been enough of an impediment that I’ve never implemented them. I don’t think the object implementations I’ve looked at have included them, either. Super weird!

In that light, it’s only natural that objects would be so strongly associated with the features Java and C++ attach to them. I think that makes it all the more important to play around! Look at what Moose has done. No, really, you should bear in mind my description of how Perl does stuff and flip through the Moose documentation. It’s amazing what they’ve built.

What’s the Best Solution for Managing Digital Photos and Videos?

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/discovering-best-solution-for-photo-video-backup/

Digital Asset Management (DAM)

If you have spent any time, as we have, talking to photographers and videographers about how they back up and archive their digital photos and videos, then you know that there’s no one answer or solution that users have discovered to meet their needs.

Based on what we’ve heard, visual media artists are still searching for the best combination of software, hardware, and cloud storage to preserve their media, and to be able to search, retrieve, and reuse that media as easily as possible.

Yes, there are a number of solutions out there, and some users have created combinations of hardware, software, and services to meet their needs, but we have met few who claim to be satisfied with their solution for digital asset management (DAM), or expect that they will be using the same solution in just a year or two.

We’d like to open a dialog with professionals and serious amateurs to learn more about what you’re doing, what you’d like to do, and how Backblaze might fit into that solution.

We have a bit of cred in this field, as we currently have hundreds of petabytes of digital media files in our data centers from users of Backblaze Backup and Backblaze B2 Cloud Storage. We want to make our cloud services as useful as possible for photographers and videographers.

Tell Us Both Your Current Solution and Your Dream Solution

To get started, we’d love to hear from you about how you’re managing your photos and videos. Whether you’re an amateur or a professional, your experiences are valuable and will help us understand how to provide the best cloud component of a digital asset management solution.

Here are some questions to consider:

  • Are you using direct-attached drives, NAS (Network-Attached Storage), or offline storage for your media?
  • Do you use the cloud for media you’re actively working on?
  • Do you back up or archive to the cloud?
  • Did you have a catalog or record of the media that you’ve archived that you use to search and retrieve media?
  • What’s different about how you work in the field (or traveling) versus how you work in a studio (or at home)?
  • What software and/or hardware currently works for you?
  • What’s the biggest impediment to working in the way you’d really like to?
  • How could the cloud work better for you?

Please Contribute Your Ideas

To contribute, please answer the following two questions in the comments below or send an email to [email protected]. Please comment or email your response by December 22, 2017.

  1. How are you currently backing up your digital photos, video files, and/or file libraries/catalogs? Do you have a backup system that uses attached drives, a local network, the cloud, or offline storage media? Does it work well for you?
  2. Imagine your ideal digital asset backup setup. What would it look like? Don’t be constrained by current products, technologies, brands, or solutions. Invent a technology or product if you wish. Describe an ideal system that would work the way you want it to.

We know you have opinions about managing photos and videos. Bring them on!

We’re soliciting answers far and wide from amateurs and experts, weekend video makers and well-known professional photographers. We have a few amateur and professional photographers and videographers here at Backblaze, and they are contributing their comments, as well.

Once we have gathered all the responses, we’ll write a post on what we learned about how people are currently working and what they would do if anything were possible. Look for that post after the beginning of the year.

Don’t Miss Future Posts on Media Management

We don’t want you to miss our future posts on photography, videography, and digital asset management. To receive email notices of blog updates (and no spam, we promise), enter your email address above using the Join button at the top of the page.

Come Back on Thursday for our Photography Post (and a Special Giveaway, too)

This coming Thursday we’ll have a blog post about the different ways that photographers and videographers are currently managing their digital media assets.

Plus, you’ll have the chance to win a valuable hardware/software combination for digital media management that I am sure you will appreciate. (You’ll have to wait until Thursday to find out what the prize is, but it has a total value of over $700.)

Past Posts on Photography, Videography, and Digital Asset Management

We’ve written a number of blog posts about photos, videos, and managing digital assets. We’ve posted links to some of them below.

Four Tips To Help Photographers and Videographers Get The Most From B2

Four Tips To Help Photographers and Videographers Get The Most From B2

How to Back Up Your Mac’s Photos Library

How to Back Up Your Mac’s Photos Library

How To Back Up Your Flickr Library

How To Back Up Your Flickr Library

Getting Video Archives Out of Your Closet

Getting Video Archives Out of Your Closet

B2 Cloud Storage Roundup

B2 Cloud Storage Roundup

Backing Up Photos While Traveling

Backing up photos while traveling – feedback

Should I Use an External Drive for Backup?

Should I use an external drive for backup?

How to Connect your Synology NAS to B2

How to Connect your Synology NAS to B2

The post What’s the Best Solution for Managing Digital Photos and Videos? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Raspberry Pi clusters come of age

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/raspberry-pi-clusters-come-of-age/

In today’s guest post, Bruce Tulloch, CEO and Managing Director of BitScope Designs, discusses the uses of cluster computing with the Raspberry Pi, and the recent pilot of the Los Alamos National Laboratory 3000-Pi cluster built with the BitScope Blade.

Raspberry Pi cluster

High-performance computing and Raspberry Pi are not normally uttered in the same breath, but Los Alamos National Laboratory is building a Raspberry Pi cluster with 3000 cores as a pilot before scaling up to 40 000 cores or more next year.

That’s amazing, but why?

I was asked this question more than any other at The International Conference for High-Performance Computing, Networking, Storage and Analysis in Denver last week, where one of the Los Alamos Raspberry Pi Cluster Modules was on display at the University of New Mexico’s Center for Advanced Research Computing booth.

The short answer to this question is: the Raspberry Pi cluster enables Los Alamos National Laboratory (LANL) to conduct exascale computing R&D.

The Pi cluster breadboard

Exascale refers to computing systems at least 50 times faster than the most powerful supercomputers in use today. The problem faced by LANL and similar labs building these things is one of scale. To get the required performance, you need a lot of nodes, and to make it work, you need a lot of R&D.

However, there’s a catch-22: how do you write the operating systems, networks stacks, launch and boot systems for such large computers without having one on which to test it all? Use an existing supercomputer? No — the existing large clusters are fully booked 24/7 doing science, they cost millions of dollars per year to run, and they may not have the architecture you need for your next-generation machine anyway. Older machines retired from science may be available, but at this scale they cost far too much to use and are usually very hard to maintain.

The Los Alamos solution? Build a “model supercomputer” with Raspberry Pi!

Think of it as a “cluster development breadboard”.

The idea is to design, develop, debug, and test new network architectures and systems software on the “breadboard”, but at a scale equivalent to the production machines you’re currently building. Raspberry Pi may be a small computer, but it can run most of the system software stacks that production machines use, and the ratios of its CPU speed, local memory, and network bandwidth scale proportionately to the big machines, much like an architect’s model does when building a new house. To learn more about the project, see the news conference and this interview with insideHPC at SC17.

Traditional Raspberry Pi clusters

Like most people, we love a good cluster! People have been building them with Raspberry Pi since the beginning, because it’s inexpensive, educational, and fun. They’ve been built with the original Pi, Pi 2, Pi 3, and even the Pi Zero, but none of these clusters have proven to be particularly practical.

That’s not stopped them being useful though! I saw quite a few Raspberry Pi clusters at the conference last week.

One tiny one that caught my eye was from the people at openio.io, who used a small Raspberry Pi Zero W cluster to demonstrate their scalable software-defined object storage platform, which on big machines is used to manage petabytes of data, but which is so lightweight that it runs just fine on this:

Raspberry Pi Zero cluster

There was another appealing example at the ARM booth, where the Berkeley Labs’ singularity container platform was demonstrated running very effectively on a small cluster built with Raspberry Pi 3s.

Raspberry Pi 3 cluster demo at a conference stall

My show favourite was from the Edinburgh Parallel Computing Center (EPCC): Nick Brown used a cluster of Pi 3s to explain supercomputers to kids with an engaging interactive application. The idea was that visitors to the stand design an aircraft wing, simulate it across the cluster, and work out whether an aircraft that uses the new wing could fly from Edinburgh to New York on a full tank of fuel. Mine made it, fortunately!

Raspberry Pi 3 cluster demo at a conference stall

Next-generation Raspberry Pi clusters

We’ve been building small-scale industrial-strength Raspberry Pi clusters for a while now with BitScope Blade.

When Los Alamos National Laboratory approached us via HPC provider SICORP with a request to build a cluster comprising many thousands of nodes, we considered all the options very carefully. It needed to be dense, reliable, low-power, and easy to configure and to build. It did not need to “do science”, but it did need to work in almost every other way as a full-scale HPC cluster would.

Some people argue Compute Module 3 is the ideal cluster building block. It’s very small and just as powerful as Raspberry Pi 3, so one could, in theory, pack a lot of them into a very small space. However, there are very good reasons no one has ever successfully done this. For a start, you need to build your own network fabric and I/O, and cooling the CM3s, especially when densely packed in a cluster, is tricky given their tiny size. There’s very little room for heatsinks, and the tiny PCBs dissipate very little excess heat.

Instead, we saw the potential for Raspberry Pi 3 itself to be used to build “industrial-strength clusters” with BitScope Blade. It works best when the Pis are properly mounted, powered reliably, and cooled effectively. It’s important to avoid using micro SD cards and to connect the nodes using wired networks. It has the added benefit of coming with lots of “free” USB I/O, and the Pi 3 PCB, when mounted with the correct air-flow, is a remarkably good heatsink.

When Gordon announced netboot support, we became convinced the Raspberry Pi 3 was the ideal candidate when used with standard switches. We’d been making smaller clusters for a while, but netboot made larger ones practical. Assembling them all into compact units that fit into existing racks with multiple 10 Gb uplinks is the solution that meets LANL’s needs. This is a 60-node cluster pack with a pair of managed switches by Ubiquiti in testing in the BitScope Lab:

60-node Raspberry Pi cluster pack

Two of these packs, built with Blade Quattro, and one smaller one comprising 30 nodes, built with Blade Duo, are the components of the Cluster Module we exhibited at the show. Five of these modules are going into Los Alamos National Laboratory for their pilot as I write this.

Bruce Tulloch at a conference stand with a demo of the Raspberry Pi cluster for LANL

It’s not only research clusters like this for which Raspberry Pi is well suited. You can build very reliable local cloud computing and data centre solutions for research, education, and even some industrial applications. You’re not going to get much heavy-duty science, big data analytics, AI, or serious number crunching done on one of these, but it is quite amazing to see just how useful Raspberry Pi clusters can be for other purposes, whether it’s software-defined networks, lightweight MaaS, SaaS, PaaS, or FaaS solutions, distributed storage, edge computing, industrial IoT, and of course, education in all things cluster and parallel computing. For one live example, check out Mythic Beasts’ educational compute cloud, built with Raspberry Pi 3.

For more information about Raspberry Pi clusters, drop by BitScope Clusters.

I’ll read and respond to your thoughts in the comments below this post too.

Editor’s note:

Here is a photo of Bruce wearing a jetpack. Cool, right?!

Bruce Tulloch wearing a jetpack

The post Raspberry Pi clusters come of age appeared first on Raspberry Pi.

UI Testing at Scale with AWS Lambda

Post Syndicated from Stas Neyman original https://aws.amazon.com/blogs/devops/ui-testing-at-scale-with-aws-lambda/

This is a guest blog post by Wes Couch and Kurt Waechter from the Blackboard Internal Product Development team about their experience using AWS Lambda.

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Here’s how we did it.

The backstory:

Blackboard is a global leader in delivering robust and innovative education software and services to clients in higher education, government, K12, and corporate training. We have a large product development team working across the globe in at least 10 different time zones, with an internal tools team providing support for quality and workflows. We have been using Selenium Webdriver to perform automated cross-browser UI testing since 2007. Because we are now practicing continuous delivery, the automated UI testing challenge has grown due to the faster release schedule. On top of that, every commit made to each branch triggers an execution of our automated UI test suite. If you have ever implemented an automated UI testing infrastructure, you know that it can be very challenging to scale and maintain. Although there are services that are useful for testing different browser/OS combinations, they don’t meet our scale needs.

It used to take three hours to synchronously run our functional UI suite, which revealed the obvious need for parallel execution. Previously, we used Mesos to orchestrate a Selenium Grid Docker container for each test run. This way, we were able to run eight concurrent threads for test execution, which took an average of 16 minutes. Although this setup is fine for a single workflow, the cracks started to show when we reached the scale required for Blackboard’s mature product lines. Going beyond eight concurrent sessions on a single container introduced performance problems that impact the reliability of tests (for example, issues in Webdriver or the browser popping up frequently). We tried Mesos and considered Kubernetes for Selenium Grid orchestration, but the answer to scaling a Selenium Grid was to think smaller, not larger. This led to our breakthrough with AWS Lambda.

The solution:

We started using AWS Lambda for UI testing because it doesn’t require costly infrastructure or countless man hours to maintain. The steps we outline in this blog post took one work day, from inception to implementation. By simply packaging the UI test suite into a Lambda function, we can execute these tests in parallel on a massive scale. We use a custom JUnit test runner that invokes the Lambda function with a request to run each test from the suite. The runner then aggregates the results returned from each Lambda test execution.

Selenium is the industry standard for testing UI at scale. Although there are other options to achieve the same thing in Lambda, we chose this mature suite of tools. Selenium is backed by Google, Firefox, and others to help the industry drive their browsers with code. This makes Lambda and Selenium a compelling stack for achieving UI testing at scale.

Making Chrome Run in Lambda

Currently, Chrome for Linux will not run in Lambda due to an absent mount point. By rebuilding Chrome with a slight modification, as Marco Lüthy originally demonstrated, you can run it inside Lambda anyway! It took about two hours to build the current master branch of Chromium to build on a c4.4xlarge. Unfortunately, the current version of ChromeDriver, 2.33, does not support any version of Chrome above 62, so we’ll be using Marco’s modified version of version 60 for the near future.

Required System Libraries

The Lambda runtime environment comes with a subset of common shared libraries. This means we need to include some extra libraries to get Chrome and ChromeDriver to work. Anything that exists in the java resources folder during compile time is included in the base directory of the compiled jar file. When this jar file is deployed to Lambda, it is placed in the /var/task/ directory. This allows us to simply place the libraries in the java resources folder under a folder named lib/ so they are right where they need to be when the Lambda function is invoked.

To get these libraries, create an EC2 instance and choose the Amazon Linux AMI.

Next, use ssh to connect to the server. After you connect to the new instance, search for the libraries to find their locations.

sudo find / -name libgconf-2.so.4
sudo find / -name libORBit-2.so.0

Now that you have the locations of the libraries, copy these files from the EC2 instance and place them in the java resources folder under lib/.

Packaging the Tests

To deploy the test suite to Lambda, we used a simple Gradle tool called ShadowJar, which is similar to the Maven Shade Plugin. It packages the libraries and dependencies inside the jar that is built. Usually test dependencies and sources aren’t included in a jar, but for this instance we want to include them. To include the test dependencies, add this section to the build.gradle file.

shadowJar {
   from sourceSets.test.output
   configurations = [project.configurations.testRuntime]
}

Deploying the Test Suite

Now that our tests are packaged with the dependencies in a jar, we need to get them into a running Lambda function. We use  simple SAM  templates to upload the packaged jar into S3, and then deploy it to Lambda with our settings.

{
   "AWSTemplateFormatVersion": "2010-09-09",
   "Transform": "AWS::Serverless-2016-10-31",
   "Resources": {
       "LambdaTestHandler": {
           "Type": "AWS::Serverless::Function",
           "Properties": {
               "CodeUri": "./build/libs/your-test-jar-all.jar",
               "Runtime": "java8",
               "Handler": "com.example.LambdaTestHandler::handleRequest",
               "Role": "<YourLambdaRoleArn>",
               "Timeout": 300,
               "MemorySize": 1536
           }
       }
   }
}

We use the maximum timeout available to ensure our tests have plenty of time to run. We also use the maximum memory size because this ensures our Lambda function can support Chrome and other resources required to run a UI test.

Specifying the handler is important because this class executes the desired test. The test handler should be able to receive a test class and method. With this information it will then execute the test and respond with the results.

public LambdaTestResult handleRequest(TestRequest testRequest, Context context) {
   LoggerContainer.LOGGER = new Logger(context.getLogger());
  
   BlockJUnit4ClassRunner runner = getRunnerForSingleTest(testRequest);
  
   Result result = new JUnitCore().run(runner);

   return new LambdaTestResult(result);
}

Creating a Lambda-Compatible ChromeDriver

We provide developers with an easily accessible ChromeDriver for local test writing and debugging. When we are running tests on AWS, we have configured ChromeDriver to run them in Lambda.

To configure ChromeDriver, we first need to tell ChromeDriver where to find the Chrome binary. Because we know that ChromeDriver is going to be unzipped into the root task directory, we should point the ChromeDriver configuration at that location.

The settings for getting ChromeDriver running are mostly related to Chrome, which must have its working directories pointed at the tmp/ folder.

Start with the default DesiredCapabilities for ChromeDriver, and then add the following settings to enable your ChromeDriver to start in Lambda.

public ChromeDriver createLambdaChromeDriver() {
   ChromeOptions options = new ChromeOptions();

   // Set the location of the chrome binary from the resources folder
   options.setBinary("/var/task/chrome");

   // Include these settings to allow Chrome to run in Lambda
   options.addArguments("--disable-gpu");
   options.addArguments("--headless");
   options.addArguments("--window-size=1366,768");
   options.addArguments("--single-process");
   options.addArguments("--no-sandbox");
   options.addArguments("--user-data-dir=/tmp/user-data");
   options.addArguments("--data-path=/tmp/data-path");
   options.addArguments("--homedir=/tmp");
   options.addArguments("--disk-cache-dir=/tmp/cache-dir");
  
   DesiredCapabilities desiredCapabilities = DesiredCapabilities.chrome();
   desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options);
  
   return new ChromeDriver(desiredCapabilities);
}

Executing Tests in Parallel

You can approach parallel test execution in Lambda in many different ways. Your approach depends on the structure and design of your test suite. For our solution, we implemented a custom test runner that uses reflection and JUnit libraries to create a list of test cases we want run. When we have the list, we create a TestRequest object to pass into the Lambda function that we have deployed. In this TestRequest, we place the class name, test method, and the test run identifier. When the Lambda function receives this TestRequest, our LambdaTestHandler generates and runs the JUnit test. After the test is complete, the test result is sent to the test runner. The test runner compiles a result after all of the tests are complete. By executing the same Lambda function multiple times with different test requests, we can effectively run the entire test suite in parallel.

To get screenshots and other test data, we pipe those files during test execution to an S3 bucket under the test run identifier prefix. When the tests are complete, we link the files to each test execution in the report generated from the test run. This lets us easily investigate test executions.

Pro Tip: Dynamically Loading Binaries

AWS Lambda has a limit of 250 MB of uncompressed space for packaged Lambda functions. Because we have libraries and other dependencies to our test suite, we hit this limit when we tried to upload a function that contained Chrome and ChromeDriver (~140 MB). This test suite was not originally intended to be used with Lambda. Otherwise, we would have scrutinized some of the included libraries. To get around this limit, we used the Lambda functions temporary directory, which allows up to 500 MB of space at runtime. Downloading these binaries at runtime moves some of that space requirement into the temporary directory. This allows more room for libraries and dependencies. You can do this by grabbing Chrome and ChromeDriver from an S3 bucket and marking them as executable using built-in Java libraries. If you take this route, be sure to point to the new location for these executables in order to create a ChromeDriver.

private static void downloadS3ObjectToExecutableFile(String key) throws IOException {
   File file = new File("/tmp/" + key);

   GetObjectRequest request = new GetObjectRequest("s3-bucket-name", key);

   FileUtils.copyInputStreamToFile(s3client.getObject(request).getObjectContent(), file);
   file.setExecutable(true);
}

Lambda-Selenium Project Source

We have compiled an open source example that you can grab from the Blackboard Github repository. Grab the code and try it out!

https://blackboard.github.io/lambda-selenium/

Conclusion

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Thanks to AWS Lambda, we can reduce our build times and perform automated UI testing at scale!