Friday Squid Blogging: A Tracking Device for Squid

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/02/friday_squid_bl_664.html

Really:

After years of “making do” with the available technology for his squid studies, Mooney created a versatile tag that allows him to research squid behavior. With the help of Kakani Katija, an engineer adapting the tag for jellyfish at California’s Monterey Bay Aquarium Research Institute (MBARI), Mooney’s team is creating a replicable system flexible enough to work across a range of soft-bodied marine animals. As Mooney and Katija refine the tags, they plan to produce an adaptable, open-source package that scientists researching other marine invertebrates can also use.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Spark enhancements for elasticity and resiliency on Amazon EMR

Post Syndicated from Udit Mehrotra original https://aws.amazon.com/blogs/big-data/spark-enhancements-for-elasticity-and-resiliency-on-amazon-emr/

Customers take advantage of the elasticity in Amazon EMR to save costs by scaling in clusters when workflows are completed, or when running lighter jobs. This also applies to launching clusters with low-cost Amazon EC2 spot instances.

The Automatic Scaling feature in Amazon EMR lets customers dynamically scale clusters in and out, based on cluster usage or other job-related metrics. These features help you use resources efficiently, but they can also cause EC2 instances to shut down in the middle of a running job. This could result in the loss of computation and data, which can affect the stability of the job or result in duplicate work through recomputing.

To gracefully shut down nodes without affecting running jobs, Amazon EMR uses Apache Hadoop's decommissioning mechanism, which the Amazon EMR team developed and contributed back to the community. This works well for most Hadoop workloads, but not as well for Apache Spark. Spark currently has several shortcomings in dealing with node loss. These can cause jobs to get stuck trying to recover and recompute lost tasks and data, and in some cases eventually crash the job. For more information about some of the open issues in Spark, see the following links:

To avoid some of these issues and help customers take full advantage of Amazon EMR’s elasticity features with Spark, Amazon EMR has customizations to open-source Spark that make it more resilient to node loss. Recomputation is minimized, and jobs can recover faster from node failures and EC2 instance termination. These improvements are in Amazon EMR release version 5.9.0 and later.

This blog post provides an overview of the issues with how open-source Spark handles node loss and the improvements in Amazon EMR to address the issues.

How Spark handles node loss

When a node goes down during an active Spark job, the following risks arise:

  • Tasks that are actively running on the node might fail to complete and have to run on another node.
  • Cached RDDs (resilient distributed datasets) on the node might be lost. While this impacts performance, it does not cause failures or affect the stability of the application.
  • Shuffle output files in memory, or those written to disk on the node, would be lost. Because Amazon EMR enables the External Shuffle Service by default, the shuffle output is written to disk. Losing shuffle files can bring the application to a halt until they are recomputed on another active node, because future tasks might depend on them. For more information about shuffle operations, see Shuffle operations.

To recover from node loss, Spark should be able to do the following:

  • If actively running tasks are lost, they must be scheduled on another node. In addition, computation of the remaining unscheduled tasks must resume.
  • Shuffle output that was computed on the lost node must be recomputed by re-executing the tasks that produced those shuffle blocks.

The following is the sequence of events for Spark to recover when a node is lost:

  • Spark considers actively running tasks on the node as failed and reruns them on another active node.
  • If the node had shuffle output files that are needed by future tasks, the target executors on other active nodes get a FetchFailedException while trying to fetch missing shuffle blocks from the failed node.
  • When the FetchFailedException happens, the target executors retry fetching the blocks from the failed node for a time determined by the spark.shuffle.io.maxRetries and spark.shuffle.io.retryWait configuration values (a configuration sketch of these settings follows this list). After all the retry attempts are exhausted, the failure is propagated to the driver.
  • When the driver receives the FetchFailedException, it marks the currently running shuffle stage during which the failure occurred as failed and stops its execution. It also marks the shuffle output on the node or executors from which shuffle blocks could not be fetched as unavailable/lost, so that they can be recomputed. This triggers the previous Map stage to re-attempt recomputing those missing shuffle blocks.
  • After the missing shuffle output is computed, a re-attempt of the failed shuffle stage is triggered to resume the job from where it stopped. It then runs tasks that failed or had not been scheduled yet.
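
As a rough illustration of the retry settings mentioned in the sequence above, the following sketch shows how they could be tuned at job submission. The property names are standard Spark configuration; the values, application class, and JAR location are placeholder examples, not recommendations.

# Minimal sketch: control how long executors retry fetching shuffle blocks from a
# lost node before the FetchFailedException is propagated to the driver.
# The application class and JAR location below are hypothetical.
$ spark-submit \
  --conf spark.shuffle.io.maxRetries=3 \
  --conf spark.shuffle.io.retryWait=5s \
  --class com.example.MySparkJob \
  s3://my-bucket/jars/my-spark-job.jar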

Issues with Spark’s handling of node loss

Spark's recovery process helps it recover from random executor and node failures that can occur in any cloud environment. However, the recovery process begins only after the node has already failed and Spark gets a FetchFailedException while trying to fetch shuffle blocks. This causes some of the issues described in this section.

Amazon EMR can begin the recovery early because it knows when and which nodes are going down as a result of a manual resize, an EC2-triggered Spot instance termination, or an automatic scaling event. It can inform Spark immediately about these nodes, so that Spark can take proactive actions to gracefully handle the loss of nodes and start recovery early. However, Spark currently does not have any mechanism through which it can be notified that a node is going down, such as YARN decommissioning. Therefore, it cannot take immediate and relevant actions to help recover faster. As a result, here are some of the issues with Spark's recovery:

  • The node goes down in the middle of the Map stage, as shown in the following diagram:

In this scenario, the shuffle stage is scheduled unnecessarily, and the application must wait for the FetchFailedException before recomputing the lost shuffle. This takes a lot of time. Instead, it would be better if all lost shuffles could be immediately recomputed in the Map stage before even proceeding to the shuffle stage.

  • The node goes down in the middle of a shuffle stage, as shown in the following diagram:

If there were a way to immediately inform Spark about node loss, instead of relying on the FetchFailedException and fetch retries, recovery time would be shorter.

  • The Spark driver starts recomputation when it gets the first FetchFailedException. It considers the shuffle files on the lost node as missing. However, if multiple nodes go down at the same time, in its first re-attempt of the previous Map stage, the Spark driver recomputes only the shuffle output for the first node from which it received a FetchFailedException. In the short time between receiving the first fetch failure and starting the re-attempt, it is possible that the driver receives fetch failures from other failed nodes. As a result, it can recompute shuffles for multiple lost nodes in the same re-attempt, but there is no guarantee.

    In most cases, even though nodes go down at the same time, Spark requires multiple re-attempts of the map and shuffle stages to recompute all of the lost shuffle output. This can easily block a job for a significant amount of time. Ideally, Spark would recompute the shuffle output from all of the nodes lost around the same time in a single retry.

  • As long as it can reach a node that is about to go down, Spark can continue to schedule more tasks on it. This causes more shuffle outputs to be computed, which may eventually need to be recomputed. Ideally, these tasks can be redirected to healthy nodes to prevent recomputation and improve recovery time.
  • Spark has a limit on the number of consecutive failed attempts allowed for a stage before it aborts a job. This is configurable with spark.stage.maxConsecutiveAttempts. When a node fails and a FetchFailedException occurs, Spark marks the running shuffle stage as failed and triggers a re-attempt after computing the missing shuffle outputs. Frequent scaling of nodes during shuffle stages can easily cause stage failures to reach the threshold and abort the jobs. Ideally, when a stage fails for valid reasons such as a manual scale-in, an automatic scaling event, or an EC2-triggered Spot instance termination, there should be a way to tell Spark not to count the failure toward spark.stage.maxConsecutiveAttempts for that stage.

How Amazon EMR resolves these issues

This section describes the three main enhancements that Amazon EMR has made to Spark to resolve the issues described in the previous section.

Integrate with YARN’s decommissioning mechanism

 Spark on Amazon EMR uses YARN as the underlying manager for cluster resources. Amazon EMR has its own implementation of a graceful decommissioning mechanism for YARN that provides a way to gracefully shut down YARN node managers by not scheduling new containers on a node in the Decommissioning state. Amazon EMR does this by waiting for the existing tasks on running containers to complete, or time out, before the node is decommissioned. This decommissioning mechanism has recently been contributed back to open source Hadoop.

We integrated Spark with YARN’s decommissioning mechanism so that the Spark driver is notified when a node goes through Decommissioning or Decommissioned states in YARN. This is shown in the following diagram:

This notification allows the driver to take appropriate actions and start the recovery early, because all nodes go through the decommissioning process before being removed.

Extend Spark’s blacklisting mechanism

YARN’s decommissioning mechanism works well for Hadoop MapReduce jobs by not launching any more containers on decommissioning nodes. This prevents more Hadoop MapReduce tasks from being scheduled on that node. However, this does not work well for Spark jobs because in Spark each executor is assigned a YARN container that is long-lived and keeps receiving tasks.

Preventing new containers from being launched only prevents more executors from being assigned to the node. Already active executors/containers continue to schedule new tasks until the node goes down, and they can end up failing and have to be rerun. Also, if these tasks write shuffle output, they would also be lost. This increases the recomputation and the time that it takes for recovery.

To address this, Amazon EMR extends Spark’s blacklisting mechanism to blacklist a node when the Spark driver receives a YARN decommissioning signal for it. This is shown in the following diagram:

This prevents new tasks from being scheduled on the blacklisted node. Instead they are scheduled on healthy nodes. As soon as tasks already running on the node are complete, the node can be safely decommissioned without the risk of task failures or losses. This also speeds up the recovery process by not producing more shuffle output on a node that is going down. This reduces the number of shuffle outputs to be recomputed. If the node comes out of the Decommissioning state and is active again, Amazon EMR removes the node from the blacklists so that new tasks can be scheduled on it.

This blacklisting extension is enabled by default in Amazon EMR with the spark.blacklist.decommissioning.enabled property set to true. You can control the time for which the node is blacklisted using the spark.blacklist.decommissioning.timeout property, which is set to 1 hour by default, equal to the default value for yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs. We recommend setting spark.blacklist.decommissioning.timeout to a value equal to or greater than yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs to make sure that Amazon EMR blacklists the node for the entire decommissioning period.
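
For example, both timeouts can be set when a cluster is created by using Amazon EMR configuration classifications. The following is a hedged sketch, not a prescription: the classification and property names are the ones discussed above, while the cluster name, release label, instance type, and instance count are placeholder values you would replace with your own.

# Minimal sketch: create a cluster with matching Spark and YARN decommissioning timeouts.
$ aws emr create-cluster \
  --name "spark-resiliency-demo" \
  --release-label emr-5.20.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --configurations '[
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.blacklist.decommissioning.enabled": "true",
        "spark.blacklist.decommissioning.timeout": "1h"
      }
    },
    {
      "Classification": "yarn-site",
      "Properties": {
        "yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs": "3600"
      }
    }
  ]'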

Actions for decommissioned nodes

After a node enters the Decommissioning state, no new tasks are scheduled on it, and once the active containers become idle (or the timeout expires), the node gets decommissioned. When the Spark driver receives the decommissioned signal, it can take the following additional actions to start the recovery process sooner, rather than waiting for a fetch failure to occur:

  • All of the shuffle outputs on the decommissioned node are unregistered, thus marking them as unavailable. Amazon EMR enables this by default with the setting spark.resourceManager.cleanupExpiredHost set to true. This has the following advantages:
    • If a node is lost in the middle of a map stage and gets decommissioned, Spark initiates recovery and recomputes the shuffle outputs lost on the decommissioned node before proceeding to the next stage. This prevents fetch failures in the shuffle stage, because Spark has all of the shuffle blocks computed and available at the end of the map stage, which significantly speeds up recovery.
    • If a node is lost in the middle of a shuffle stage, the target executors trying to get shuffle blocks from the lost node immediately notice that the shuffle output is unavailable. They then send the failure to the driver instead of retrying, and failing, multiple times to fetch it. The driver then immediately fails the stage and starts recomputing the lost shuffle output. This reduces the time spent trying to fetch shuffle blocks from lost nodes.
    • The most significant advantage of unregistering shuffle outputs is when a cluster is scaled in by a large number of nodes. Because all of the nodes go down around the same time, they all get decommissioned around the same time, and their shuffle outputs are unregistered. When Spark schedules the first re-attempt to compute the missing blocks, it notices all of the missing blocks from decommissioned nodes and recovers in only one attempt. This speeds up the recovery process significantly over the open-source Spark implementation, where stages might be rescheduled multiple times to recompute missing shuffles from all nodes, which can keep jobs stuck for hours failing and recomputing.
  • When a stage fails because of fetch failures from a node being decommissioned, by default, Amazon EMR does not count the stage failure toward the maximum number of failures allowed for a stage, as set by spark.stage.maxConsecutiveAttempts. This behavior is controlled by the spark.stage.attempt.ignoreOnDecommissionFetchFailure setting, which is true by default. It prevents a job from failing if a stage fails multiple times because of node failures for valid reasons such as a manual resize, an automatic scaling event, or an EC2-triggered Spot instance termination. (A submission sketch showing these decommissioning-related properties follows this list.)
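
The following sketch shows where the decommissioning-related properties described above would be set at job submission. Note that spark.resourceManager.cleanupExpiredHost and spark.stage.attempt.ignoreOnDecommissionFetchFailure are already true by default on Amazon EMR, so this only illustrates how you would set or override them; the maxConsecutiveAttempts value and the JAR path are placeholders.

# Minimal sketch: EMR decommissioning-related Spark properties, set explicitly per job.
# These are already enabled by default on Amazon EMR; the JAR path is hypothetical.
$ spark-submit \
  --conf spark.resourceManager.cleanupExpiredHost=true \
  --conf spark.stage.attempt.ignoreOnDecommissionFetchFailure=true \
  --conf spark.stage.maxConsecutiveAttempts=8 \
  s3://my-bucket/jars/my-spark-job.jar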

Conclusion

This post described how Spark handles node loss and some of the issues that can occur if a cluster is scaled in during an active Spark job. It also showed the customizations that Amazon EMR has built on Spark, and the configurations available to make Spark on Amazon EMR more resilient, so that you can take full advantage of the elasticity features offered by Amazon EMR.

If you have questions or suggestions, please leave a comment.

About the Author

Udit Mehrotra is a software development engineer at Amazon Web Services. He works on cutting-edge features of Amazon EMR and is also involved in open-source projects such as Apache Spark, Apache Hadoop, and Apache Hive. In his spare time, he likes to play guitar, travel, binge-watch, and hang out with friends.

Implementing GitFlow Using AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, and AWS CodeDeploy

Post Syndicated from Ashish Gore original https://aws.amazon.com/blogs/devops/implementing-gitflow-using-aws-codepipeline-aws-codecommit-aws-codebuild-and-aws-codedeploy/

This blog post shows how AWS customers who use a GitFlow branching model can model their merge and release process by using AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, and AWS CodeDeploy. This post provides a framework, AWS CloudFormation templates, and AWS CLI commands.

Before we begin, we want to point out that GitFlow isn’t something that we practice at Amazon because it is incompatible with the way we think about CI/CD. Continuous integration means that every developer is regularly merging changes back to master (at least once per day). As we’ll explain later, GitFlow involves creating multiple levels of branching off of master where changes to feature branches are only periodically merged all the way back to master to trigger a release. Continuous delivery requires the capability to get every change into production quickly, safely, and sustainably. Research by groups such as DORA has shown that teams that practice CI/CD get features to customers more quickly, are able to recover from issues more quickly, experience fewer failed deployments, and have higher employee satisfaction.

Despite our differing view, we recognize that our customers have requirements that might make branching models like GitFlow attractive (or even mandatory). For this reason, we want to provide information that helps them use our tools to automate merge and release tasks and get as close to CI/CD as possible. With that disclaimer out of the way, let’s dive in!

When Linus Torvalds introduced Git version control in 2005, it really changed the way developers thought about branching and merging. Before Git, these tasks were scary and mostly avoided. As the tools became more mature, branching and merging became both cheap and simple. They are now part of the daily development workflow. In 2010, Vincent Driessen introduced GitFlow, which became an extremely popular branch and release management model. It introduced the concept of a develop branch as the mainline integration and the well-known master branch, which is always kept in a production-ready state. Both master and develop are permanent branches, but GitFlow also recommends short-lived feature, hotfix, and release branches, like so:

GitFlow guidelines:

  • Use develop as a continuous integration branch.
  • Use feature branches to work on multiple features.
  • Use release branches to work on a particular release (multiple features).
  • Use hotfix branches off of master to push a hotfix.
  • Merge to master after every release.
  • Master contains production-ready code.

Now that you have some background, let’s take a look at how we can implement this model using services that are part of AWS Developer Tools: AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, and AWS CodeDeploy. In this post, we assume you are familiar with these AWS services. If you aren’t, see the links in the Reference section before you begin. We also assume that you have installed and configured the AWS CLI.

Throughout the post, we use the popular GitFlow tool. It’s written on top of Git and automates the process of branch creation and merging. The tool follows the GitFlow branching model guidelines. You don’t have to use this tool. You can use Git commands instead.

For simplicity, production-like pipelines that have approval or testing stages have been omitted, but they can easily fit into this model. Also, in an ideal production scenario, you would keep Dev and Prod accounts separate.

AWS Developer Tools and GitFlow

Let's take a look at how we can model AWS CodePipeline with GitFlow. The idea is to create a pipeline per branch. Each pipeline has a lifecycle that is tied to the branch. When a new, short-lived branch is created, we create the pipeline and required resources. After the short-lived branch is merged into develop, we clean up the pipeline and resources to avoid recurring costs.

The following would be permanent and would have the same lifetime as the master and develop branches:

  • AWS CodeCommit master/develop branch
  • AWS CodeBuild project across all branches
  • AWS CodeDeploy application across all branches
  • AWS CloudFormation stack (EC2 instance) for master (prod) and develop (stage)

The following would be temporary and would have the same lifetime as the short-lived branches:

  • AWS CodeCommit feature/hotfix/release branch
  • AWS CodePipeline per branch
  • AWS CodeDeploy deployment group per branch
  • AWS CloudFormation stack (EC2 instance) per branch

Here’s how it would look:

Basic guidelines (assuming EC2/on-premises):

  • Each branch has an AWS CodePipeline.
  • AWS CodePipeline is configured with AWS CodeCommit as the source provider, AWS CodeBuild as the build provider, and AWS CodeDeploy as the deployment provider.
  • AWS CodeBuild is configured with AWS CodePipeline as the source.
  • Each AWS CodePipeline has an AWS CodeDeploy deployment group that uses the Name tag to deploy.
  • A single Amazon S3 bucket is used as the artifact store, but you can choose to keep separate buckets based on repo.

 

Step 1: Use the following AWS CloudFormation templates to set up the required roles and environment for master and develop, including the commit repo, VPC, EC2 instance, CodeBuild, CodeDeploy, and CodePipeline.

$ aws cloudformation create-stack --stack-name GitFlowEnv \
--template-body https://s3.amazonaws.com/devops-workshop-0526-2051/git-flow/aws-devops-workshop-environment-setup.template \
--capabilities CAPABILITY_IAM 

$ aws cloudformation create-stack --stack-name GitFlowCiCd \
--template-body https://s3.amazonaws.com/devops-workshop-0526-2051/git-flow/aws-pipeline-commit-build-deploy.template \
--capabilities CAPABILITY_IAM \
--parameters ParameterKey=MainBranchName,ParameterValue=master ParameterKey=DevBranchName,ParameterValue=develop 

Here is how the pipelines should appear in the CodePipeline console:

Step 2: Push the contents to the AWS CodeCommit repo.

Download https://s3.amazonaws.com/gitflowawsdevopsblogpost/WebAppRepo.zip. Unzip the file, clone the repo, and then commit and push the contents to CodeCommit – WebAppRepo.

Step 3: Run git flow init in the repo to initialize the branches.

$ git flow init

Assume you need to start working on a new feature and create a branch.

$ git flow feature start <branch>

Step 4: Update the stack to create another pipeline for feature-x branch.

$ aws cloudformation update-stack --stack-name GitFlowCiCd \
--template-body https://s3.amazonaws.com/devops-workshop-0526-2051/git-flow/aws-pipeline-commit-build-deploy-update.template \
--capabilities CAPABILITY_IAM \
--parameters ParameterKey=MainBranchName,ParameterValue=master ParameterKey=DevBranchName,ParameterValue=develop ParameterKey=FeatureBranchName,ParameterValue=feature-x

When you’re done, you should see the feature-x branch in the CodePipeline console. It’s ready to build and deploy. To test, make a change to the branch and view the pipeline in action.
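
For example, a test change could look like the following. This is a minimal sketch: the file name is a placeholder, and the branch name should match whatever branch git flow actually created (git flow typically prefixes feature branches with feature/).

$ git checkout feature/feature-x
$ echo "<!-- pipeline test change -->" >> index.html
$ git add index.html
$ git commit -m "Test change to exercise the feature-x pipeline"
$ git push origin feature/feature-x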

After you have confirmed the branch works as expected, use the finish command to merge changes into the develop branch.

$ git flow feature finish <feature>

After the changes are merged, update the AWS CloudFormation stack to remove the branch. This will help you avoid charges for resources you no longer need.

$ aws cloudformation update-stack --stack-name GitFlowCiCd \
--template-body https://s3.amazonaws.com/devops-workshop-0526-2051/git-flow/aws-pipeline-commit-build-deploy.template \
--capabilities CAPABILITY_IAM \
--parameters ParameterKey=MainBranchName,ParameterValue=master ParameterKey=DevBranchName,ParameterValue=develop

The steps for the release and hotfix branches are the same.

End result: Pipelines and deployment groups

You should end up with pipelines that look like this.

Next steps

If you take the CLI commands and wrap them in your own custom bash script, you can use GitFlow and the script to quickly set up and tear down pipelines and resources for short-lived branches. This helps you avoid being charged for resources you no longer need. Alternatively, you can write a scheduled Lambda function that, based on creation date, deletes the short-lived pipelines on a regular basis.
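
A minimal sketch of such a wrapper script is shown below. It reuses the stack name, template URLs, and parameter keys from the commands earlier in this post; the script name, argument handling, and structure are illustrative only.

#!/bin/bash
# branch-pipeline.sh (hypothetical name): create or remove the pipeline resources
# for a short-lived GitFlow branch by updating the GitFlowCiCd stack.
# Usage: ./branch-pipeline.sh create feature-x
#        ./branch-pipeline.sh delete
set -euo pipefail

ACTION=$1
BRANCH=${2:-}
STACK=GitFlowCiCd
BASE_PARAMS="ParameterKey=MainBranchName,ParameterValue=master ParameterKey=DevBranchName,ParameterValue=develop"

if [ "$ACTION" = "create" ]; then
  # Add a pipeline and resources for the short-lived branch.
  aws cloudformation update-stack --stack-name "$STACK" \
    --template-body https://s3.amazonaws.com/devops-workshop-0526-2051/git-flow/aws-pipeline-commit-build-deploy-update.template \
    --capabilities CAPABILITY_IAM \
    --parameters $BASE_PARAMS ParameterKey=FeatureBranchName,ParameterValue="$BRANCH"
else
  # Revert to the base template, removing the short-lived branch resources.
  aws cloudformation update-stack --stack-name "$STACK" \
    --template-body https://s3.amazonaws.com/devops-workshop-0526-2051/git-flow/aws-pipeline-commit-build-deploy.template \
    --capabilities CAPABILITY_IAM \
    --parameters $BASE_PARAMS
fi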

Summary

In this blog post, we showed how AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, and AWS CodeDeploy can be used to model GitFlow. We hope you can use the information in this post to improve your CI/CD strategy, specifically to get your developers working in feature/release/hotfix branches and to provide them with an environment where they can collaborate, test, and deploy changes quickly.

References

19 Years Without Prof. Toncho Zhechev

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/02/22/19/

Toncho01100213

Professor Toncho Zhechev passed away on February 23, 2000.

This is how I remember Professor Zhechev: a little pensive, a little concerned, a little troubled by the world around him, at peace with what lies behind us, and curious about what lies ahead.

 

Ownership of Radio and Television Providers

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/02/22/transp_rtv/

From the website of the CEM (Council for Electronic Media):

A name-by-name breakdown of ownership in the electronic media.

Current as of February 1, 2019.

[$] Containers as kernel objects — again

Post Syndicated from corbet original https://lwn.net/Articles/780364/rss

Linus Torvalds once famously said
that there is no design behind the Linux kernel. That may be true, but
there are still some guiding principles behind the evolution of the kernel;
one of those, to date, has been that the kernel does not recognize
“containers” as objects in their own right. Instead, the kernel provides
the necessary low-level features, such as namespaces and control groups, to
allow user space to create its own container abstraction. This refusal to
dictate the nature of containers has led to a diverse variety of container
models and a lot of experimentation. But that doesn’t stop those who would
still like to see the kernel recognize containers as first-class
kernel-supported objects.

Security updates for Friday

Post Syndicated from jake original https://lwn.net/Articles/780543/rss

Security updates have been issued by Mageia (libreoffice, libtiff, spice, and spice-gtk), openSUSE (build, mosquitto, and nodejs6), Red Hat (firefox, flatpak, and systemd), Scientific Linux (firefox, flatpak, and systemd), SUSE (kernel-firmware and texlive), and Ubuntu (bind9 and ghostscript).

What we are learning about learning

Post Syndicated from Oliver Quinlan original https://www.raspberrypi.org/blog/what-we-are-learning-about-learning/

Across Code Clubs, CoderDojos, Raspberry Jams, and all our other education programmes, we’re working with hundreds of thousands of young people. They are all making different projects and learning different things while they are making. The research team at the Raspberry Pi Foundation does lots of work to help us understand what exactly these young people learn, and how the adults and peers who mentor them share their skills with them.

Coolest Projects International 2018

Senior Research Manager Oliver Quinlan chats to participants at Coolest Projects 2018

We do our research work by:

  • Visiting clubs, Dojos, and events, seeing how they run, and talking to the adults and young people involved
  • Running surveys to get feedback on how people are helping young people learn
  • Testing new approaches and resources with groups of clubs and Dojos to try different ways which might help to engage more young people or help them learn more effectively

Over the last few months, we’ve been running lots of research projects and gained some fascinating insights into how young people are engaging with digital making. As well as using these findings to shape our education work, we also publish what we find, for free, over on our research page.

How do children tackle digital making projects?

We found that making ambitious digital projects is a careful balance between ideas, technology, and skills. Using this new understanding, we will help children and the adults that support them plan a process for exploring open-ended projects.

Coolest Projects USA 2018

Coolest Projects USA 2018

For this piece of research, we interviewed children and young people at last year's Coolest Projects International and Coolest Projects UK, asking questions about the kinds of projects they made and how they created them. We found that the challenge they face is finding a balance between three things: the ideas and problems they want to address, the technologies they have access to, and their skills. Different children approached their projects in different ways, some starting with the technology they had access to, others starting with an idea or with a problem they wanted to solve.

Achieving big ambitions with the technology you have to hand while also learning the skills you need can be tricky. We’re planning to develop more resources to help young people with this.

Coolest Projects International 2018

Research Assistant Lucia Florianova learns about Rebel Girls at Coolest Projects International 2018

We also found out a lot about the power of seeing other children’s projects, what children learn, and the confidence they develop in presenting their projects at these events. Alongside our analysis, we’ve put together some case studies of the teams we interviewed, so people can read in-depth about their projects and the stories of how they created them.

Who comes to Code Club?

In another research project, we found that Code Clubs in schools are often diverse and cater well for the communities the schools serve; Code Club is not an exclusive club, but something for everyone.

Code Club Athens

Code Clubs are run by volunteers in all sorts of schools, libraries, and other venues across the world; we know a lot about the spaces the clubs take place in and the volunteers who run them, but less about the children who choose to take part. We’ve started to explore this through structured visits to clubs in a sample of schools across the West Midlands in England, interviewing teachers about the groups of children in their club. We knew Code Clubs were reaching schools that cater for a whole range of communities, and the evidence of this project suggests that the children who attend the Code Club in those schools come from a range of backgrounds themselves.

Scouts Raspberry Pi

Photo c/o Dave Bird — thanks, Dave!

We found that in these primary schools, children were motivated to join Code Club more because the club is fun rather than because the children see themselves as people who are programmers. This is partly because adults set up Code Clubs with an emphasis on fun: although children are learning, they are not perceiving Code Club as an academic activity linked with school work. Our project also showed us how Code Clubs fit in with the other after-school clubs in schools, and that children often choose Code Club as part of a menu of after-school clubs.

Raspberry Jam

Visitors to Pi Towers Raspberry Jam get hands-on with coding

In the last few months we’ve also published insights into how Raspberry Pi Certified Educators are using their training in schools, and into how schools are using Raspberry Pi computers. You can find our reports on all of these topics over at our research page.

Thanks to all the volunteers, educators, and young people who are finding time to help us with their research. If you’re involved in any of our education programmes and want to take part in a research project, or if you are doing your own research into computing education and want to start a conversation, then reach out to us via [email protected].

The post What we are learning about learning appeared first on Raspberry Pi.

MTG Announces the Sale of Nova Broadcasting Group

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/02/22/mtg-nova/

 

February 22, 2019

Modern Times Group (MTG) is selling its 95% stake in Nova Broadcasting Group in Bulgaria to Advance Media Group. The sale is subject to approval by the local regulatory authorities, which is expected in the second quarter of 2019.

MTG first entered Bulgaria in 2007 by investing in Balkan Media Group Limited, then acquired Nova in 2008 before merging the two businesses in 2009.

Advance Media Group EAD is a subsidiary of Advance Properties OOD, which owns 117 companies in 23 countries. The companies of Advance Properties operate in more than 10 industries, including pharmaceuticals, shipping, port operations, real estate, and power generation. The group is owned by the Bulgarian businessmen Kiril Domuschiev and Georgi Domuschiev.

The announcement on the Nova TV website: Advance Media Group EAD and the Swedish media group Modern Times Group (MTG) announced that a contract has been signed for the sale of Nova Broadcasting Group. The agreement provides for the Bulgarian company to acquire 100% of Nova Broadcasting Group AD.

Gen. Nakasone on US CyberCommand

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2019/02/gen_nakasone_on.html

Really interesting article by and interview with Paul M. Nakasone (Commander of U.S. Cyber Command, Director of the National Security Agency, and Chief of the Central Security Service) in the current issue of Joint Forces Quarterly. He talks about the evolving role of US CyberCommand, and its new posture of “persistent engagement” using a “cyber-persistent force”:

From the article:

We must “defend forward” in cyberspace, as we do in the physical domains. Our naval forces do not defend by staying in port, and our airpower does not remain at airfields. They patrol the seas and skies to ensure they are positioned to defend our country before our borders are crossed. The same logic applies in cyberspace. Persistent engagement of our adversaries in cyberspace cannot be successful if our actions are limited to DOD networks. To defend critical military and national interests, our forces must operate against our enemies on their virtual territory as well. Shifting from a response outlook to a persistence force that defends forward moves our cyber capabilities out of their virtual garrisons, adopting a posture that matches the cyberspace operational environment.

From the interview:

As we think about cyberspace, we should agree on a few foundational concepts. First, our nation is in constant contact with its adversaries; we’re not waiting for adversaries to come to us. Our adversaries understand this, and they are always working to improve that contact. Second, our security is challenged in cyberspace. We have to actively defend; we have to conduct reconnaissance; we have to understand where our adversary is and his capabilities; and we have to understand their intent. Third, superiority in cyberspace is temporary; we may achieve it for a period of time, but it’s ephemeral. That’s why we must operate continuously to seize and maintain the initiative in the face of persistent threats. Why do the threats persist in cyberspace? They persist because the barriers to entry are low and the capabilities are rapidly available and can be easily repurposed. Fourth, in this domain, the advantage favors those who have initiative. If we want to have an advantage in cyberspace, we have to actively work to either improve our defenses, create new accesses, or upgrade our capabilities. This is a domain that requires constant action because we’re going to get reactions from our adversary.

[…]

Persistent engagement is the concept that states we are in constant contact with our adversaries in cyberspace, and success is determined by how we enable and act. In persistent engagement, we enable other interagency partners. Whether it’s the FBI or DHS, we enable them with information or intelligence to share with elements of the CIKR [critical infrastructure and key resources] or with select private-sector companies. The recent midterm elections is an example of how we enabled our partners. As part of the Russia Small Group, USCYBERCOM and the National Security Agency [NSA] enabled the FBI and DHS to prevent interference and influence operations aimed at our political processes. Enabling our partners is two-thirds of persistent engagement. The other third rests with our ability to act — that is, how we act against our adversaries in cyberspace. Acting includes defending forward. How do we warn, how do we influence our adversaries, how do we position ourselves in case we have to achieve outcomes in the future? Acting is the concept of operating outside our borders, being outside our networks, to ensure that we understand what our adversaries are doing. If we find ourselves defending inside our own networks, we have lost the initiative and the advantage.

[…]

The concept of persistent engagement has to be teamed with “persistent presence” and “persistent innovation.” Persistent presence is what the Intelligence Community is able to provide us to better understand and track our adversaries in cyberspace. The other piece is persistent innovation. In the last couple of years, we have learned that capabilities rapidly change; accesses are tenuous; and tools, techniques, and tradecraft must evolve to keep pace with our adversaries. We rely on operational structures that are enabled with the rapid development of capabilities. Let me offer an example regarding the need for rapid change in technologies. Compare the air and cyberspace domains. Weapons like JDAMs [Joint Direct Attack Munitions] are an important armament for air operations. How long are those JDAMs good for? Perhaps 5, 10, or 15 years, sometimes longer given the adversary. When we buy a capability or tool for cyberspace…we rarely get a prolonged use we can measure in years. Our capabilities rarely last 6 months, let alone 6 years. This is a big difference in two important domains of future conflict. Thus, we will need formations that have ready access to developers.

Solely from a military perspective, these are obviously the right things to be doing. From a societal perspective — from the perspective of a potential arms race — I’m much less sure. I’m also worried about the singular focus on nation-state actors in an environment where capabilities diffuse so quickly. But CyberCommand’s job is not cybersecurity and resilience.

The whole thing is worth reading, regardless of whether you agree or disagree.

EU Copyright Reform: the Position of the Member States That Voted Against

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/02/22/copyright_dir-2/

Five member states did not support the draft copyright directive on February 20 (COREPER 1): Poland, the Netherlands, Luxembourg, Finland, and Italy. Their joint position has been officially published by the Netherlands:

The goals of this directive are to improve the proper functioning of the internal market and to stimulate innovation, creativity, investment, and the production of new content, including in the digital environment.

The states voting against support these goals. Digital technologies have radically changed the way content is produced, distributed, and accessed. The legislative framework should reflect and guide these changes.

Nevertheless, in our view, the final text of the directive does not adequately fulfill these goals. In our view, the directive in its current form is a step backwards rather than a step forward.

Above all, we regret that the directive does not strike the right balance between the protection of rights holders and the interests of EU citizens and companies. It therefore risks hindering innovation rather than encouraging it, and having a negative impact on the competitiveness of the European Digital Single Market.

Furthermore, we believe that the directive lacks legal clarity, will lead to legal uncertainty for many stakeholders, and may infringe the rights of EU citizens.

We therefore cannot express our agreement with the proposed text of the directive.

What comes next: the European Parliament's Legal Affairs Committee, followed by a plenary vote in April.

Copyright_timeline_infographics_20190220

Source: EDRi

Get to know the newest AWS Heroes – Winter 2019

Post Syndicated from Ross Barich original https://aws.amazon.com/blogs/aws/get-to-know-the-newest-aws-heroes-winter-2019/

AWS Heroes are superusers who possess advanced technical skills and are early adopters of emerging technologies. Heroes are passionate about sharing their extensive AWS knowledge with others. Some get involved in-person by running meetups, workshops, and speaking at conferences, while others share with online AWS communities via social media, blog posts, and open source contributions.

2019 is off to a roaring start and we’re thrilled to introduce you to the latest AWS Heroes:

Aileen Gemma Smith
Ant Stanley
Gaurav Kamboj
Jeremy Daly
Kurt Lee
Matt Weagle
Shingo Yoshida

Aileen Gemma Smith – Sydney, Australia

Community Hero Aileen Gemma Smith is the founder and CEO of Vizalytics Technology. The team at Vizalytics serves public and private sector clients worldwide in transportation, tourism, and economic development. She shared their story in the Building Complex Workloads in the Cloud session, at AWS Canberra Summit 2017. Aileen has a keen interest in diversity and inclusion initiatives and is constantly working to elevate the work and voices of underestimated engineers and founders. At AWS Public Sector Summit Canberra in 2018, she was a panelist for We Power Tech, Inclusive Conversations with Women in Technology. She has supported and encouraged the creation of internships and mentoring programs for high school and university students with a focus on building out STEAM initiatives.

Ant Stanley – London, United Kingdom

Serverless Hero Ant Stanley is a consultant and community organizer. He founded and currently runs the Serverless London user group, and he is part of the ServerlessDays London organizing team and the global ServerlessDays leadership team. Previously, Ant was a co-founder of A Cloud Guru, and responsible for organizing the first Serverlessconf event in New York in May 2016. Living in London since 2009, Ant’s background before serverless is primarily as a solutions architect at various organizations, from managed service providers to Tier 1 telecommunications providers. His current focus is serverless, GraphQL, and Node.js.

Gaurav Kamboj – Mumbai, India

Community Hero Gaurav Kamboj is a cloud architect at Hotstar, India’s leading OTT provider with a global concurrency record for live streaming to 11Mn+ viewers. At Hotstar, he loves building cost-efficient infrastructure that can scale to millions in minutes. He is also passionate about chaos engineering and cloud security. Gaurav holds the original “all-five” AWS certifications, is co-founder of AWS User Group Mumbai, and speaks at local tech conferences. He also conducts guest lectures and workshops on cloud computing for students at engineering colleges affiliated with the University of Mumbai.

Jeremy Daly – Boston, USA

Serverless Hero Jeremy Daly is the CTO of AlertMe, a startup based in NYC that uses machine learning and natural language processing to help publishers better connect with their readers. He began building cloud-based applications with AWS in 2009. After discovering Lambda, he became a passionate advocate for FaaS and managed services. He now writes extensively about serverless on his blog, jeremydaly.com, and publishes Off-by-none, a weekly newsletter that focuses on all things serverless. As an active member of the serverless community, Jeremy contributes to a number of open-source serverless projects, and has created several others, including Lambda API, Serverless MySQL, and Lambda Warmer.

Kurt Lee – Seoul, South Korea

Serverless Hero Kurt Lee works at Vingle Inc. as their tech lead. As one of the original team members, he has been involved in nearly all backend applications there. Most recently, he led Vingle’s full migration to serverless, cutting 40% of the server cost. He’s known for sharing his experience of adopting serverless, along with its technical and organizational value, through Medium. He and his team maintain multiple open-source projects, which they developed during the migration. Kurt hosts [email protected] regularly, and often presents at AWSKRUG about various aspects of serverless and pushing more things to serverless.

Matt Weagle – Seattle, USA

Serverless Hero Matt Weagle leverages machine learning, serverless techniques, and a servicefull mindset at Lyft, to create innovative transportation experiences in an operationally sustainable and secure manner. Matt looks to serverless as a way to increase collaboration across development, operational, security, and financial concerns and support rapid business-value creation. He has been involved in the serverless community for several years. Currently, he is the organizer of Serverless – Seattle and co-organizer of the serverlessDays Seattle event. He writes about serverless topics on Medium and Twitter.

Shingo Yoshida – Tokyo, Japan

Serverless Hero Shingo Yoshida is the CEO of Section-9, CTO of CYDAS, as well as a founder of Serverless Community(JP) and a member of JAWS-UG (AWS User Group – Japan). Since 2012, Shingo has not only built a system with just AWS, but has also built with a cloud-native architecture to make his customers happy. Serverless Community(JP) was established in 2016, and meetups have been held 20 times in Tokyo, Osaka, Fukuoka, and Sapporo, including three full-day conferences. Through this community, thousands of participants have discovered the value of serverless. Shingo has contributed to these serverless scenes with many blog posts and books about serverless, including Serverless Architectures on AWS.

There are now 80 AWS Heroes worldwide. Learn about all of them and connect with an AWS Hero.

Court of Justice of the EU: Processing of Personal Data for Journalistic Purposes

Post Syndicated from nellyo original https://nellyo.wordpress.com/2019/02/22/court_eu_dp/

Reference for a preliminary ruling — Processing of personal data — Directive 95/46/EC — Article 3 — Scope — Video recording of police officers in a police station while carrying out procedural acts — Publication on video websites — Article 9 — Processing of personal data solely for journalistic purposes — Concept — Freedom of expression — Protection of private life

The Court of Justice of the EU has delivered its judgment in Case C‑345/17, a reference for a preliminary ruling from the Augstākā tiesa (Supreme Court, Latvia) in proceedings concerning Sergejs Buivids.

The reference was made in a dispute between Mr. Sergejs Buivids and the Datu valsts inspekcija (National Data Protection Authority, Latvia) over an action seeking to have declared unlawful that authority's decision, according to which Mr. Buivids infringed national law by publishing on http://www.youtube.com a video, filmed by himself, of the taking of his statement on the premises of a national police station in the course of administrative-offense proceedings.

Following the publication, the National Data Protection Authority found that Mr. Buivids had broken the law because he had not provided the police officers, as data subjects, with information about the purpose of the processing of their personal data. In addition, Mr. Buivids had not provided the National Data Protection Authority with information about the purpose of filming the video in question and publishing it online that would demonstrate that the purpose pursued complied with the Personal Data Protection Law. The National Data Protection Authority asked Mr. Buivids to take steps to remove the video from http://www.youtube.com and other websites.

The case between the citizen and the administration reached the Supreme Court, with Mr. Buivids arguing that the video in question shows officers of the national police, that is, public figures in a publicly accessible place, and therefore does not fall within the personal scope of the Personal Data Protection Law.

In those circumstances, the Augstākā tiesa (Supreme Court) decided to stay the proceedings and to refer the following questions to the Court for a preliminary ruling:

"1)      Do activities such as those at issue in the present proceedings, namely the filming of police officers in a police station while they are carrying out procedural acts and the publication of the video on the website http://www.youtube.com, fall within the scope of Directive 95/46?

2)      Must Directive 95/46 be interpreted as meaning that those activities may be regarded as processing of personal data for journalistic purposes within the meaning of Article 9 of that directive?"

On the first question

30      The concept of "personal data" within the meaning of that provision, as defined in Article 2(a) of the directive, covers "any information relating to an identified or identifiable person". An identifiable person is "one who can be identified, directly or indirectly, in particular by reference to […] one or more factors specific to his physical identity".

31      According to the Court's case law, the image of a person recorded by a camera constitutes "personal data" within the meaning of Article 2(a) of Directive 95/46 insofar as it makes it possible to identify the person concerned (see, to that effect, judgment of 11 December 2014, Ryneš, C‑212/13, EU:C:2014:2428, paragraph 22).

32      In the present case, it is apparent from the order for reference that the police officers can be seen and heard in the video in question, so it must be held that the images of the persons recorded do indeed constitute personal data within the meaning of Article 2(a) of Directive 95/46.

39      […] the publication, on a website to which users can upload, view, and share videos, of a video such as the one at issue, which contains personal data, constitutes processing of those data wholly or partly by automatic means within the meaning of Article 2(b) and Article 3(1) of Directive 95/46.

40      Under Article 3(2) of Directive 95/46, the directive does not apply to the processing of personal data in two cases: activities that fall outside the scope of Community law, such as those provided for by Titles V and VI of the Treaty on European Union in the version in force before the entry into force of the Treaty of Lisbon, and in any case processing operations concerning public security, defense, state security, and the activities of the state in areas of criminal law. The provision also excludes the processing of personal data carried out by a natural person in the course of a purely personal or household activity.

43      […] since Mr. Buivids published the video in question, without access restrictions, on a video website to which users can upload, view, and share such material, thereby making personal data accessible to an indefinite number of people, the processing of personal data at issue in the main proceedings does not fall within the scope of a purely personal or household activity.

Consequently, the filming and the publication on YouTube fall within the scope of the directive.

On the second question

The Court begins with the balance of rights: Directive 95/46 seeks to have the member states protect the fundamental rights and freedoms of natural persons, and in particular their right to private life, with respect to the processing of personal data, while allowing the free movement of such data. That objective cannot, however, be pursued without taking account of the fact that those fundamental rights must, to some degree, be reconciled with the fundamental right to freedom of expression. Recital 37 of the directive states that the aim of Article 9 is to reconcile two fundamental rights: the protection of private life, on the one hand, and freedom of expression, on the other. That is a task for the member states (see, to that effect, judgment of 16 December 2008, Satakunnan Markkinapörssi and Satamedia, C‑73/07, EU:C:2008:727, paragraphs 52–54).

The exemptions and derogations provided for in Article 9 apply not only to media undertakings but also to every person engaged in journalism. It follows from the Court's case law that "journalistic activities" are those whose purpose is the disclosure to the public of information, opinions, or ideas, irrespective of the medium used to transmit them. The Court has already held that the medium through which the processed data are transmitted, whether classic, such as paper or radio waves, or electronic, such as the internet, is not decisive in assessing whether an activity is carried out "solely for journalistic purposes".

The referring court may, in particular, take into account the fact that, according to Mr. Buivids, the video in question was published on a website in order to draw the public's attention to allegedly improper police practices that took place while his statement was being taken.

69      In light of the foregoing considerations, the answer to the second question is that Article 9 of Directive 95/46 must be interpreted as meaning that circumstances such as those in the main proceedings, namely the video recording of police officers in a police station while a statement is being taken and the publication of the video on a video website to which users can upload, view, and share such material, may constitute processing of personal data solely for journalistic purposes within the meaning of that provision, provided that it is apparent from the video that the sole purpose of the recording and publication is the disclosure to the public of information, opinions, or ideas, which is for the referring court to verify.

The Linux Foundation Launches ELISA Project Enabling Linux In Safety-Critical Systems

Post Syndicated from jake original https://lwn.net/Articles/780493/rss

The Linux Foundation has announced the formation of the Enabling Linux in Safety Applications (ELISA) project to create tools and processes for companies to use to build and certify safety-critical Linux applications. “Building off the work being done by SIL2LinuxMP project and Real-Time Linux project, ELISA will make it easier for companies to build safety-critical systems such as robotic devices, medical devices, smart factories, transportation systems and autonomous driving using Linux. Founding members of ELISA include Arm, BMW Car IT GmbH, KUKA, Linutronix, and Toyota.

To be trusted, safety-critical systems must meet functional safety objectives for the overall safety of the system, including how it responds to actions such as user errors, hardware failures, and environmental changes. Companies must demonstrate that their software meets strict demands for reliability, quality assurance, risk management, development process, and documentation. Because there is no clear method for certifying Linux, it can be difficult for a company to demonstrate that their Linux-based system meets these safety objectives.”

Improve Build Performance and Save Time Using Local Caching in AWS CodeBuild

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/improve-build-performance-and-save-time-using-local-caching-in-aws-codebuild/

AWS CodeBuild now supports local caching, which makes it possible for you to persist intermediate build artifacts locally on the build host so that they are available for reuse in subsequent build runs.

Your build project can use one of two types of caching: Amazon S3 or local. In this blog post, we will discuss how to use the local caching feature.

Local caching stores a cache on a build host. The cache is available to that build host only for a limited time and until another build is complete. For example, when you are dealing with large Java projects, compilation might take a long time. You can speed up subsequent builds by using local caching. This is a good option for large intermediate build artifacts because the cache is immediately available on the build host.

Local caching increases build performance for:

  • Projects with a large, monolithic source code repository.
  • Projects that generate and reuse many intermediate build artifacts.
  • Projects that build large Docker images.
  • Projects with many source dependencies.

To use local caching

1. Open the AWS CodeBuild console at https://console.aws.amazon.com/codesuite/codebuild/home.

2. Choose Create project.

3. In Project configuration, enter a name and description for the build project.

4. In Source, for Source provider, choose the source code provider type. In this example, we use an AWS CodeCommit repository.

5. For Environment image, choose Managed image or Custom image, as appropriate. For environment type, choose Linux or Windows Server. Specify a runtime, runtime version, and service role for your project.

6. Configure the buildspec file for your project.

7. In Artifacts, expand Additional Configuration. For Cache type, choose Local, as shown here.

Local caching supports the following caching modes:

Source cache mode caches Git metadata for primary and secondary sources. After the cache is created, subsequent builds pull only the change between commits. This mode is a good choice for projects with a clean working directory and a source that is a large Git repository. If you choose this option and your project does not use a Git repository (GitHub, GitHub Enterprise, or Bitbucket), the option is ignored. No changes are required in the buildspec file.

Docker layer cache mode caches existing Docker layers. This mode is a good choice for projects that build or pull large Docker images. It can prevent the performance issues caused by pulling large Docker images down from the network.

Note

  • You can use a Docker layer cache in the Linux environment only.
  • The privileged flag must be set so that your project has the required Docker permissions.
  • You should consider the security implications before you use a Docker layer cache.

Custom cache mode caches directories you specify in the buildspec file. This mode is a good choice if your build scenario is not suited to one of the other two local cache modes. If you use a custom cache:

  • Only directories can be specified for caching. You cannot specify individual files.
  • Symlinks are used to reference cached directories.
  • Cached directories are linked to your build before it downloads its project sources. Cached items are overridden if a source item has the same name. Directories are specified using cache paths in the buildspec file.

To use source cache mode

In the build project configuration, under Artifacts, expand Additional Configuration. For Cache type, choose Local. Select Source cache, as shown here.

To use Docker layer cache mode

In the build project configuration, under Artifacts, expand Additional Configuration. For Cache type, choose Local. Select Docker layer cache, as shown here.

Under Privileged, select Enable this flag if you want to build Docker images or want your builds to get elevated privileges. This grants elevated privileges to the Docker process running on the build host.

To use custom cache mode

In your buildspec file, specify the cache path, as shown here.

In the build project configuration, under Artifacts, expand Additional Configuration. For Cache type, choose Local. Select Custom cache, as shown here.


version: 0.2
phases:
  pre_build:
    commands:
      - echo "Enter pre_build commands"
  build:
    commands:
      - echo "Enter build commands"
      
cache:
  paths:
    # Cache the local Maven repository, the npm cache, and the project build output
    - '/root/.m2/**/*'
    - '/root/.npm/**/*'
    - 'build/**/*'
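
The console steps above are also exposed through the AWS CodeBuild API, so the cache configuration can be scripted. Here is a hedged sketch that switches an existing project to local caching with all three modes enabled; the project name my-java-project is a placeholder, and you can trim the modes list to only what you need:

aws codebuild update-project \
    --name my-java-project \
    --cache '{
        "type": "LOCAL",
        "modes": [
            "LOCAL_SOURCE_CACHE",
            "LOCAL_DOCKER_LAYER_CACHE",
            "LOCAL_CUSTOM_CACHE"
        ]
    }'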

Conclusion

We hope you find the information in this post helpful. If you have feedback, please leave it in the Comments section below. If you have questions, start a new thread on the AWS CodeBuild forum or contact AWS Support.


Scalable deep learning training using multi-node parallel jobs with AWS Batch and Amazon FSx for Lustre

Post Syndicated from Geoff Murase original https://aws.amazon.com/blogs/compute/scalable-deep-learning-training-using-multi-node-parallel-jobs-with-aws-batch-and-amazon-fsx-for-lustre/

Contributed by Amr Ragab, HPC Application Consultant, AWS Professional Services

How easy is it to take an AWS reference architecture and implement a production solution? At re:Invent 2018, Toyota Research Institute presented their production DL HPC architecture. This was based on a reference architecture for a scalable, deep learning, high performance computing solution, released earlier in the year.  The architecture was designed to run ImageNet and ResNet-50 benchmarks on Apache MXNet and TensorFlow machine learning (ML) frameworks. It used cloud best practices to take advantage of the scale and elasticity that AWS offers.

With the pace of innovation at AWS, I can now show an evolution of that deep learning solution with new services.

A three-component HPC cluster is common in tightly coupled, multi-node distributed training solutions. The base layer is a high-performance file system optimized for reading the images packed as TFRecords or RecordIO, as well as in their original form. The reference architecture originally used BeeGFS. In this post, I use the high-performance Amazon FSx for Lustre file system, announced at re:Invent 2018. The second layer is the scalable compute, which originally used p3.16xl instances containing eight NVIDIA Tesla V100 GPUs per node. Finally, a job scheduler is the third layer for managing multiuser access, planning and distributing the workload across the available nodes.

In this post, I demonstrate how to create a fully managed HPC infrastructure, execute the distributed training job, and collapse it using native AWS services. In the three-component HPC design, the scheduler and compute layers are achieved by using AWS Batch as a managed service built to run thousands of batch computing jobs. AWS Batch dynamically provisions compute resources based on the specific job requirements of the distributed training job.

AWS Batch recently started supporting multi-node parallel jobs, allowing tightly coupled jobs to be executed. This compute layer can be coupled with the FSx for Lustre file system.

FSx for Lustre is a fully managed, parallel file system based on Lustre that can scale to millions of IOPS, and hundreds of gigabytes per second throughput. FSx for Lustre is seamlessly integrated with Amazon S3 to parallelize the ingestion of data from the object store.


Coupled together, this provides a core compute solution for running workloads requiring high performance layers. One additional benefit is that AWS Batch and FSx for Lustre are API-driven services and can be programmatically orchestrated.

The goal of this post is to showcase an innovative architecture that replaces a self-managed, roll-your-own file system and compute layer with platform-managed services, using FSx for Lustre and AWS Batch to run containerized applications, thereby reducing complexity and maintenance. This can also serve as a template for other HPC applications requiring similar compute, networking, and storage topologies. With that in mind, benchmarks related to distributed deep learning are out of scope. As you see at the end of this post, I achieved linear scalability over a broad range (8–160) of GPUs spanning 1–20 p3.16xlarge nodes.

Deployment

Much of the deployment was covered in a previous post, Building a tightly coupled molecular dynamics workflow with multi-node parallel jobs in AWS Batch. However, some feature updates since then have simplified the initial deployment.

In brief, you provision the following resources:

  • An FSx for Lustre file system hydrated from an S3 bucket that provides the source ImageNet 2012 images
  • A new Ubuntu 16.04 ECS instance:
    • Lustre kernel driver and FS mount
    • CUDA 10 with NVIDIA Tesla 410 driver
    • Docker 18.09-ce including nvidia-docker2
    • A multi-node parallel batch–compatible TensorFlow container with the following stack:
      • Ubuntu 18.04 container image
      • TENSORFLOW_VERSION=1.12.0
      • HOROVOD_VERSION=0.15.2
      • CUDNN_VERSION=7.4.2.24-1+cuda10.0
      • NCCL_VERSION=2.3.7-1+cuda10.0
      • OPENMPI 4.0.0

FSx for Lustre setup

First, create a file system in the FSx for Lustre console. The default minimum file system size of 3600 GiB is sufficient.

  • File system name: ImageNet2012 dataset
  • Storage capacity: 3600 (GiB)

In the console, ensure that you have specified the appropriate network access and security groups so that clients can access the FSx for Lustre file system. For this post, find the scripts to prepare the dataset in the deep-learning-models GitHub repo.

  • Data repository type: Amazon S3
  • Import path: Point to an S3 bucket holding the ImageNet 2012 dataset.
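
If you prefer to script this step, the same file system can be created with the AWS CLI (FSx for Lustre is API-driven, as noted earlier). This is a hedged sketch; the bucket name, subnet ID, and security group ID are placeholders for your own values:

aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 3600 \
    --subnet-ids subnet-xxxxxxxx \
    --security-group-ids sg-xxxxxxxx \
    --lustre-configuration ImportPath=s3://<your-imagenet-2012-bucket> \
    --tags Key=Name,Value=ImageNet2012

The ImportPath ties the file system to the S3 bucket holding the dataset, matching the console settings above.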

While the FSx for Lustre layer is being provisioned, spin up an instance in the Amazon EC2 console with the Ubuntu 16.04 ECS AMI, using a p3.2xlarge instance type. One modification is required when preparing the ecs-agent systemd unit file: replace the ExecStart= stanza with the following:

ExecStart=docker run --name ecs-agent \
  --init \
  --restart=on-failure:10 \
  --volume=/var/run:/var/run \
  --volume=/var/log/ecs/:/log \
  --volume=/var/lib/ecs/data:/data \
  --volume=/etc/ecs:/etc/ecs \
  --volume=/sbin:/sbin \
  --volume=/lib:/lib \
  --volume=/lib64:/lib64 \
  --volume=/usr/lib:/usr/lib \
  --volume=/proc:/host/proc \
  --volume=/sys/fs/cgroup:/sys/fs/cgroup \
  --volume=/var/lib/ecs/dhclient:/var/lib/dhclient \
  --net=host \
  --env ECS_LOGFILE=/log/ecs-agent.log \
  --env ECS_DATADIR=/data \
  --env ECS_UPDATES_ENABLED=false \
  --env ECS_AVAILABLE_LOGGING_DRIVERS='["json-file","syslog","awslogs"]' \
  --env ECS_ENABLE_TASK_IAM_ROLE=true \
  --env ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
  --env ECS_UPDATES_ENABLED=true \
  --env ECS_ENABLE_TASK_ENI=true \
  --env-file=/etc/ecs/ecs.config \
  --cap-add=sys_admin \
  --cap-add=net_admin \
  -d \
  amazon/amazon-ecs-agent:latest

During the provisioning workflow, add a 500 GB SSD (gp2) Amazon EBS volume. For ease of installation, build the Lustre kernel driver first, modifying the kernel for compatibility. Start by installing the dkms and git packages:

sudo apt install -y dkms git

Follow the instructions for Ubuntu 16.04.

Install CUDA 10 and the NVIDIA 410 driver branch according to the instructions provided by NVIDIA. It's important that dkms is installed so that the kernel modules are built against the kernel installed earlier.

When complete, install the latest Docker release, as well as nvidia-docker2, according to the instructions in the nvidia-docker GitHub repo, setting the default runtime to “nvidia.”

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}

At this stage, you can create this AMI and keep it for future deployments. This saves time in bootstrapping, as the generic AMI can be used for a diverse set of applications.

When the FSx for Lustre file system is complete, add the file system information into /etc/fstab:

<file_system_dns_name>@tcp:/fsx /fsx lustre defaults,_netdev 0 0

Confirm that the mounting is successful by using the following command:

sudo mkdir /fsx && sudo mount -a

Building the multi-node parallel batch TensorFlow Docker image

Now, set up the multi-node TensorFlow container image. Keep in mind that this image takes approximately two hours to build on a p3.2xlarge. Use the Dockerfile build scripts for setting up multi-node parallel batch jobs.

git clone https://github.com/aws-samples/aws-mnpbatch-template.git
cd aws-mnpbatch-template
docker build -t nvidia/mnp-batch-tensorflow .

As part of the Docker container’s ENTRYPOINT, use the mpi-run.sh script from the Building a tightly coupled molecular dynamics workflow with multi-node parallel jobs in AWS Batch post. Optimize it for running the TensorFlow distributed training as follows:

cd $SCRATCH_DIR
 export INTERFACE=eth0
 export MODEL_HOME=/root/deep-learning-models/models/resnet/tensorflow
 /opt/openmpi/bin/mpirun --allow-run-as-root -np $MPI_GPUS --machinefile ${HOST_FILE_PATH}-deduped -mca plm_rsh_no_tree_spawn 1 \
                        -bind-to socket -map-by slot \
                        $EXTRA_MPI_PARAMS -x LD_LIBRARY_PATH -x PATH -mca pml ob1 -mca btl ^openib \
                        -x NCCL_SOCKET_IFNAME=$INTERFACE -mca btl_tcp_if_include $INTERFACE \
                        -x TF_CPP_MIN_LOG_LEVEL=0 \
                        python3 -W ignore $MODEL_HOME/train_imagenet_resnet_hvd.py \
                        --data_dir $JOB_DIR --num_epochs 90 -b $BATCH_SIZE \
                        --lr_decay_mode poly --warmup_epochs 10 --clear_log

There are some undefined environment variables in the startup command. Those are filled in when you create the multi-node batch job definition file in later stages of this post.

Upon successfully building the Docker image, commit this image to the Amazon ECR registry so that it can be pulled later. You can find the push commands by selecting the repository in the ECR console and choosing View push commands.
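
For reference, here is a hedged sketch of those push commands for this image. The region us-east-2 and the repository name mnp-tensorflow match the job definition later in this post; <accountid> is your account ID, and the aws ecr get-login form is the AWS CLI v1 flow current at the time of writing:

# Create the repository if it doesn't exist yet
aws ecr create-repository --repository-name mnp-tensorflow --region us-east-2

# Authenticate the local Docker daemon against ECR (AWS CLI v1 syntax)
$(aws ecr get-login --no-include-email --region us-east-2)

# Tag the locally built image and push it to ECR
docker tag nvidia/mnp-batch-tensorflow:latest <accountid>.dkr.ecr.us-east-2.amazonaws.com/mnp-tensorflow:latest
docker push <accountid>.dkr.ecr.us-east-2.amazonaws.com/mnp-tensorflow:latest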

One additional tip: the Docker image is approximately 12 GB. To ensure that your container instance starts up quickly, cache this image in the instance's local Docker cache so that subsequent pulls fetch only incremental layer updates from ECR instead of the entire image, which takes more time.

Finally, you should be ready to create this AMI for the AWS Batch compute environment phase of the workflow. In the AWS Batch console, choose Compute environment and create an environment with the following parameters.

Compute environment

  • Compute environment type:  Managed
  • Compute environment name:  tensorflow-gpu-fsx-ce
  • Service role:  AWSBatchServiceRole
  • EC2 instance role:  ecsInstanceRole

Compute resources

Set the minimum and desired vCPUs at 0. When a job is submitted, the underlying AWS Batch service recruits the nodes, taking advantage of the elasticity and scale offered on AWS.

  • Provisioning model: On-Demand
  • Allowed instance types: p3 family, p3dn.24xlarge
  • Minimum vCPUs: 0
  • Desired vCPUs: 0
  • Maximum vCPUs: 4096
  • User-specified AMI: Use the custom ECS-enabled AMI created earlier.

Networking

AWS Batch makes it easy to specify the placement groups. If you do this, the internode communication between instances has the lowest latencies possible, which is a requirement when running tightly coupled workloads.

  • VPC Id: Choose a VPC that allows access to the FSx cluster created earlier.
  • Security groups: FSx security group, Cluster security group
  • Placement group: tf-group (Create the placement group.)

EC2 tags

  • Key: Name
  • Value: tensorflow-gpu-fsx-processor
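
Because AWS Batch is API-driven, the same compute environment can be created from the AWS CLI instead of the console. The following is a hedged sketch using the parameters above; the AMI ID, subnet ID, and security group IDs are placeholders for the custom ECS AMI and the VPC resources that can reach the FSx for Lustre file system:

aws batch create-compute-environment \
    --compute-environment-name tensorflow-gpu-fsx-ce \
    --type MANAGED \
    --state ENABLED \
    --service-role AWSBatchServiceRole \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "desiredvCpus": 0,
        "maxvCpus": 4096,
        "instanceTypes": ["p3", "p3dn.24xlarge"],
        "imageId": "ami-xxxxxxxx",
        "subnets": ["subnet-xxxxxxxx"],
        "securityGroupIds": ["sg-fsx-xxxxxxxx", "sg-cluster-xxxxxxxx"],
        "instanceRole": "ecsInstanceRole",
        "placementGroup": "tf-group",
        "tags": {"Name": "tensorflow-gpu-fsx-processor"}
    }'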

Associate this compute environment with a queue called tf-queue.
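
A hedged CLI equivalent for the queue, mapping it to the compute environment created above:

aws batch create-job-queue \
    --job-queue-name tf-queue \
    --state ENABLED \
    --priority 1 \
    --compute-environment-order order=1,computeEnvironment=tensorflow-gpu-fsx-ce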

Finally, create a job definition that ties the process together and executes the container. The following parameters in JSON format set up the mnp-tensorflow job definition.

{
    "jobDefinitionName": "mnptensorflow-gpu-mnp1",
    "jobDefinitionArn": "arn:aws:batch:us-east-2:<accountid>:job-definition/mnptensorflow-gpu-mnp1:1",
    "revision": 2,
    "status": "ACTIVE",
    "type": "multinode",
    "parameters": {},
    "retryStrategy": {
        "attempts": 1
    },
    "nodeProperties": {
        "numNodes": 20,
        "mainNode": 0,
        "nodeRangeProperties": [
            {
                "targetNodes": "0:19",
                "container": {
                    "image": "<accountid>.dkr.ecr.us-east-2.amazonaws.com/mnp-tensorflow",
                    "vcpus": 62,
                    "memory": 424000,
                    "command": [],
                    "jobRoleArn": "arn:aws:iam::<accountid>:role/ecsTaskExecutionRole",
                    "volumes": [
                        {
                            "host": {
                                "sourcePath": "/scratch"
                            },
                            "name": "scratch"
                        },
                        {
                            "host": {
                                "sourcePath": "/fsx"
                            },
                            "name": "fsx"
                        }
                    ],
                    "environment": [
                        {
                            "name": "SCRATCH_DIR",
                            "value": "/scratch"
                        },
                        {
                            "name": "JOB_DIR",
                            "value": "/fsx/resized"
                        },
                        {
                            "name": "BATCH_SIZE",
                            "value": "256"
                        },
                        {
                            "name": "EXTRA_MPI_PARAMS",
                            "value": "-x HOROVOD_HIERARCHICAL_ALLREDUCE=1 -x HOROVOD_FUSION_THRESHOLD=16777216 -x NCCL_MIN_NRINGS=8 -x NCCL_LAUNCH_MODE=PARALLEL"
                        },
                        {
                            "name": "MPI_GPUS",
                            "value": "160"
                        }
                    ],
                    "mountPoints": [
                        {
                            "containerPath": "/fsx",
                            "sourceVolume": "fsx"
                        },
                        {
                            "containerPath": "/scratch",
                            "sourceVolume": "scratch"
                        }
                    ],
                    "ulimits": [],
                    "instanceType": "p3.16xlarge"
                }
            }
        ]
    }
}

MPI_GPUS 

Total number of GPUs in the cluster. In this case, 20 p3.16xlarge instances × 8 GPUs each = 160.

BATCH_SIZE

Number of images per GPU to load at a time for training; with 16 GB of memory per GPU, this is 256.

JOB_DIR

Location of the TFRecords prepared earlier, optimized for the number of shards: /fsx/resized.

SCRATCH_DIR

Path to the model outputs = /scratch.

One additional tip:  You have the freedom to expose additional parameters in the job definition. This means that you can also expose model training hyperparameters, which opens the door to multi-parameter optimization (MPO) studies on the AWS Batch layer.

With the job definition created, submit a new job sourcing this job definition, executing on the tf-queue created earlier. This spawns the compute environment.

The AWS Batch service only launches the requested number of nodes. You don’t pay for the running EC2 instances until all requested nodes are launched in your compute environment.
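
Building on the earlier tip about exposing parameters, here is a hedged sketch of submitting the job from the AWS CLI, overriding the BATCH_SIZE environment variable at submission time. The job name is a placeholder, and node-overrides is the multi-node parallel counterpart of container overrides:

aws batch submit-job \
    --job-name imagenet-resnet50-20node \
    --job-queue tf-queue \
    --job-definition mnptensorflow-gpu-mnp1 \
    --node-overrides '{
        "nodePropertyOverrides": [
            {
                "targetNodes": "0:19",
                "containerOverrides": {
                    "environment": [
                        {"name": "BATCH_SIZE", "value": "256"}
                    ]
                }
            }
        ]
    }'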

After the job enters the RUNNING state, you can monitor the main node (container index 0) for activity with the CloudWatch log stream created for this job. Some of the key entries show the 20 nodes joining the cluster. One additional tip: it is possible to use this infrastructure to push the model parameters and training performance to TensorBoard for additional monitoring.

The next log screenshot shows the main TensorFlow and Horovod workflow starting up. 

Performance monitoring

On 20 p3.16xl nodes, I achieved a comparable speed of approximately 100k images/sec, with close to 90-100% GPU utilization across all 160 GPUs with the containerized Horovod TensorFlow Docker image.

When you have this implemented, try out the cluster using the recently announced p3dn.24xlarge, a variant of the p3.16xl with 32 GB of memory per NVIDIA Tesla V100 GPU and 100-Gbps networking. To take advantage of the full GPU memory of the p3dn, increase the BATCH_SIZE environment variable in the job definition.

Conclusion

With the evolution of a scalable, deep learning–focused, high performance computing environment, you can now use a cloud-native approach. Focus on your code and training while AWS handles the undifferentiated heavy lifting.

As mentioned earlier, this reference architecture has an API interface, so an event-driven workflow can extend this work further. For example, you can integrate this core compute into an AWS Step Functions workflow that stands up the FSx for Lustre layer, submits the batch job, and then collapses the FSx for Lustre layer when the job completes.

Or through an API Gateway, create a web application for the job submission. Integrate with on-premises resources to transfer data to the S3 bucket and hydrate the FSx for Lustre file system.

If you have any questions about this deployment or how to integrate it into a broader AWS architecture, please comment below. Now go power up your deep learning workloads with a fully managed, high performance compute framework!

How Reliable are SSDs?

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/how-reliable-are-ssds/

an exploded view of a Samsung Solid State Drive

What’s not to love about solid state drives (SSDs)? They are faster than conventional hard disk drives (HDDs), more compact, have no moving parts, are immune to magnetic fields, and can withstand more shocks and vibration than conventional magnetic platter disks. And, they are becoming available in larger and larger capacities while their cost comes down.

If you’ve upgraded an older computer with an SSD, you no doubt instantly saw the benefits. Your computer booted in less time, your applications loaded faster, and even when you ran out of memory, and apps and data had to be swapped to disk, it felt like everything was much snappier.

We’re now seeing SSDs with capacities that used to be reserved for HDDs and at prices that no longer make our eyes water. 500 GB SSDs are now affordable (under $100), and 1 TB drives are reasonably priced ($100 to $150). Even 2 TB SSDs fall into a budget range for putting together a good performance desktop system ($300 to $400).

We’ve written a number of times on this blog about SSDs, and considered the best uses for SSDs compared to HDDs. We’ve also written about the future of SSDs and how we use them in our data centers and whether we plan on using more in the future.

Reliability

In this post we’re going to consider the issue of SSD reliability. For all their merits, can SSDs be trusted with your data and will they last as long or longer than if you were using an HDD instead? You might have read that SSDs are limited to a finite number of reads and writes before they fail. What’s that all about?

The bottom line question is: do SSD drives fail? Of course they do, as do all drives eventually. The important questions we really need to be asking are 1) do they fail faster than HDDs, and 2) how long can we reasonably expect them to last?

Backing Up Is Great To Do

Of course, as a data storage and backup company, you know what we’re going to say right off. We always recommend that no matter which storage medium you use, you should always have a backup copy of your data. Even if the disk is reliable and in good condition, it won’t do you any good if your computer is stolen, consumed by a flood, or lost in a fire or other act of nature. You might have heard that water damage is the most common computer accident, and few computer components can survive a thorough soaking, especially when powered.

SSD Reliability Factors to Consider

Generally, SSDs are more durable than HDDs in extreme and harsh environments because they don’t have moving parts such as actuator arms. SSDs can withstand accidental drops and other shocks, vibration, extreme temperatures, and magnetic fields better than HDDs. Add to that their small size and lower power consumption, and you can understand why they’re a great fit for laptop computers and mobile applications.

First, let’s cover the basics. Almost all types of today’s SSDs use NAND flash memory. NAND isn’t an acronym like a lot of computer terms. Instead, it’s a name that’s derived from its logic gate called “NOT AND.”

SSD part diagram including Cache, Controller, and NAND Flash Memory

The term following NAND, flash, refers to a non-volatile solid state memory that retains data even when the power source is removed. NAND storage has specific properties that affect how long it will last. When data is written to a NAND cell (also known as programming), the data must be erased before new data can be written to that same cell. NAND is programed and erased by applying a voltage to send electrons through an insulator. The location of those electrons (and their quantity) determine when current will flow between a source and a sink (called a voltage threshold), determining the data stored in that cell (the 1s and 0s). When writing and erasing NAND, it sends the electrons through the insulator and back, and the insulator starts to wear — the exact number of these cycles in each individual cell varies by NAND design. Eventually, the insulator wears to the point where it may have difficulty keeping the electrons in their correct (programmed) location, which makes it increasingly more difficult to determine if the electrons are where they should be, or if they have migrated on their own.

This means that flash type memory cells can only be programmed and erased a limited number of times. This is measured in P/E cycles, which stands for programmed and erased.

P/E cycles are an important measurement of SSD reliability, but there are other factors that are important to consider, as well. These are P/E cycles, TBW (terabytes written), and MTBF (mean time between failures).

The SSD manufacturer will have these specifications available for their products and they can help you understand how long your drive can be expected to last and whether a particular drive is suited to your application.

P/E cycles — A solid-state-storage program-erase cycle is a sequence of events in which data is written to a solid-state NAND flash memory cell, then erased, and then rewritten. How many P/E cycles an SSD can endure varies with the technology used, somewhere between 500 and 100,000 P/E cycles.

TBW — Terabytes written is the total amount of data that can be written to an SSD before it is likely to fail. For example, here are the TBW warranties for the popular Samsung 860 EVO SSD: 150 TBW for the 250 GB model, 300 TBW for the 500 GB model, 600 TBW for the 1 TB model, 1,200 TBW for the 2 TB model, and 2,400 TBW for the 4 TB model. Note: these models are warrantied for 5 years or the TBW figure, whichever comes first.

MTBF — MTBF (mean time between failures) is a measure of how reliable a hardware product or component is over its expected lifetime. For most components, the measure is typically in thousands or even tens of thousands of hours between failures. For example, a hard disk drive may have a mean time between failures of 300,000 hours, while an SSD might have 1.5 million hours.

This doesn’t mean that your SSD will last that many hours; it means that, given a sample set of that model of SSD, failures will occur at a certain rate. A 1.2 million hour MTBF, for example, means that if the drives are used an average of 8 hours a day, a sample of 1,000 SSDs would be expected to have one failure every 150 days, or about twice a year.
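
To make the arithmetic behind that figure explicit, here is a quick back-of-the-envelope check using plain shell arithmetic (integer division):

# 1,000 drives powered on 8 hours a day accumulate 8,000 drive-hours per day.
drive_hours_per_day=$((1000 * 8))

# With a 1.2 million hour MTBF, one failure is expected roughly every 150 days.
days_between_failures=$((1200000 / drive_hours_per_day))

echo "~${days_between_failures} days between expected failures"   # prints ~150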

SSD Types

There are a number of different types of SSD, and advancements to the technology continue at a brisk pace. Generally, SSDs are based on four different NAND cell technologies:

  • SLC (Single Level Cell) — one bit per cell
  • When one bit is stored (SLC), it’s not necessary to keep close tabs on electron locations, so a few electrons migrating isn’t much of a concern. Because only a 1 or a 0 is being stored, it’s necessary only to accurately determine if voltage flows or not.

  • MLC (Multi-Level Cell) — two bits per cell
  • MLC stores two bits per cell, so more precision is needed (determining voltage threshold is more complex). It’s necessary to distinguish among 00, 01, 10 or 11. Migrating electrons have more of an impact, so the insulator cannot be worn as much as with SLC.

  • TLC (Triple Level Cell) — three bits per cell
  • This trend continues with TLC, where three bits are stored: 000, 001, 010 … 110, and 111. Migrating electrons have more effect than in MLC, which further reduces tolerable insulator wear.

  • QLC (Quad Level Cell) — four bits per cell
  • QLC stores four bits (16 possible combinations of 1s and 0s). With QLC, migrating electrons have the most significant effect. Tolerable insulator wear is further reduced.

    QLC is a good fit for read-centric workloads because reading data wears NAND cells negligibly, while writing data (programming and erasing) wears them more. When writing and rewriting a lot of data, the insulator wears more quickly. If a NAND cell can tolerate that wear, it is well suited to mixed read/write access. The less wear NAND cells can tolerate, the better suited they are to read-centric workloads and applications.

Each subsequent NAND technology allows a cell to store an extra bit. The fewer bits per NAND cell, the faster, more reliable, and more energy efficient the technology is, and also the more expensive. An SLC SSD would technically be the most reliable SSD, as it can endure more writes, while a QLC SSD is the least reliable. If you're selecting an SSD for an application where it will be written more than read, then the selection of NAND cell technology could be a significant factor in your decision. If your application is general computer use, it will likely matter less to you.

How Reliability Factors Affect Your Choice of SSD

How important these factors are to you depends on how the SSD is used. The right question to ask is how a drive will perform in your application. There are different performance and reliability criteria depending on whether the SSD will be used in a home desktop computer, a data center, or an exploration vehicle on Mars.

Manufacturers sometimes specify the type of application workload for which an SSD is designed, such as write-intensive, read-intensive or mixed-use. Some vendors allow the customer to select the optimal level of endurance and capacity for a particular SSD. For instance, an enterprise user with a high-transaction database might opt for a higher number of drive writes at the expense of capacity. Or a user operating a database that does infrequent writes might choose a lower drive writes number and a higher capacity.

Signs of SSD Failure

SSDs will eventually fail, but there usually are advance warnings of when that's going to happen. You've likely encountered the dreaded clicking sound that emanates from a dying HDD. Because an SSD has no moving parts, we won't get an audible warning that it's about to fail us. Instead, you should watch for a number of indicators that your SSD is nearing its end of life, and take action by replacing that drive with a new one.

1) Errors Involving Bad Blocks

Much like bad sectors on HDDs, there are bad blocks on SSDs. This is typically a scenario where the computer attempts to read or save a file, but it takes an unusually long time and ends in failure, so the system eventually gives up with an error message.

2) Files Cannot Be Read or Written

There are two ways in which a bad block can affect your files, 1) the system detects the bad block while writing data to the drive, and thus refuses to write data, and 2), the system detects the bad block after the data has been written, and thus refuses to read that data.

3) The File System Needs Repair

Getting an error message on your screen can happen simply because the computer was not shut down properly, but it also could be a sign of an SSD developing bad blocks or other problems.

4) Crashing During Boot

A crash during the computer boot is a sign that your drive could be developing a problem. You should make sure you have a current backup of all your data before it gets worse and the drive fails completely.

5) The Drive Becomes Read-Only

Your drive might refuse to write any more data to disk and can only read data. Fortunately, you can still get your data off the disk.

SSDs Generally Will Last As Long As You Need Them To

Let’s go back to the two questions we asked above.

Q: Do SSDs fail faster than HDDs?

A: That depends on the technology of the drives and how they’re used. HDDs are better suited for some applications and SSDs for others. SSDs can be expected to last as long or longer than HDDs in most general applications.

and

Q: How long can we reasonably expect an SSD to last?

A: An SSD should last as long as its manufacturer expects it to last (e.g. five years), provided that the use of the drive is not excessive for the technology it employs (e.g. using a QLC in an application with a high number of writes). Consult the manufacturer’s recommendations to ensure that how you’re using the SSD matches its best use.

SSDs are a different breed of animal than a HDD and they have their strengths and weaknesses relative to other storage media. The good news is that their strengths — speed, durability, size, power consumption, etc. — are backed by pretty good overall reliability.

SSD users are far more likely to replace their storage drive because they’re ready to upgrade to a newer technology, higher capacity, or faster drive, than having to replace the drive due to a short lifespan. Under normal use we can expect an SSD to last years. If you replace your computer every three years, as most users do, then you probably needn’t worry about whether your SSD will outlast your computer. What’s important is whether the SSD will be sufficiently reliable that you won’t lose your data.

As we saw above, if you're paying attention to your system, you will be given ample warning of an impending drive failure, and you can replace the drive before the data becomes unreadable.

It’s good to understand how the different SSD technologies affect their reliability, and whether it’s worth it to spend extra money for SLC over MLC or QLC. However, unless you’re using an SSD in a specialized application with more writes than reads as we described above, just selecting a good quality SSD from a reputable manufacturer should be enough to make you feel confident that your SSD will have a useful life span.

Keep an eye out for any signs of failure or bad sectors, and, of course, be sure to have a solid backup plan no matter what type of drive you’re using.

The post How Reliable are SSDs? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

[$] Development statistics for the 5.0 kernel

Post Syndicated from corbet original https://lwn.net/Articles/780271/rss

The announcement of the 5.0-rc7 kernel
prepatch on February 17 signaled the imminent release of the final 5.0
kernel and the end of this development cycle. 5.0, as it turns out,
brought in fewer changesets than its immediate predecessors, but it was
still a busy cycle with a lot of developers participating. Read on for an
overview of where the work came from in this release cycle.

Make art with LEDs | HackSpace magazine #16

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/make-art-with-leds-hackspace-16/

Create something beautiful with silicon, electricity, your endless imagination, and HackSpace magazine issue 16 — out today!

HackSpace magazine 16

LEDs are awesome

Basically, LEDs are components that convert electrical power into light. Connect them to a power source (with some form of current limiter) in the right orientation, and they’ll glow.

Each LED has a single colour. Fortunately, manufacturers can pack three LEDs (red, green, and blue) into a single component, and varying the power to each LED-within-an-LED produces a wide range of hues. However, by itself, this type of colourful LED is a little tricky to control: each requires three inputs, so a simple 10×10 matrix would require 300 inputs. But there's a particular trick electronics manufacturers have that makes RGB LEDs easy to use: making the LEDs addressable!

An RGB LED

Look: you can clearly see the red, green, and blue elements of this RGB LED

Addressable LEDs

Addressable LEDs have microcontrollers built into them. These aren’t powerful, programmable microcontrollers, they’re just able to handle a simple communications protocol. There are quite a few different types of addressable LEDs, but two are most popular with makers: WS2812 (often called NeoPixels) and APA102 (often called DotStars). Both are widely available from maker stores and direct-from-China websites. NeoPixels use a single data line, while DotStars use a signal and a clock line. Both, however, are chainable. This means that you connect one (for NeoPixels) or two (for DotStars) pins of your microcontroller to the Data In connectors on the first LED, then the output of this LED to the input of the next, and so on.

Exactly how many LEDs you can chain together depends on a few different things, including the power of the microcontroller and the intended refresh rate. Often, though, the limiting factor for most hobbyists is the amount of electricity you need.

Which type to use

The big difference between NeoPixels and DotStars comes down to speed. LEDs are dimmed by turning them off and on very quickly: the greater the proportion of time they spend off, the dimmer they appear. This is known as pulse-width modulation (PWM). The speed of this blinking on and off can have implications for some makes, such as when the LEDs are moving quickly.

NeoPixels

  • Cheap
  • Slowish refresh rate
  • Slowish PWM rate

DotStars

  • More expensive
  • Faster refresh rate
  • Fast PWM rate
NeoPixels moving in the dark

As a NeoPixel is moved through a long-exposure photograph, you can see it blink on and off. DotStars – which have a faster PWM rate – avoid this.

Safety first!

HackSpace magazine’s LED feature is just a whistle-stop guide to the basics of powering LEDs — it’s not a comprehensive guide to all things power-related. Once you go above a few amperes, you need to think about what you’re doing with power. Once you start to approach double figures, you need to make sure you know what you’re doing and, if you find yourself shopping for an industrial power supply, then you really need to make sure you know how to use it safely.

Read more

Read the rest of the exclusive 14-page LED special in HackSpace magazine issue 16, out today. Buy your copy now from the Raspberry Pi Press store, major newsagents in the UK, or Barnes & Noble, Fry’s, or Micro Center in the US. Or, download your free PDF copy from the HackSpace magazine website.

HackSpace magazine 16 Front Cover

We’re also shipping to stores in Australia, Hong Kong, Canada, Singapore, Belgium, and Brazil, so be sure to ask your local newsagent whether they’ll be getting HackSpace magazine.

Subscribe now

Subscribe to HackSpace on a monthly, quarterly, or twelve-month basis to save money against newsstand prices.

Twelve-month print subscribers get a free Adafruit Circuit Playground Express, loaded with inputs and sensors and ready for your next project. Tempted?

The post Make art with LEDs | HackSpace magazine #16 appeared first on Raspberry Pi.
