Tag Archives: iq

Backblaze B2, Cloud Storage on a Budget: One Year Later

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/backblaze-b2-cloud-storage-on-a-budget-one-year-later/

B2 Cloud Storage Review

A year ago, Backblaze B2 Cloud Storage came out of beta and became available for everyone to use. We were pretty excited, even though it seemed like everyone and their brother had a cloud storage offering. Now that we are a year down the road let’s see how B2 has fared in the real world of tight budgets, maxed-out engineering schedules, insanely funded competition, and more. Spoiler alert: We’re still pretty excited…

Cloud Storage on a Budget

There are dozens of companies offering cloud storage and the landscape is cluttered with incomprehensible pricing models, cleverly disguised transfer and download charges, and differing levels of service that seem to be driven more by marketing departments than customer needs.

Backblaze B2 keeps things simple: A single performant level of service, a single affordable price for storage ($0.005/GB/month), a single affordable price for downloads ($0.02/GB), and a single list of transaction charges – all on a single pricing page.

Who’s Using B2?

By making cloud storage affordable, companies and organizations now have a way to store their data in the cloud and still be able to access and restore it as quickly as needed. You don’t have to choose between price and performance. Here are a few examples:

  • Media & Entertainment: KLRU-TV, Austin PBS, is using B2 to preserve their video catalog of the world renown musical anthology series, Austin City Limits.
  • LTO Migration: The Girl Scouts San Diego, were able to move their daily incremental backups from LTO tape to the cloud, saving money and time, while helping automate their entire backup process.
  • Cloud Migration: Vintage Aerial found it cost effective to discard their internal data server and store their unique hi-resolution images in B2 Cloud Storage.
  • Backup: Ahuja and Clark, a boutique accounting firm, was able to save over 80% on the cost to backup all their corporate and client data.

How is B2 Being Used?

B2 Cloud Storage can be accessed in four ways: using the Web GUI, using the CLI, using the API library, and using a product or service integrated with B2. While many customers are using the Web GUI, CLI and API to store and retrieve data, the most prolific use of B2 occurs via our integration partners. Each integration partner has certified they have met our best practices for integrating to B2 and we’ve tested each of the integrations submitted to us. Here are a few of the highlights.

  • NAS Devices – Synology and QNAP have integrations which allow their NAS devices to sync their data to/from B2.
  • Backup and Sync – CloudBerry, GoodSync, and Retrospect are just a few of the services that can backup and/or sync data to/from B2.
  • Hybrid Cloud – 45 Drives and OpenIO are solutions that allow you to setup and operate a hybrid data storage cloud environment.
  • Desktop Apps – CyberDuck, MountainDuck, Dropshare, and more allow users an easy way to store and use data in B2 right from your desktop.
  • Digital Asset Management – Cantemo, Cubix, CatDV, and axle Video, let you catalog your digital assets and then store them in B2 for fast retrieval when they are needed.

If you have an application or service that stores data in the cloud and it isn’t integrated with Backblaze B2, then your customers are probably paying too much for cloud storage.

What’s New in B2?

B2 Fireball – our rapid data ingest service. We send you a storage device, and you load it up with up to 40 TB of data and send it back, then we load the data into your B2 account. The cost is $550 per trip plus shipping. Save your network bandwidth with the B2 Fireball.

Lowered the download price – When we introduced B2, we set the price to download a gigabyte of data to be $0.05/GB – the same as most competitors. A year in, we reevaluated the price based on usage and decided to lower the price to $0.02/GB.

B2 User Groups – Backblaze Groups functionality is now available in B2. An administrator can invite users to a B2 centric Group to centralize the storage location for that group of users. For example, multiple members of a department working on a project will be able to archive their work-in-process activities into a single B2 bucket.

Time Machine backup – You may know that you can use your Synology NAS as the destination for your Time Machine backup. With B2 you can also sync your Synology NAS to B2 for a true 3-2-1 backup solution. If your system crashes or is lost, you can restore your Time Machine image directly from B2 to your new machine.

Life Cycle Rules – Create rules that allow you to manage the length of time deleted files will remain in your B2 bucket before they are deleted. A great option for managing the cleanup of outdated file versions to save on storage costs.

Large Files – In the B2 Web GUI you can upload files as large as 500 MB using either the upload or drag-and-drop functionality. The B2 CLI and API support the ability to upload/download files as large as 10 TB.

5 MB file part size – When working with large files, the minimum file part size can now be set as low as 5 MB versus the previous low setting of 100 MB. Now the range of a file part when working with large files can be from 5 MB to 5GB. This increases the throughput of your data uploads and downloads.

SHA-1 at the end – This feature allows you to compute the SHA-1 checksum and append it to the end of the request body versus doing the computation before the file is sent. This is especially useful for those applications which stream data to/from B2.

Cache-Control – When data is downloaded from B2 into a browser, the length of time the file remains in the browser cache can be set at the bucket level using the b2_create_bucket and b2_update_bucket API calls. Setting this policy is optional.

Customized delimiters – Used in the API, this allows you to specify a delimiter to use for a given purpose. A common use is to set a delimiter in the file name string. Then use that delimiter to detect a folder name within the string.

Looking Ahead

Over the past year we added nearly 30,000 new B2 customers to the fold and are welcoming more and more each day as B2 continues to grow. We have plans to expand our storage footprint by adding more data centers as we look forward to moving towards a multi-region environment.

For those of you who are B2 customers – thank you for helping build B2. If you have an interesting way you are using B2, tell us in the comments below.

The post Backblaze B2, Cloud Storage on a Budget: One Year Later appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Intel Skylake/Kaby Lake processors: broken hyper-threading

Post Syndicated from ris original https://lwn.net/Articles/726496/rss

Henrique de Moraes Holschuh has posted an advisory about a processor/microcode
defect recently identified on Intel Skylake and Intel Kaby Lake processors
with hyper-threading enabled. “TL;DR: unfixed Skylake and Kaby Lake
processors could, in some situations, dangerously misbehave when
hyper-threading is enabled. Disable hyper-threading immediately in
BIOS/UEFI to work around the problem. Read this advisory for instructions
about an Intel-provided fix.

Building Loosely Coupled, Scalable, C# Applications with Amazon SQS and Amazon SNS

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/

 
Stephen Liedig, Solutions Architect

 

One of the many challenges professional software architects and developers face is how to make cloud-native applications scalable, fault-tolerant, and highly available.

Fundamental to your project success is understanding the importance of making systems highly cohesive and loosely coupled. That means considering the multi-dimensional facets of system coupling to support the distributed nature of the applications that you are building for the cloud.

By that, I mean addressing not only the application-level coupling (managing incoming and outgoing dependencies), but also considering the impacts of of platform, spatial, and temporal coupling of your systems. Platform coupling relates to the interoperability, or lack thereof, of heterogeneous systems components. Spatial coupling deals with managing components at a network topology level or protocol level. Temporal, or runtime coupling, refers to the ability of a component within your system to do any kind of meaningful work while it is performing a synchronous, blocking operation.

The AWS messaging services, Amazon SQS and Amazon SNS, help you deal with these forms of coupling by providing mechanisms for:

  • Reliable, durable, and fault-tolerant delivery of messages between application components
  • Logical decomposition of systems and increased autonomy of components
  • Creating unidirectional, non-blocking operations, temporarily decoupling system components at runtime
  • Decreasing the dependencies that components have on each other through standard communication and network channels

Following on the recent topic, Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox, in this post, I look at some of the ways you can introduce SQS and SNS into your architectures to decouple your components, and show how you can implement them using C#.

Walkthrough

To illustrate some of these concepts, consider a web application that processes customer orders. As good architects and developers, you have followed best practices and made your application scalable and highly available. Your solution included implementing load balancing, dynamic scaling across multiple Availability Zones, and persisting orders in a Multi-AZ Amazon RDS database instance, as in the following diagram.


In this example, the application is responsible for handling and persisting the order data, as well as dealing with increases in traffic for popular items.

One potential point of vulnerability in the order processing workflow is in saving the order in the database. The business expects that every order has been persisted into the database. However, any potential deadlock, race condition, or network issue could cause the persistence of the order to fail. Then, the order is lost with no recourse to restore the order.

With good logging capability, you may be able to identify when an error occurred and which customer’s order failed. This wouldn’t allow you to “restore” the transaction, and by that stage, your customer is no longer your customer.

As illustrated in the following diagram, introducing an SQS queue helps improve your ordering application. Using the queue isolates the processing logic into its own component and runs it in a separate process from the web application. This, in turn, allows the system to be more resilient to spikes in traffic, while allowing work to be performed only as fast as necessary in order to manage costs.


In addition, you now have a mechanism for persisting orders as messages (with the queue acting as a temporary database), and have moved the scope of your transaction with your database further down the stack. In the event of an application exception or transaction failure, this ensures that the order processing can be retired or redirected to the Amazon SQS Dead Letter Queue (DLQ), for re-processing at a later stage. (See the recent post, Using Amazon SQS Dead-Letter Queues to Control Message Failure, for more information on dead-letter queues.)

Scaling the order processing nodes

This change allows you now to scale the web application frontend independently from the processing nodes. The frontend application can continue to scale based on metrics such as CPU usage, or the number of requests hitting the load balancer. Processing nodes can scale based on the number of orders in the queue. Here is an example of scale-in and scale-out alarms that you would associate with the scaling policy.

Scale-out Alarm

aws cloudwatch put-metric-alarm --alarm-name AddCapacityToCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
--statistic Average --period 300 --threshold 3 --comparison-operator GreaterThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
--evaluation-periods 2 --alarm-actions <arn of the scale-out autoscaling policy>

Scale-in Alarm

aws cloudwatch put-metric-alarm --alarm-name RemoveCapacityFromCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
 --statistic Average --period 300 --threshold 1 --comparison-operator LessThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
 --evaluation-periods 2 --alarm-actions <arn of the scale-in autoscaling policy>

In the above example, use the ApproximateNumberOfMessagesVisible metric to discover the queue length and drive the scaling policy of the Auto Scaling group. Another useful metric is ApproximateAgeOfOldestMessage, when applications have time-sensitive messages and developers need to ensure that messages are processed within a specific time period.

Scaling the order processing implementation

On top of scaling at an infrastructure level using Auto Scaling, make sure to take advantage of the processing power of your Amazon EC2 instances by using as many of the available threads as possible. There are several ways to implement this. In this post, we build a Windows service that uses the BackgroundWorker class to process the messages from the queue.

Here’s a closer look at the implementation. In the first section of the consuming application, use a loop to continually poll the queue for new messages, and construct a ReceiveMessageRequest variable.

public static void PollQueue()
{
    while (_running)
    {
        Task<ReceiveMessageResponse> receiveMessageResponse;

        // Pull messages off the queue
        using (var sqs = new AmazonSQSClient())
        {
            const int maxMessages = 10;  // 1-10

            //Receiving a message
            var receiveMessageRequest = new ReceiveMessageRequest
            {
                // Get URL from Configuration
                QueueUrl = _queueUrl, 
                // The maximum number of messages to return. 
                // Fewer messages might be returned. 
                MaxNumberOfMessages = maxMessages, 
                // A list of attributes that need to be returned with message.
                AttributeNames = new List<string> { "All" },
                // Enable long polling. 
                // Time to wait for message to arrive on queue.
                WaitTimeSeconds = 5 
            };

            receiveMessageResponse = sqs.ReceiveMessageAsync(receiveMessageRequest);
        }

The WaitTimeSeconds property of the ReceiveMessageRequest specifies the duration (in seconds) that the call waits for a message to arrive in the queue before returning a response to the calling application. There are a few benefits to using long polling:

  • It reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response.
  • It eliminates false empty responses by querying all (rather than a limited number) of the servers.
  • It returns messages as soon any message becomes available.

For more information, see Amazon SQS Long Polling.

After you have returned messages from the queue, you can start to process them by looping through each message in the response and invoking a new BackgroundWorker thread.

// Process messages
if (receiveMessageResponse.Result.Messages != null)
{
    foreach (var message in receiveMessageResponse.Result.Messages)
    {
        Console.WriteLine("Received SQS message, starting worker thread");

        // Create background worker to process message
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (obj, e) => ProcessMessage(message);
        worker.RunWorkerAsync();
    }
}
else
{
    Console.WriteLine("No messages on queue");
}

The event handler, ProcessMessage, is where you implement business logic for processing orders. It is important to have a good understanding of how long a typical transaction takes so you can set a message VisibilityTimeout that is long enough to complete your operation. If order processing takes longer than the specified timeout period, the message becomes visible on the queue. Other nodes may pick it and process the same order twice, leading to unintended consequences.

Handling Duplicate Messages

In order to manage duplicate messages, seek to make your processing application idempotent. In mathematics, idempotent describes a function that produces the same result if it is applied to itself:

f(x) = f(f(x))

No matter how many times you process the same message, the end result is the same (definition from Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Hohpe and Wolf, 2004).

There are several strategies you could apply to achieve this:

  • Create messages that have inherent idempotent characteristics. That is, they are non-transactional in nature and are unique at a specified point in time. Rather than saying “place new order for Customer A,” which adds a duplicate order to the customer, use “place order <orderid> on <timestamp> for Customer A,” which creates a single order no matter how often it is persisted.
  • Deliver your messages via an Amazon SQS FIFO queue, which provides the benefits of message sequencing, but also mechanisms for content-based deduplication. You can deduplicate using the MessageDeduplicationId property on the SendMessage request or by enabling content-based deduplication on the queue, which generates a hash for MessageDeduplicationId, based on the content of the message, not the attributes.
var sendMessageRequest = new SendMessageRequest
{
    QueueUrl = _queueUrl,
    MessageBody = JsonConvert.SerializeObject(order),
    MessageGroupId = Guid.NewGuid().ToString("N"),
    MessageDeduplicationId = Guid.NewGuid().ToString("N")
};
  • If using SQS FIFO queues is not an option, keep a message log of all messages attributes processed for a specified period of time, as an alternative to message deduplication on the receiving end. Verifying the existence of the message in the log before processing the message adds additional computational overhead to your processing. This can be minimized through low latency persistence solutions such as Amazon DynamoDB. Bear in mind that this solution is dependent on the successful, distributed transaction of the message and the message log.

Handling exceptions

Because of the distributed nature of SQS queues, it does not automatically delete the message. Therefore, you must explicitly delete the message from the queue after processing it, using the message ReceiptHandle property (see the following code example).

However, if at any stage you have an exception, avoid handling it as you normally would. The intention is to make sure that the message ends back on the queue, so that you can gracefully deal with intermittent failures. Instead, log the exception to capture diagnostic information, and swallow it.

By not explicitly deleting the message from the queue, you can take advantage of the VisibilityTimeout behavior described earlier. Gracefully handle the message processing failure and make the unprocessed message available to other nodes to process.

In the event that subsequent retries fail, SQS automatically moves the message to the configured DLQ after the configured number of receives has been reached. You can further investigate why the order process failed. Most importantly, the order has not been lost, and your customer is still your customer.

private static void ProcessMessage(Message message)
{
    using (var sqs = new AmazonSQSClient())
    {
        try
        {
            Console.WriteLine("Processing message id: {0}", message.MessageId);

            // Implement messaging processing here
            // Ensure no downstream resource contention (parallel processing)
            // <your order processing logic in here…>
            Console.WriteLine("{0} Thread {1}: {2}", DateTime.Now.ToString("s"), Thread.CurrentThread.ManagedThreadId, message.MessageId);
            
            // Delete the message off the queue. 
            // Receipt handle is the identifier you must provide 
            // when deleting the message.
            var deleteRequest = new DeleteMessageRequest(_queueName, message.ReceiptHandle);
            sqs.DeleteMessageAsync(deleteRequest);
            Console.WriteLine("Processed message id: {0}", message.MessageId);

        }
        catch (Exception ex)
        {
            // Do nothing.
            // Swallow exception, message will return to the queue when 
            // visibility timeout has been exceeded.
            Console.WriteLine("Could not process message due to error. Exception: {0}", ex.Message);
        }
    }
}

Using SQS to adapt to changing business requirements

One of the benefits of introducing a message queue is that you can accommodate new business requirements without dramatically affecting your application.

If, for example, the business decided that all orders placed over $5000 are to be handled as a priority, you could introduce a new “priority order” queue. The way the orders are processed does not change. The only significant change to the processing application is to ensure that messages from the “priority order” queue are processed before the “standard order” queue.

The following diagram shows how this logic could be isolated in an “order dispatcher,” whose only purpose is to route order messages to the appropriate queue based on whether the order exceeds $5000. Nothing on the web application or the processing nodes changes other than the target queue to which the order is sent. The rates at which orders are processed can be achieved by modifying the poll rates and scalability settings that I have already discussed.

Extending the design pattern with Amazon SNS

Amazon SNS supports reliable publish-subscribe (pub-sub) scenarios and push notifications to known endpoints across a wide variety of protocols. It eliminates the need to periodically check or poll for new information and updates. SNS supports:

  • Reliable storage of messages for immediate or delayed processing
  • Publish / subscribe – direct, broadcast, targeted “push” messaging
  • Multiple subscriber protocols
  • Amazon SQS, HTTP, HTTPS, email, SMS, mobile push, AWS Lambda

With these capabilities, you can provide parallel asynchronous processing of orders in the system and extend it to support any number of different business use cases without affecting the production environment. This is commonly referred to as a “fanout” scenario.

Rather than your web application pushing orders to a queue for processing, send a notification via SNS. The SNS messages are sent to a topic and then replicated and pushed to multiple SQS queues and Lambda functions for processing.

As the diagram above shows, you have the development team consuming “live” data as they work on the next version of the processing application, or potentially using the messages to troubleshoot issues in production.

Marketing is consuming all order information, via a Lambda function that has subscribed to the SNS topic, inserting the records into an Amazon Redshift warehouse for analysis.

All of this, of course, is happening without affecting your order processing application.

Summary

While I haven’t dived deep into the specifics of each service, I have discussed how these services can be applied at an architectural level to build loosely coupled systems that facilitate multiple business use cases. I’ve also shown you how to use infrastructure and application-level scaling techniques, so you can get the most out of your EC2 instances.

One of the many benefits of using these managed services is how quickly and easily you can implement powerful messaging capabilities in your systems, and lower the capital and operational costs of managing your own messaging middleware.

Using Amazon SQS and Amazon SNS together can provide you with a powerful mechanism for decoupling application components. This should be part of design considerations as you architect for the cloud.

For more information, see the Amazon SQS Developer Guide and Amazon SNS Developer Guide. You’ll find tutorials on all the concepts covered in this post, and more. To can get started using the AWS console or SDK of your choice visit:

Happy messaging!

New Technique to Hijack Social Media Accounts

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/06/new_technique_t.html

Access Now has documented it being used against a Twitter user, but it also works against other social media accounts:

With the Doubleswitch attack, a hijacker takes control of a victim’s account through one of several attack vectors. People who have not enabled an app-based form of multifactor authentication for their accounts are especially vulnerable. For instance, an attacker could trick you into revealing your password through phishing. If you don’t have multifactor authentication, you lack a secondary line of defense. Once in control, the hijacker can then send messages and also subtly change your account information, including your username. The original username for your account is now available, allowing the hijacker to register for an account using that original username, while providing different login credentials.

Three news stories.

The Pirate Bay Isn’t Affected By Adverse Court Rulings – Everyone Else Is

Post Syndicated from Andy original https://torrentfreak.com/the-pirate-bay-isnt-affected-by-adverse-court-rulings-everyone-else-is-170618/

For more than a decade The Pirate Bay has been the world’s most controversial site. Delivering huge quantities of copyrighted content to the masses, the platform is revered and reviled across the copyright spectrum.

Its reputation is one of a defiant Internet swashbuckler, but due to changes in how the site has been run in more recent times, its current philosophy is more difficult to gauge. What has never been in doubt, however, is the site’s original intent to be as provocative as possible.

Through endless publicity stunts, some real, some just for the ‘lulz’, The Pirate Bay managed to attract a massive audience, all while incurring the wrath of every major copyright holder in the world.

Make no mistake, they all queued up to strike back, but every subsequent rightsholder action was met by a Pirate Bay middle finger, two fingers, or chin flick, depending on the mood of the day. This only served to further delight the masses, who happily spread the word while keeping their torrents flowing.

This vicious circle of being targeted by the entertainment industries, mocking them, and then reaping the traffic benefits, developed into the cheapest long-term marketing campaign the Internet had ever seen. But nothing is ever truly for free and there have been consequences.

After taunting Hollywood and the music industry with its refusals to capitulate, endless legal action that the site would have ordinarily been forced to participate in largely took place without The Pirate Bay being present. It doesn’t take a law degree to work out what happened in each and every one of those cases, whatever complex route they took through the legal system. No defense, no win.

For example, the web-blocking phenomenon across the UK, Europe, Asia and Australia was driven by the site’s absolute resilience and although there would clearly have been other scapegoats had The Pirate Bay disappeared, the site was the ideal bogeyman the copyright lobby required to move forward.

Filing blocking lawsuits while bringing hosts, advertisers, and ISPs on board for anti-piracy initiatives were also made easier with the ‘evil’ Pirate Bay still online. Immune from every anti-piracy technique under the sun, the existence of the platform in the face of all onslaughts only strengthened the cases of those arguing for even more drastic measures.

Over a decade, this has meant a significant tightening of the sharing and streaming climate. Without any big legislative changes but plenty of case law against The Pirate Bay, web-blocking is now a walk in the park, ad hoc domain seizures are a fairly regular occurrence, and few companies want to host sharing sites. Advertisers and brands are also hesitant over where they place their ads. It’s a very different world to the one of 10 years ago.

While it would be wrong to attribute every tightening of the noose to the actions of The Pirate Bay, there’s little doubt that the site and its chaotic image played a huge role in where copyright enforcement is today. The platform set out to provoke and succeeded in every way possible, gaining supporters in their millions. It could also be argued it kicked a hole in a hornets’ nest, releasing the hell inside.

But perhaps the site’s most amazing achievement is the way it has managed to stay online, despite all the turmoil.

This week yet another ruling, this time from the powerful European Court of Justice, found that by offering links in the manner it does, The Pirate Bay and other sites are liable for communicating copyright works to the public. Of course, this prompted the usual swathe of articles claiming that this could be the final nail in the site’s coffin.

Wrong.

In common with every ruling, legal defeat, and legislative restriction put in place due to the site’s activities, this week’s decision from the ECJ will have zero effect on the Pirate Bay’s availability. For right or wrong, the site was breaking the law long before this ruling and will continue to do so until it decides otherwise.

What we have instead is a further tightened legal landscape that will have a lasting effect on everything BUT the site, including weaker torrent sites, Internet users, and user-uploaded content sites such as YouTube.

With The Pirate Bay carrying on regardless, that is nothing short of remarkable.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

[$] Assembling the history of Unix

Post Syndicated from corbet original https://lwn.net/Articles/725297/rss

The moment when an antique operating system that has not run in decades
boots and presents a command prompt is thrilling for Warren Toomey, who
founded the Unix Heritage Society to
reconstruct the early history of the Unix operating system. Recently this
historical code has become much more accessible: we can now browse it in an
instant on GitHub
, thanks to the efforts of a computer science
professor at the Athens University of Economics and Business named Diomidis
Spinellis.

Click below (subscribers only) for a look at the Unix Heritage Society and
what it has accomplished.

[$] A survey of scheduler benchmarks

Post Syndicated from corbet original https://lwn.net/Articles/725238/rss

Many benchmarks have been used by kernel developers over the years to
test the performance of the scheduler. But recent kernel commit messages
have shown a particular pattern of tools being used (some relatively new),
all of which were created specifically for developing scheduler patches.
While each benchmark is different, having its own unique genesis story and
intended testing scenario, there is a unifying attribute; they were all
written to scratch a developer’s itch.

Kodi Turmoil Continues as TVAddons Mysteriously Disappears

Post Syndicated from Ernesto original https://torrentfreak.com/kodi-turmoil-continues-as-tvaddons-mysteriously-disappears-170613/

Last week we broke the news that third-party Kodi add-on ZemTV and the TVAddons library were being sued in a federal court in Texas.

Since then, the ‘pirate’ Kodi community has been in turmoil. Several popular Kodi addons decided to shut down, and now TVAddons itself appears to be in trouble as well.

TVAddons is pne of the largest repositories of Kodi add-ons, many of which allow users to watch pirated content. The site has grown massively in recent years and reported that nearly 40 million unique users connected to the site’s servers in March.

Since yesterday, however, these millions of users can no longer access the site. Without prior warning or a public explanation, TVAddons’ domain name stopped responding. The domain’s DNS entries have been removed which means that it’s no longer accessible to the public.

Those who try to access the site either get a browser error message, or are redirected to a page of TVAddons’ domain name registrar Uniregistry.com (in some cases people may still see the site, if the DNS entries are cached).

TVAddons.ag can’t be reached

For now, it’s unclear who removed the DNS entries and why. The registrar could have taken this action, but TVAddons may have done it themselves too.

TorrentFreak reached out to TVAddons a few times over the past several days but without response. The site’s spokesperson was previously quick to reply, but after the Dish lawsuit became public this changed.

In response to our latest email inquiry, we received an error message, suggesting that the site’s official email addresses are no longer functioning due to the domain troubles.

TVAddons has also gone quiet on social media, where TVAddons has been very active in the past. However, the last updates on Twitter and Facebook date back more than a week ago. In fact, a few hours ago TVAddons’ Facebook page disappeared completely.

Facebook page unavailable

Based on the current downtime issues, it’s no surprise that people are getting worried. If TVAddons doesn’t return, the Kodi-addon community has lost what’s arguably its biggest player.

The site’s extensive library listed 1,500 different add-ons, of which the community-maintained Exodus addon was one of the most popular. Now that the site is no longer available, people may run into issues while updating these.

That said, it’s best not to jump to conclusions without an official explanation from the team. If we find out more, this article will be updated accordingly.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Who’s To Blame For The Kodi Crackdown?

Post Syndicated from Andy original https://torrentfreak.com/whos-to-blame-for-the-kodi-crackdown-170611/

Perfectly legal as standard, the Kodi media player can be easily modified to turn it into the ultimate streaming piracy machine.

Uptake by users has been nothing short of phenomenal. Millions of people are now consuming illicit media through third-party Kodi addons. With free movies, TV shows, sports, live TV and more on tap, it’s not difficult to see why the system is so popular.

As a result, barely a day goes by without Kodi making headlines and this week was no exception. On Monday, TorrentFreak broke the news that the ZEMTV addon and TV Addons, one of the most popular addon communities, were being sued by Dish Network for copyright infringement.

Within hours of the announcement and apparently as a direct result, several addons (including the massively popular Phoenix) decided to throw in the towel. Quite understandably, users of the platforms were disappointed, and that predictably resulted in people attempting to apportion blame.

The first comment to catch the eye was posted directly beneath our article. Interestingly, it placed the blame squarely on our shoulders.

“Thanks Torrentfreak, for ruining Kodi,” it read.

While shooting the messenger is an option, it’s historically problematic. Town criers were the original newsreaders, delivering important messages to the public. Killing a town crier was considered treason, but it was also pointless – it didn’t change the facts on the ground.

So if we can’t kill those who read about a lawsuit in the public PACER system and reported it, who’s left to blame? Unsurprisingly, there’s no shortage of targets, but most of them fall short.

The underlying theme is that most people voicing a negative opinion about the profile of Kodi do not appreciate their previously niche piracy system being in the spotlight. Everything was just great when just a few people knew about the marvelous hidden world of ‘secret’ XBMC/Kodi addons, many insist, but seeing it in the mainstream press is a disaster. It’s difficult to disagree.

However, the point where this all falls down is when people are asked when the discussion about Kodi should’ve stopped. We haven’t questioned them all, of course, but it’s almost guaranteed that while most with a grievance didn’t want Kodi getting too big, they absolutely appreciate the fact that someone told them about it. Piracy and piracy techniques spread by word of mouth so unfortunately, people can’t have it both ways.

Interestingly, some people placed the blame on TV Addons, the site that hosts the addons themselves. They argued that the addon scene didn’t need such a high profile target and that the popularity of the site only brought unwanted attention. However, for every critic, there are apparently thousands who love what the site does to raise the profile of Kodi. Without that, it’s clear that there would be fewer users and indeed, fewer addons.

For TV Addons’ part, they’re extremely clear who’s responsible for bringing the heat. On numerous occasions in emails to TF, the operators of the repository have blamed those who have attempted to commercialize the Kodi scene. For them, the responsibility must be placed squarely on the shoulders of people selling ‘Kodi boxes’ on places like eBay and Amazon. Once big money got involved, that attracted the authorities, they argue.

With this statement in mind, TF spoke with a box seller who previously backed down from selling on eBay due to issues over Kodi’s trademark. He didn’t want to speak on the record but admitted to selling “a couple of thousand” boxes over the past two years, noting that all he did was respond to demand with supply.

And this brings us full circle and a bit closer to apportioning blame for the Kodi crackdown.

The bottom line is that when it comes to piracy, Kodi and its third-party ‘pirate’ addons are so good at what they do, it’s no surprise they’ve been a smash hit with Internet users. All of the content that anyone could want – and more – accessible in one package, on almost any platform? That’s what consumers have been demanding for more than a decade and a half.

That brings us to the unavoidable conclusion that modified Kodi simply got too good at delivering content outside controlled channels, and that success was impossible to moderate or calm. Quite simply, every user that added to the Kodi phenomenon by installing the software with ‘pirate’ addons has to shoulder some of the blame for the crackdown.

That might sound harsh but in the piracy world it’s never been any different. Without millions of users, The Pirate Bay raid would never have happened. Without users, KickassTorrents might still be rocking today. But of course, what would be the point?

Users might break sites and services, but they also make them. That’s the piracy paradox.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Teaching tech

Post Syndicated from Eevee original https://eev.ee/blog/2017/06/10/teaching-tech/

A sponsored post from Manishearth:

I would kinda like to hear about any thoughts you have on technical teaching or technical writing. Pedagogy is something I care about. But I don’t know how much you do, so feel free to ignore this suggestion 🙂

Good news: I care enough that I’m trying to write a sorta-kinda-teaching book!

Ironically, one of the biggest problems I’ve had with writing the introduction to that book is that I keep accidentally rambling on for pages about problems and difficulties with teaching technical subjects. So maybe this is a good chance to get it out of my system.

Phaser

I recently tried out a new thing. It was Phaser, but this isn’t a dig on them in particular, just a convenient example fresh in my mind. If anything, they’re better than most.

As you can see from Phaser’s website, it appears to have tons of documentation. Two of the six headings are “LEARN” and “EXAMPLES”, which seems very promising. And indeed, Phaser offers:

  • Several getting-started walkthroughs
  • Possibly hundreds of examples
  • A news feed that regularly links to third-party tutorials
  • Thorough API docs

Perfect. Beautiful. Surely, a dream.

Well, almost.

The examples are all microscopic, usually focused around a single tiny feature — many of them could be explained just as well with one line of code. There are a few example games, but they’re short aimless demos. None of them are complete games, and there’s no showcase either. Games sometimes pop up in the news feed, but most of them don’t include source code, so they’re not useful for learning from.

Likewise, the API docs are just API docs, leading to the sorts of problems you might imagine. For example, in a few places there’s a mention of a preUpdate stage that (naturally) happens before update. You might rightfully wonder what kinds of things happen in preUpdate — and more importantly, what should you put there, and why?

Let’s check the API docs for Phaser.Group.preUpdate:

The core preUpdate – as called by World.

Okay, that didn’t help too much, but let’s check what Phaser.World has to say:

The core preUpdate – as called by World.

Ah. Hm. It turns out World is a subclass of Group and inherits this method — and thus its unaltered docstring — from Group.

I did eventually find some brief docs attached to Phaser.Stage (but only by grepping the source code). It mentions what the framework uses preUpdate for, but not why, and not when I might want to use it too.


The trouble here is that there’s no narrative documentation — nothing explaining how the library is put together and how I’m supposed to use it. I get handed some brief primers and a massive reference, but nothing in between. It’s like buying an O’Reilly book and finding out it only has one chapter followed by a 500-page glossary.

API docs are great if you know specifically what you’re looking for, but they don’t explain the best way to approach higher-level problems, and they don’t offer much guidance on how to mesh nicely with the design of a framework or big library. Phaser does a decent chunk of stuff for you, off in the background somewhere, so it gives the strong impression that it expects you to build around it in a particular way… but it never tells you what that way is.

Tutorials

Ah, but this is what tutorials are for, right?

I confess I recoil whenever I hear the word “tutorial”. It conjures an image of a uniquely useless sort of post, which goes something like this:

  1. Look at this cool thing I made! I’ll teach you how to do it too.

  2. Press all of these buttons in this order. Here’s a screenshot, which looks nothing like what you have, because I’ve customized the hell out of everything.

  3. You did it!

The author is often less than forthcoming about why they made any of the decisions they did, where you might want to try something else, or what might go wrong (and how to fix it).

And this is to be expected! Writing out any of that stuff requires far more extensive knowledge than you need just to do the thing in the first place, and you need to do a good bit of introspection to sort out something coherent to say.

In other words, teaching is hard. It’s a skill, and it takes practice, and most people blogging are not experts at it. Including me!


With Phaser, I noticed that several of the third-party tutorials I tried to look at were 404s — sometimes less than a year after they were linked on the site. Pretty major downside to relying on the community for teaching resources.

But I also notice that… um…

Okay, look. I really am not trying to rag on this author. I’m not. They tried to share their knowledge with the world, and that’s a good thing, something worthy of praise. I’m glad they did it! I hope it helps someone.

But for the sake of example, here is the most recent entry in Phaser’s list of community tutorials. I have to link it, because it’s such a perfect example. Consider:

  • The post itself is a bulleted list of explanation followed by a single contiguous 250 lines of source code. (Not that there’s anything wrong with bulleted lists, mind you.) That code contains zero comments and zero blank lines.

  • This is only part two in what I think is a series aimed at beginners, yet the title and much of the prose focus on object pooling, a performance hack that’s easy to add later and that’s almost certainly unnecessary for a game this simple. There is no explanation of why this is done; the prose only says you’ll understand why it’s critical once you add a lot more game objects.

  • It turns out I only have two things to say here so I don’t know why I made this a bulleted list.

In short, it’s not really a guided explanation; it’s “look what I did”.

And that’s fine, and it can still be interesting. I’m not sure English is even this person’s first language, so I’m hardly going to criticize them for not writing a novel about platforming.

The trouble is that I doubt a beginner would walk away from this feeling very enlightened. They might be closer to having the game they wanted, so there’s still value in it, but it feels closer to having someone else do it for them. And an awful lot of tutorials I’ve seen — particularly of the “post on some blog” form (which I’m aware is the genre of thing I’m writing right now) — look similar.

This isn’t some huge social problem; it’s just people writing on their blog and contributing to the corpus of written knowledge. It does become a bit stickier when a large project relies on these community tutorials as its main set of teaching aids.


Again, I’m not ragging on Phaser here. I had a slightly frustrating experience with it, coming in knowing what I wanted but unable to find a description of the semantics anywhere, but I do sympathize. Teaching is hard, writing documentation is hard, and programmers would usually rather program than do either of those things. For free projects that run on volunteer work, and in an industry where anything other than programming is a little undervalued, getting good docs written can be tricky.

(Then again, Phaser sells books and plugins, so maybe they could hire a documentation writer. Or maybe the whole point is for you to buy the books?)

Some pretty good docs

Python has pretty good documentation. It introduces the language with a tutorial, then documents everything else in both a library and language reference.

This sounds an awful lot like Phaser’s setup, but there’s some considerable depth in the Python docs. The tutorial is highly narrative and walks through quite a few corners of the language, stopping to mention common pitfalls and possible use cases. I clicked an arbitrary heading and found a pleasant, informative read that somehow avoids being bewilderingly dense.

The API docs also take on a narrative tone — even something as humble as the collections module offers numerous examples, use cases, patterns, recipes, and hints of interesting ways you might extend the existing types.

I’m being a little vague and hand-wavey here, but it’s hard to give specific examples without just quoting two pages of Python documentation. Hopefully you can see right away what I mean if you just take a look at them. They’re good docs, Bront.

I’ve likewise always enjoyed the SQLAlchemy documentation, which follows much the same structure as the main Python documentation. SQLAlchemy is a database abstraction layer plus ORM, so it can do a lot of subtly intertwined stuff, and the complexity of the docs reflects this. Figuring out how to do very advanced things correctly, in particular, can be challenging. But for the most part it does a very thorough job of introducing you to a large library with a particular philosophy and how to best work alongside it.

I softly contrast this with, say, the Perl documentation.

It’s gotten better since I first learned Perl, but Perl’s docs are still a bit of a strange beast. They exist as a flat collection of manpage-like documents with terse names like perlootut. The documentation is certainly thorough, but much of it has a strange… allocation of detail.

For example, perllol — the explanation of how to make a list of lists, which somehow merits its own separate documentation — offers no fewer than nine similar variations of the same code for reading a file into a nested lists of words on each line. Where Python offers examples for a variety of different problems, Perl shows you a lot of subtly different ways to do the same basic thing.

A similar problem is that Perl’s docs sometimes offer far too much context; consider the references tutorial, which starts by explaining that references are a powerful “new” feature in Perl 5 (first released in 1994). It then explains why you might want to nest data structures… from a Perl 4 perspective, thus explaining why Perl 5 is so much better.

Some stuff I’ve tried

I don’t claim to be a great teacher. I like to talk about stuff I find interesting, and I try to do it in ways that are accessible to people who aren’t lugging around the mountain of context I already have. This being just some blog, it’s hard to tell how well that works, but I do my best.

I also know that I learn best when I can understand what’s going on, rather than just seeing surface-level cause and effect. Of course, with complex subjects, it’s hard to develop an understanding before you’ve seen the cause and effect a few times, so there’s a balancing act between showing examples and trying to provide an explanation. Too many concrete examples feel like rote memorization; too much abstract theory feels disconnected from anything tangible.

The attempt I’m most pleased with is probably my post on Perlin noise. It covers a fairly specific subject, which made it much easier. It builds up one step at a time from scratch, with visualizations at every point. It offers some interpretations of what’s going on. It clearly explains some possible extensions to the idea, but distinguishes those from the core concept.

It is a little math-heavy, I grant you, but that was hard to avoid with a fundamentally mathematical topic. I had to be economical with the background information, so I let the math be a little dense in places.

But the best part about it by far is that I learned a lot about Perlin noise in the process of writing it. In several places I realized I couldn’t explain what was going on in a satisfying way, so I had to dig deeper into it before I could write about it. Perhaps there’s a good guideline hidden in there: don’t try to teach as much as you know?

I’m also fairly happy with my series on making Doom maps, though they meander into tangents a little more often. It’s hard to talk about something like Doom without meandering, since it’s a convoluted ecosystem that’s grown organically over the course of 24 years and has at least three ways of doing anything.


And finally there’s the book I’m trying to write, which is sort of about game development.

One of my biggest grievances with game development teaching in particular is how often it leaves out important touches. Very few guides will tell you how to make a title screen or menu, how to handle death, how to get a Mario-style variable jump height. They’ll show you how to build a clearly unfinished demo game, then leave you to your own devices.

I realized that the only reliable way to show how to build a game is to build a real game, then write about it. So the book is laid out as a narrative of how I wrote my first few games, complete with stumbling blocks and dead ends and tiny bits of polish.

I have no idea how well this will work, or whether recapping my own mistakes will be interesting or distracting for a beginner, but it ought to be an interesting experiment.

Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway

Post Syndicated from Ed Lima original https://aws.amazon.com/blogs/compute/secure-api-access-with-amazon-cognito-federated-identities-amazon-cognito-user-pools-and-amazon-api-gateway/

Ed Lima, Solutions Architect

 

Our identities are what define us as human beings. Philosophical discussions aside, it also applies to our day-to-day lives. For instance, I need my work badge to get access to my office building or my passport to travel overseas. My identity in this case is attached to my work badge or passport. As part of the system that checks my access, these documents or objects help define whether I have access to get into the office building or travel internationally.

This exact same concept can also be applied to cloud applications and APIs. To provide secure access to your application users, you define who can access the application resources and what kind of access can be granted. Access is based on identity controls that can confirm authentication (AuthN) and authorization (AuthZ), which are different concepts. According to Wikipedia:

 

The process of authorization is distinct from that of authentication. Whereas authentication is the process of verifying that “you are who you say you are,” authorization is the process of verifying that “you are permitted to do what you are trying to do.” This does not mean authorization presupposes authentication; an anonymous agent could be authorized to a limited action set.

Amazon Cognito allows building, securing, and scaling a solution to handle user management and authentication, and to sync across platforms and devices. In this post, I discuss the different ways that you can use Amazon Cognito to authenticate API calls to Amazon API Gateway and secure access to your own API resources.

 

Amazon Cognito Concepts

 

It’s important to understand that Amazon Cognito provides three different services:

Today, I discuss the use of the first two. One service doesn’t need the other to work; however, they can be configured to work together.
 

Amazon Cognito Federated Identities

 
To use Amazon Cognito Federated Identities in your application, create an identity pool. An identity pool is a store of user data specific to your account. It can be configured to require an identity provider (IdP) for user authentication, after you enter details such as app IDs or keys related to that specific provider.

After the user is validated, the provider sends an identity token to Amazon Cognito Federated Identities. In turn, Amazon Cognito Federated Identities contacts the AWS Security Token Service (AWS STS) to retrieve temporary AWS credentials based on a configured, authenticated IAM role linked to the identity pool. The role has appropriate IAM policies attached to it and uses these policies to provide access to other AWS services.

Amazon Cognito Federated Identities currently supports the IdPs listed in the following graphic.

 



Continue reading Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway

AWS Greengrass – Run AWS Lambda Functions on Connected Devices

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-greengrass-run-aws-lambda-functions-on-connected-devices/

I first told you about AWS Greengrass in the post that I published during re:Invent (AWS Greengrass – Ubiquitous Real-World Computing). We launched a limited preview of Greengrass at that time and invited you to sign up if you were interested.

As I noted at the time, many AWS customers want to collect and process data out in the field, where connectivity is often slow and sometimes either intermittent or unreliable. Greengrass allows them to extend the AWS programming model to small, simple, field-based devices. It builds on AWS IoT and AWS Lambda, and supports access to the ever-increasing variety of services that are available in the AWS Cloud.

Greengrass gives you access to compute, messaging, data caching, and syncing services that run in the field, and that do not depend on constant, high-bandwidth connectivity to an AWS Region. You can write Lambda functions in Python 2.7 and deploy them to your Greengrass devices from the cloud while using device shadows to maintain state. Your devices and peripherals can talk to each other using local messaging that does not pass through the cloud.

Now Generally Available
Today we are making Greengrass generally available in the US East (Northern Virginia) and US West (Oregon) Regions. During the preview, AWS customers were able to get hands-on experience with Greengrass and to start building applications and businesses around it. I’ll share a few of these early successes later in this post.

The Greengrass Core code runs on each device. It allows you to deploy and run Lambda applications on the device, supports local MQTT messaging across a secure network, and also ensures that conversations between devices and the cloud are made across secure connections. The Greengrass Core also supports secure, over-the-air software updates, including Lambda functions. It includes a message broker, a Lambda runtime, a Thing Shadows implementation, and a deployment agent. Greengrass Core and (optionally) other devices make up a Greengrass Group. The group includes configuration data, the list of devices and the identity of the Greengrass Core, a list of Lambda functions, and a set of subscriptions that define where the messages should go. All of this information is copied to the Greengrass core devices during the deployment process.

Your Lambda functions can use APIs in three distinct SDKs:

AWS SDK for Python – This SDK allows your code to interact with Amazon Simple Storage Service (S3), Amazon DynamoDB, Amazon Simple Queue Service (SQS), and other AWS services.

AWS IoT Device SDK – This SDK (available for Node.js, Python, Java, and C++) helps you to connect your hardware devices to AWS IoT. The C++ SDK has a few extra features including access to the Greengrass Discovery Service and support for root CA downloads.

AWS Greengrass Core SDK – This SDK provides APIs that allow local invocation of other Lambda functions, publish messages, and work with thing shadows.

You can run the Greengrass Core on x86 and ARM devices that have version 4.4.11 (or newer) of the Linux kernel, with the OverlayFS and user namespace features enabled. While most deployments of Greengrass will be targeted at specialized, industrial-grade hardware, you can also run the Greengrass Core on a Raspberry Pi or an EC2 instance for development and test purposes.

For this post, I used a Raspberry Pi attached to a BrickPi, connected to my home network via WiFi:

The Raspberry Pi, the BrickPi, the case, and all of the other parts are available in the BrickPi 3 Starter Kit. You will need some Linux command-line expertise and a decent amount of manual dexterity to put all of this together, but if I did it then you surely can.

Greengrass in Action
I can access Greengrass from the Console, API, or CLI. I’ll use the Console. The intro page of the Greengrass Console lets me define groups, add Greengrass Cores, and add devices to my groups:

I click on Get Started and then on Use easy creation:

Then I name my group:

And name my first Greengrass Core:

I’m ready to go, so I click on Create Group and Core:

This runs for a few seconds and then offers up my security resources (two keys and a certificate) for downloading, along with the Greengrass Core:

I download the security resources and put them in a safe place, and select and download the desired version of the Greengrass Core software (ARMv7l for my Raspberry Pi), and click on Finish.

Now I power up my Pi, and copy the security resources and the software to it (I put them in an S3 bucket and pulled them down with wget). Here’s my shell history at that point:

Following the directions in the user guide, I create a new user and group, run the rpi-update script, and install several packages including sqlite3 and openssl. After a couple of reboots, I am ready to proceed!

Next, still following the directions, I untar the Greengrass Core software and move the security resources to their final destination (/greengrass/configuration/certs), giving them generic names along the way. Here’s what the directory looks like:

The next step is to associate the core with an AWS IoT thing. I return to the Console, click through the group and the Greengrass Core, and find the Thing ARN:

I insert the names of the certificates and the Thing ARN into the config.json file, and also fill in the missing sections of the iotHost and ggHost:

I start the Greengrass demon (this was my second attempt; I had a typo in one of my path names the first time around):

After all of this pleasant time at the command line (taking me back to my Unix v7 and BSD 4.2 days), it is time to go visual once again! I visit my AWS IoT dashboard and see that my Greengrass Core is making connections to IoT:

I go to the Lambda Console and create a Lambda function using the Python 2.7 runtime (the IAM role does not matter here):

I publish the function in the usual way and, hop over to the Greengrass Console, click on my group, and choose to add a Lambda function:

Then I choose the version to deploy:

I also configure the function to be long-lived instead of on-demand:

My code will publish messages to AWS IoT, so I create a subscription by specifying the source and destination:

I set up a topic filter (hello/world) on the subscription as well:

I confirm my settings and save my subscription and I am just about ready to deploy my code. I revisit my group, click on Deployments, and choose Deploy from the Actions menu:

I choose Automatic detection to move forward:

Since this is my first deployment, I need to create a service-level role that gives Greengrass permission to access other AWS services. I simply click on Grant permission:

I can see the status of each deployment:

The code is now running on my Pi! It publishes messages to topic hello/world; I can see them by going to the IoT Console, clicking on Test, and subscribing to the topic:

And here are the messages:

With all of the setup work taken care of, I can do iterative development by uploading, publishing, and deploying new versions of my code. I plan to use the BrickPi to control some LEGO Technic motors and to publish data collected from some sensors. Stay tuned for that post!

Greengrass Pricing
You can run the Greengrass Core on three devices free for one year as part of the AWS Free Tier. At the next level (3 to 10,000 devices) two options are available:

  • Pay as You Go – $0.16 per month per device.
  • Annual Commitment – $1.49 per year per device, a 17.5% savings.

If you want to run the Greengrass Core on more than 10,000 devices or make a longer commitment, please get in touch with us; details on all pricing models are on the Greengrass Pricing page.

Jeff;

How to Deploy Local Administrator Password Solution with AWS Microsoft AD

Post Syndicated from Dragos Madarasan original https://aws.amazon.com/blogs/security/how-to-deploy-local-administrator-password-solution-with-aws-microsoft-ad/

Local Administrator Password Solution (LAPS) from Microsoft simplifies password management by allowing organizations to use Active Directory (AD) to store unique passwords for computers. Typically, an organization might reuse the same local administrator password across the computers in an AD domain. However, this approach represents a security risk because it can be exploited during lateral escalation attacks. LAPS solves this problem by creating unique, randomized passwords for the Administrator account on each computer and storing it encrypted in AD.

Deploying LAPS with AWS Microsoft AD requires the following steps:

  1. Install the LAPS binaries on instances joined to your AWS Microsoft AD domain. The binaries add additional client-side extension (CSE) functionality to the Group Policy client.
  2. Extend the AWS Microsoft AD schema. LAPS requires new AD attributes to store an encrypted password and its expiration time.
  3. Configure AD permissions and delegate the ability to retrieve the local administrator password for IT staff in your organization.
  4. Configure Group Policy on instances joined to your AWS Microsoft AD domain to enable LAPS. This configures the Group Policy client to process LAPS settings and uses the binaries installed in Step 1.

The following diagram illustrates the setup that I will be using throughout this post and the associated tasks to set up LAPS. Note that the AWS Directory Service directory is deployed across multiple Availability Zones, and monitoring automatically detects and replaces domain controllers that fail.

Diagram illustrating this blog post's solution

In this blog post, I explain the prerequisites to set up Local Administrator Password Solution, demonstrate the steps involved to update the AD schema on your AWS Microsoft AD domain, show how to delegate permissions to IT staff and configure LAPS via Group Policy, and demonstrate how to retrieve the password using the graphical user interface or with Windows PowerShell.

This post assumes you are familiar with Lightweight Directory Access Protocol Data Interchange Format (LDIF) files and AWS Microsoft AD. If you need more of an introduction to Directory Service and AWS Microsoft AD, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service, which introduces working with schema changes in AWS Microsoft AD.

Prerequisites

In order to implement LAPS, you must use AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as AWS Microsoft AD. Any instance on which you want to configure LAPS must be joined to your AWS Microsoft AD domain. You also need a Management instance on which you install the LAPS management tools.

In this post, I use an AWS Microsoft AD domain called example.com that I have launched in the EU (London) region. To see which the regions in which Directory Service is available, see AWS Regions and Endpoints.

Screenshot showing the AWS Microsoft AD domain example.com used in this blog post

In addition, you must have at least two instances launched in the same region as the AWS Microsoft AD domain. To join the instances to your AWS Microsoft AD domain, you have two options:

  1. Use the Amazon EC2 Systems Manager (SSM) domain join feature. To learn more about how to set up domain join for EC2 instances, see joining a Windows Instance to an AWS Directory Service Domain.
  2. Manually configure the DNS server addresses in the Internet Protocol version 4 (TCP/IPv4) settings of the network card to use the AWS Microsoft AD DNS addresses (172.31.9.64 and 172.31.16.191, for this blog post) and perform a manual domain join.

For the purpose of this post, my two instances are:

  1. A Management instance on which I will install the management tools that I have tagged as Management.
  2. A Web Server instance on which I will be deploying the LAPS binary.

Screenshot showing the two EC2 instances used in this post

Implementing the solution

 

1. Install the LAPS binaries on instances joined to your AWS Microsoft AD domain by using EC2 Run Command

LAPS binaries come in the form of an MSI installer and can be downloaded from the Microsoft Download Center. You can install the LAPS binaries manually, with an automation service such as EC2 Run Command, or with your existing software deployment solution.

For this post, I will deploy the LAPS binaries on my Web Server instance (i-0b7563d0f89d3453a) by using EC2 Run Command:

  1. While signed in to the AWS Management Console, choose EC2. In the Systems Manager Services section of the navigation pane, choose Run Command.
  2. Choose Run a command, and from the Command document list, choose AWS-InstallApplication.
  3. From Target instances, choose the instance on which you want to deploy the LAPS binaries. In my case, I will be selecting the instance tagged as Web Server. If you do not see any instances listed, make sure you have met the prerequisites for Amazon EC2 Systems Manager (SSM) by reviewing the Systems Manager Prerequisites.
  4. For Action, choose Install, and then stipulate the following values:
    • Parameters: /quiet
    • Source: https://download.microsoft.com/download/C/7/A/C7AAD914-A8A6-4904-88A1-29E657445D03/LAPS.x64.msi
    • Source Hash: f63ebbc45e2d080630bd62a195cd225de734131a56bb7b453c84336e37abd766
    • Comment: LAPS deployment

Leave the other options with the default values and choose Run. The AWS Management Console will return a Command ID, which will initially have a status of In Progress. It should take less than 5 minutes to download and install the binaries, after which the Command ID will update its status to Success.

Status showing the binaries have been installed successfully

If the Command ID runs for more than 5 minutes or returns an error, it might indicate a problem with the installer. To troubleshoot, review the steps in Troubleshooting Systems Manager Run Command.

To verify the binaries have been installed successfully, open Control Panel and review the recently installed applications in Programs and Features.

Screenshot of Control Panel that confirms LAPS has been installed successfully

You should see an entry for Local Administrator Password Solution with a version of 6.2.0.0 or newer.

2. Extend the AWS Microsoft AD schema

In the previous section, I used EC2 Run Command to install the LAPS binaries on an EC2 instance. Now, I am ready to extend the schema in an AWS Microsoft AD domain. Extending the schema is a requirement because LAPS relies on new AD attributes to store the encrypted password and its expiration time.

In an on-premises AD environment, you would update the schema by running the Update-AdmPwdADSchema Windows PowerShell cmdlet with schema administrator credentials. Because AWS Microsoft AD is a managed service, I do not have permissions to update the schema directly. Instead, I will update the AD schema from the Directory Service console by importing an LDIF file. If you are unfamiliar with schema updates or LDIF files, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service.

To make things easier for you, I am providing you with a sample LDIF file that contains the required AD schema changes. Using Notepad or a similar text editor, open the SchemaChanges-0517.ldif file and update the values of dc=example,dc=com with your own AWS Microsoft AD domain and suffix.

After I update the LDIF file with my AWS Microsoft AD details, I import it by using the AWS Management Console:

  1. On the Directory Service console, select from the list of directories in the Microsoft AD directory by choosing its identifier (it will look something like d-534373570ea).
  2. On the Directory details page, choose the Schema extensions tab and choose Upload and update schema.
    Screenshot showing the "Upload and update schema" option
  3. When prompted for the LDIF file that contains the changes, choose the sample LDIF file.
  4. In the background, the LDIF file is validated for errors and a backup of the directory is created for recovery purposes. Updating the schema might take a few minutes and the status will change to Updating Schema. When the process has completed, the status of Completed will be displayed, as shown in the following screenshot.

Screenshot showing the schema updates in progress
When the process has completed, the status of Completed will be displayed, as shown in the following screenshot.

Screenshot showing the process has completed

If the LDIF file contains errors or the schema extension fails, the Directory Service console will generate an error code and additional debug information. To help troubleshoot error messages, see Schema Extension Errors.

The sample LDIF file triggers AWS Microsoft AD to perform the following actions:

  1. Create the ms-Mcs-AdmPwd attribute, which stores the encrypted password.
  2. Create the ms-Mcs-AdmPwdExpirationTime attribute, which stores the time of the password’s expiration.
  3. Add both attributes to the Computer class.

3. Configure AD permissions

In the previous section, I updated the AWS Microsoft AD schema with the required attributes for LAPS. I am now ready to configure the permissions for administrators to retrieve the password and for computer accounts to update their password attribute.

As part of configuring AD permissions, I grant computers the ability to update their own password attribute and specify which security groups have permissions to retrieve the password from AD. As part of this process, I run Windows PowerShell cmdlets that are not installed by default on Windows Server.

Note: To learn more about Windows PowerShell and the concept of a cmdlet (pronounced “command-let”), go to Getting Started with Windows PowerShell.

Before getting started, I need to set up the required tools for LAPS on my Management instance, which must be joined to the AWS Microsoft AD domain. I will be using the same LAPS installer that I downloaded from the Microsoft LAPS website. In my Management instance, I have manually run the installer by clicking the LAPS.x64.msi file. On the Custom Setup page of the installer, under Management Tools, for each option I have selected Install on local hard drive.

Screenshot showing the required management tools

In the preceding screenshot, the features are:

  • The fat client UI – A simple user interface for retrieving the password (I will use it at the end of this post).
  • The Windows PowerShell module – Needed to run the commands in the next sections.
  • The GPO Editor templates – Used to configure Group Policy objects.

The next step is to grant computers in the Computers OU the permission to update their own attributes. While connected to my Management instance, I go to the Start menu and type PowerShell. In the list of results, right-click Windows PowerShell and choose Run as administrator and then Yes when prompted by User Account Control.

In the Windows PowerShell prompt, I type the following command.

Import-module AdmPwd.PS

Set-AdmPwdComputerSelfPermission –OrgUnit “OU=Computers,OU=MyMicrosoftAD,DC=example,DC=com

To grant the administrator group called Admins the permission to retrieve the computer password, I run the following command in the Windows PowerShell prompt I previously started.

Import-module AdmPwd.PS

Set-AdmPwdReadPasswordPermission –OrgUnit “OU=Computers, OU=MyMicrosoftAD,DC=example,DC=com” –AllowedPrincipals “Admins”

4. Configure Group Policy to enable LAPS

In the previous section, I deployed the LAPS management tools on my management instance, granted the computer accounts the permission to self-update their local administrator password attribute, and granted my Admins group permissions to retrieve the password.

Note: The following section addresses the Group Policy Management Console and Group Policy objects. If you are unfamiliar with or wish to learn more about these concepts, go to Get Started Using the GPMC and Group Policy for Beginners.

I am now ready to enable LAPS via Group Policy:

  1. On my Management instance (i-03b2c5d5b1113c7ac), I have installed the Group Policy Management Console (GPMC) by running the following command in Windows PowerShell.
Install-WindowsFeature –Name GPMC
  1. Next, I have opened the GPMC and created a new Group Policy object (GPO) called LAPS GPO.
  2. In the Local Group Policy Editor, I navigate to Computer Configuration > Policies > Administrative Templates > LAPS. I have configured the settings using the values in the following table.

Setting

State

Options

Password Settings

Enabled

Complexity: large letters, small letters, numbers, specials

Do not allow password expiration time longer than required by policy

Enabled

N/A

Enable local admin password management

Enabled

N/A

  1. Next, I need to link the GPO to an organizational unit (OU) in which my machine accounts sit. In your environment, I recommend testing the new settings on a test OU and then deploying the GPO to production OUs.

Note: If you choose to create a new test organizational unit, you must create it in the OU that AWS Microsoft AD delegates to you to manage. For example, if your AWS Microsoft AD directory name were example.com, the test OU path would be example.com/example/Computers/Test.

  1. To test that LAPS works, I need to make sure the computer has received the new policy by forcing a Group Policy update. While connected to the Web Server instance (i-0b7563d0f89d3453a) using Remote Desktop, I open an elevated administrative command prompt and run the following command: gpupdate /force. I can check if the policy is applied by running the command: gpresult /r | findstr LAPS GPO, where LAPS GPO is the name of the GPO created in the second step.
  2. Back on my Management instance, I can then launch the LAPS interface from the Start menu and use it to retrieve the password (as shown in the following screenshot). Alternatively, I can run the Get-ADComputer Windows PowerShell cmdlet to retrieve the password.
Get-ADComputer [YourComputerName] -Properties ms-Mcs-AdmPwd | select name, ms-Mcs-AdmPwd

Screenshot of the LAPS UI, which you can use to retrieve the password

Summary

In this blog post, I demonstrated how you can deploy LAPS with an AWS Microsoft AD directory. I then showed how to install the LAPS binaries by using EC2 Run Command. Using the sample LDIF file I provided, I showed you how to extend the schema, which is a requirement because LAPS relies on new AD attributes to store the encrypted password and its expiration time. Finally, I showed how to complete the LAPS setup by configuring the necessary AD permissions and creating the GPO that starts the LAPS password change.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, please start a new thread on the Directory Service forum.

– Dragos

Surveillance Intermediaries

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/06/surveillance_in_2.html

Interesting law-journal article: “Surveillance Intermediaries,” by Alan Z. Rozenshtein.

Abstract:Apple’s 2016 fight against a court order commanding it to help the FBI unlock the iPhone of one of the San Bernardino terrorists exemplifies how central the question of regulating government surveillance has become in American politics and law. But scholarly attempts to answer this question have suffered from a serious omission: scholars have ignored how government surveillance is checked by “surveillance intermediaries,” the companies like Apple, Google, and Facebook that dominate digital communications and data storage, and on whose cooperation government surveillance relies. This Article fills this gap in the scholarly literature, providing the first comprehensive analysis of how surveillance intermediaries constrain the surveillance executive. In so doing, it enhances our conceptual understanding of, and thus our ability to improve, the institutional design of government surveillance.

Surveillance intermediaries have the financial and ideological incentives to resist government requests for user data. Their techniques of resistance are: proceduralism and litigiousness that reject voluntary cooperation in favor of minimal compliance and aggressive litigation; technological unilateralism that designs products and services to make surveillance harder; and policy mobilization that rallies legislative and public opinion to limit surveillance. Surveillance intermediaries also enhance the “surveillance separation of powers”; they make the surveillance executive more subject to inter-branch constraints from Congress and the courts, and to intra-branch constraints from foreign-relations and economics agencies as well as the surveillance executive’s own surveillance-limiting components.

The normative implications of this descriptive account are important and cross-cutting. Surveillance intermediaries can both improve and worsen the “surveillance frontier”: the set of tradeoffs ­ between public safety, privacy, and economic growth ­ from which we choose surveillance policy. And while intermediaries enhance surveillance self-government when they mobilize public opinion and strengthen the surveillance separation of powers, they undermine it when their unilateral technological changes prevent the government from exercising its lawful surveillance authorities.

Torrents Help Researchers Worldwide to Study Babies’ Brains

Post Syndicated from Ernesto original https://torrentfreak.com/torrents-help-researchers-worldwide-to-study-babies-brains-170603/

One of the core pillars of academic research is sharing.

By letting other researchers know what you do, ideas are criticized, improved upon and extended. In today’s digital age, sharing is easier than ever before, especially with help from torrents.

One of the leading scientific projects that has adopted BitTorrent is the developing Human Connectome Project, or dHCP for short. The goal of the project is to map the brain wiring of developing babies in the wombs of their mothers.

To do so, a consortium of researchers with expertise ranging from computer science, to MRI physics and clinical medicine, has teamed up across three British institutions: Imperial College London, King’s College London and the University of Oxford.

The collected data is extremely valuable for the neuroscience community and the project has received mainstream press coverage and financial backing from the European Union Research Council. Not only to build the dataset, but also to share it with researchers around the globe. This is where BitTorrent comes in.

Sharing more than 150 GB of data with researchers all over the world can be quite a challenge. Regular HTTP downloads are not really up to the task, and many other transfer options have a high failure rate.

Baby brain scan (Credit: Developing Human Connectome Project)

This is why Jonathan Passerat-Palmbach, Research Associate Department of Computing Imperial College London, came up with the idea to embrace BitTorrent instead.

“For me, it was a no-brainer from day one that we couldn’t rely on plain old HTTP to make this dataset available. Our first pilot release is 150GB, and I expect the next ones to reach a couple of TB. Torrents seemed like the de facto solution to share this data with the world’s scientific community.” Passerat-Palmbach says.

The researchers opted to go for the Academic Torrents tracker, which specializes in sharing research data. A torrent with the first batch of images was made available there a few weeks ago.

“This initial release contains 3,629 files accounting for 167.20GB of data. While this figure might not appear extremely large at the moment, it will significantly grow as the project aims to make the data of 1,000 subjects available by the time it has completed.”

Torrent of the first dataset

The download numbers are nowhere in the region of an average Hollywood blockbuster, of course. Thus far the tracker has registered just 28 downloads. That said, as a superior and open file-transfer protocol, BitTorrent does aid in critical research that helps researchers to discover more about the development of conditions such as ADHD and autism.

Interestingly, the biggest challenges of implementing the torrent solution were not of a technical nature. Most time and effort went into assuring other team members that this was the right solution.

“I had to push for more than a year for the adoption of torrents within the consortium. While my colleagues could understand the potential of the approach and its technical inputs, they remained skeptical as to the feasibility to implement such a solution within an academic context and its reception by the world community.

“However, when the first dataset was put together, amounting to 150GB, it became obvious all the HTTP and FTP fallback plans would not fit our needs,” Passerat-Palmbach adds.

Baby brain scans (Credit: Developing Human Connectome Project)

When the consortium finally agreed that BitTorrent was an acceptable way to share the data, local IT staff at the university had to give their seal of approval. Imperial College London doesn’t allow torrent traffic to flow freely across the network, so an exception had to be made.

“Torrents are blocked across the wireless and VPN networks at Imperial. Getting an explicit firewall exception created for our seeding machine was not a walk in the park. It was the first time they were faced with such a situation and we were clearly told that it was not to become the rule.”

Then, finally, the data could be shared around the world.

While BitTorrent is probably the most efficient way to share large files, there were other proprietary solutions that could do the same. However, Passerat-Palmbach preferred not to force other researchers to install “proprietary black boxes” on their machines.

Torrents are free and open, which is more in line with the Open Access approach more academics take today.

Looking back, it certainly wasn’t a walk in the park to share the data via BitTorrent. Passerat-Palmbach was frequently confronted with the piracy stigma torrents have amoung many of his peers, even among younger generations.

“Considering how hard it was to convince my colleagues within the project to actually share this dataset using torrents (‘isn’t it illegal?’ and other kinds of misconceptions…), I think there’s still a lot of work to do to demystify the use of torrents with the public.

“I was even surprised to see that these misconceptions spread out not only to more senior scientists but also to junior researchers who I was expecting to be more tech-aware,” Passerat-Palmbach adds.

That said, the hard work is done now and in the months and years ahead the neuroscience community will have access to Petabytes of important data, with help from BitTorrent. That is definitely worth the effort.

Finally, we thought it was fitting to end with Passerat-Palmbach’s “pledge to seed,” which he shared with his peers. Keep on sharing!


On the importance of seeding

Dear fellow scientist,

Thank for you very much for the interest you are showing in the dHCP dataset!

Once you start downloading the dataset, you’ll notice that your torrent client mentions a sharing / seeding ratio. It means that as soon as you start downloading the dataset, you become part of our community of sharers and contribute to making the dataset available to other researchers all around the world!

There’s no reason to be scared! It’s perfectly legal as long as you’re allowed to have a copy of the dataset (that’s the bit you need to forward to your lab’s IT staff if they’re blocking your ports).

You’re actually providing a tremendous contribution to dHCP by spreading the data, so thank you again for that!

With your help, we can make sure this data remains available and can be downloaded relatively fast in the future. Over time, the dataset will grow and your contribution will be more and more important so that each and everyone of you can still obtain the data in the smoothest possible way.

We cannot do it without you. By seeding, you’re actually saying “cheers!” to your peers whom you downloaded your data from. So leave your client open and stay tuned!

All this is made possible thanks to the amazing folks at academictorrents and their infrastructure, so kudos academictorrents!

You can learn more about their project here and get some help to get started with torrent downloading here.

Jonathan Passerat-Palmbach

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Seven Tips for Using S3DistCp on Amazon EMR to Move Data Efficiently Between HDFS and Amazon S3

Post Syndicated from Illya Yalovyy original https://aws.amazon.com/blogs/big-data/seven-tips-for-using-s3distcp-on-amazon-emr-to-move-data-efficiently-between-hdfs-and-amazon-s3/

Have you ever needed to move a large amount of data between Amazon S3 and Hadoop Distributed File System (HDFS) but found that the data set was too large for a simple copy operation? EMR can help you with this. In addition to processing and analyzing petabytes of data, EMR can move large amounts of data.

In the Hadoop ecosystem, DistCp is often used to move data. DistCp provides a distributed copy capability built on top of a MapReduce framework. S3DistCp is an extension to DistCp that is optimized to work with S3 and that adds several useful features. In addition to moving data between HDFS and S3, S3DistCp is also a Swiss Army knife of file manipulations. In this post we’ll cover the following tips for using S3DistCp, starting with basic use cases and then moving to more advanced scenarios:

1. Copy or move files without transformation
2. Copy and change file compression on the fly
3. Copy files incrementally
4. Copy multiple folders in one job
5. Aggregate files based on a pattern
6. Upload files larger than 1 TB in size
7. Submit a S3DistCp step to an EMR cluster

1. Copy or move files without transformation

We’ve observed that customers often use S3DistCp to copy data from one storage location to another, whether S3 or HDFS. Syntax for this operation is simple and straightforward:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/incoming/hourly_table

The source location may contain extra files that we don’t necessarily want to copy. Here, we can use filters based on regular expressions to do things such as copying files with the .log extension only.

Each subfolder has the following files:

$ hadoop fs -ls /data/incoming/hourly_table/2017-02-01/03
Found 8 items
-rw-r--r--   1 hadoop hadoop     197850 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.25845.log
-rw-r--r--   1 hadoop hadoop     484006 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.32953.log
-rw-r--r--   1 hadoop hadoop     868522 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.62649.log
-rw-r--r--   1 hadoop hadoop     408072 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.64637.log
-rw-r--r--   1 hadoop hadoop    1031949 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.70767.log
-rw-r--r--   1 hadoop hadoop     368240 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.89910.log
-rw-r--r--   1 hadoop hadoop     437348 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/2017-02-01.03.96053.log
-rw-r--r--   1 hadoop hadoop        800 2017-02-19 03:41 /data/incoming/hourly_table/2017-02-01/03/processing.meta

To copy only the required files, let’s use the --srcPattern option:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/incoming/hourly_table_filtered --srcPattern .*\.log

After the upload has finished successfully, let’s check the folder contents in the destination location to confirm only the files ending in .log were copied:

$ hadoop fs -ls s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03
-rw-rw-rw-   1     197850 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.25845.log
-rw-rw-rw-   1     484006 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.32953.log
-rw-rw-rw-   1     868522 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.62649.log
-rw-rw-rw-   1     408072 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.64637.log
-rw-rw-rw-   1    1031949 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.70767.log
-rw-rw-rw-   1     368240 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.89910.log
-rw-rw-rw-   1     437348 2017-02-19 22:56 s3://my-tables/incoming/hourly_table_filtered/2017-02-01/03/2017-02-01.03.96053.log

Sometimes, data needs to be moved instead of copied. In this case, we can use the --deleteOnSuccess option. This option is similar to aws s3 mv, which you might have used previously with the AWS CLI. The files are first copied and then deleted from the source:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest s3://my-tables/incoming/hourly_table_archive --deleteOnSuccess

After the preceding operation, the source location has only empty folders, and the target location contains all files.

$ hadoop fs -ls -R s3://my-tables/incoming/hourly_table/2017-02-01/
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/00
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/01
...
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/21
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/incoming/hourly_table/2017-02-01/22


$ hadoop fs -ls s3://my-tables/incoming/hourly_table_archive/2017-02-01/01
-rw-rw-rw-   1     676756 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/2017-02-01.01.27047.log
-rw-rw-rw-   1     780197 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/2017-02-01.01.59789.log
-rw-rw-rw-   1    1041789 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/2017-02-01.01.82293.log
-rw-rw-rw-   1        400 2017-02-19 23:27 s3://my-tables/incoming/hourly_table_archive/2017-02-01/01/processing.meta

The important things to remember here are that S3DistCp deletes only files with the --deleteOnSuccess flag and that it doesn’t delete parent folders, even when they are empty.

2. Copy and change file compression on the fly

Raw files often land in S3 or HDFS in an uncompressed text format. This format is suboptimal both for the cost of storage and for running analytics on that data. S3DistCp can help you efficiently store data and compress files on the fly with the --outputCodec option:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table_filtered --dest s3://my-tables/incoming/hourly_table_gz --outputCodec=gz

The current version of S3DistCp supports the codecs gzip, gz, lzo, lzop, and snappy, and the keywords none and keep (the default). These keywords have the following meaning:

  • none” – Save files uncompressed. If the files are compressed, then S3DistCp decompresses them.
  • keep” – Don’t change the compression of the files but copy them as-is.

Let’s check the files in the target folder, which have now been compressed with the gz codec:

$ hadoop fs -ls s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/
Found 3 items
-rw-rw-rw-   1     78756 2017-02-20 00:07 s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/2017-02-01.01.27047.log.gz
-rw-rw-rw-   1     80197 2017-02-20 00:07 s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/2017-02-01.01.59789.log.gz
-rw-rw-rw-   1    121178 2017-02-20 00:07 s3://my-tables/incoming/hourly_table_gz/2017-02-01/01/2017-02-01.01.82293.log.gz

3. Copy files incrementally

In real life, the upstream process drops files in some cadence. For instance, new files might get created every hour, or every minute. The downstream process can be configured to pick it up at a different schedule.

Let’s say data lands on S3 and we want to process it on HDFS daily. Copying all files every time doesn’t scale very well. Fortunately, S3DistCp has a built-in solution for that.

For this solution, we use a manifest file. That file allows S3DistCp to keep track of copied files. Following is an example of the command:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest s3://my-tables/processing/hourly_table --srcPattern .*\.log --outputManifest=manifest-2017-02-25.gz --previousManifest=s3://my-tables/processing/hourly_table/manifest-2017-02-24.gz

The command takes two manifest files as parameters, outputManifest and previousManifest. The first one contains a list of all copied files (old and new), and the second contains a list of files copied previously. This way, we can recreate the full history of operations and see what files were copied during each run:

$ hadoop fs -text s3://my-tables/processing/hourly_table/manifest-2017-02-24.gz > previous.lst
$ hadoop fs -text s3://my-tables/processing/hourly_table/manifest-2017-02-25.gz > current.lst
$ diff previous.lst current.lst
2548a2549,2550
> {"path":"s3://my-tables/processing/hourly_table/2017-02-25/00/2017-02-15.00.50958.log","baseName":"2017-02-25/00/2017-02-15.00.50958.log","srcDir":"s3://my-tables/processing/hourly_table","size":610308}
> {"path":"s3://my-tables/processing/hourly_table/2017-02-25/00/2017-02-25.00.93423.log","baseName":"2017-02-25/00/2017-02-25.00.93423.log","srcDir":"s3://my-tables/processing/hourly_table","size":178928}

S3DistCp creates the file in the local file system using the provided path, /tmp/mymanifest.gz. When the copy operation finishes, it moves that manifest to <DESTINATION LOCATION>.

4. Copy multiple folders in one job

Imagine that we need to copy several folders. Usually, we run as many copy jobs as there are folders that need to be copied. With S3DistCp, the copy can be done in one go. All we need is to prepare a file with list of prefixes and use it as a parameter for the tool:

$ s3-dist-cp --src s3://my-tables/incoming/hourly_table_filtered --dest s3://my-tables/processing/sample_table --srcPrefixesFile file://${PWD}/folders.lst

In this case, the folders.lst file contains the following prefixes:

$ cat folders.lst
s3://my-tables/incoming/hourly_table_filtered/2017-02-10/11
s3://my-tables/incoming/hourly_table_filtered/2017-02-19/02
s3://my-tables/incoming/hourly_table_filtered/2017-02-23

As a result, the target location has only the requested subfolders:

$ hadoop fs -ls -R s3://my-tables/processing/sample_table
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-10
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-10/11
-rw-rw-rw-   1     139200 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-10/11/2017-02-10.11.12980.log
...
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-19
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-19/02
-rw-rw-rw-   1     702058 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-19/02/2017-02-19.02.19497.log
-rw-rw-rw-   1     265404 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-19/02/2017-02-19.02.26671.log
...
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-23
drwxrwxrwx   -          0 1970-01-01 00:00 s3://my-tables/processing/sample_table/2017-02-23/00
-rw-rw-rw-   1     310425 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-23/00/2017-02-23.00.10061.log
-rw-rw-rw-   1    1030397 2017-02-24 05:59 s3://my-tables/processing/sample_table/2017-02-23/00/2017-02-23.00.22664.log
...

5. Aggregate files based on a pattern

Hadoop is optimized for reading a fewer number of large files rather than many small files, whether from S3 or HDFS. You can use S3DistCp to aggregate small files into fewer large files of a size that you choose, which can optimize your analysis for both performance and cost.

In the following example, we combine small files into bigger files. We do so by using a regular expression with the –groupBy option.

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/processing/daily_table --targetSize=10 --groupBy=’.*/hourly_table/.*/(\d\d)/.*\.log’

Let’s take a look into the target folders and compare them to the corresponding source folders:

$ hadoop fs -ls /data/incoming/hourly_table/2017-02-22/05/
Found 8 items
-rw-r--r--   1 hadoop hadoop     289949 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.11125.log
-rw-r--r--   1 hadoop hadoop     407290 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.19596.log
-rw-r--r--   1 hadoop hadoop     253434 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.30135.log
-rw-r--r--   1 hadoop hadoop     590655 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.36531.log
-rw-r--r--   1 hadoop hadoop     762076 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.47822.log
-rw-r--r--   1 hadoop hadoop     489783 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.80518.log
-rw-r--r--   1 hadoop hadoop     205976 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/2017-02-22.05.99127.log
-rw-r--r--   1 hadoop hadoop        800 2017-02-19 06:07 /data/incoming/hourly_table/2017-02-22/05/processing.meta

 

$ hadoop fs -ls s3://my-tables/processing/daily_table/2017-02-22/05/
Found 2 items
-rw-rw-rw-   1   10541944 2017-02-28 05:16 s3://my-tables/processing/daily_table/2017-02-22/05/054
-rw-rw-rw-   1   10511516 2017-02-28 05:16 s3://my-tables/processing/daily_table/2017-02-22/05/055

As you can see, seven data files were combined into two with a size close to the requested 10 MB. The *.meta file was filtered out because --groupBy pattern works in a similar way to –srcPattern. We recommend keeping files larger than the default block size, which is 128 MB on EMR.

The name of the final file is composed of groups in the regular expression used in --groupBy plus some number to make the name unique. The pattern must have at least one group defined.

Let’s consider one more example. This time, we want the file name to be formed from three parts: year, month, and file extension (.log in this case). Here is an updated command:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/processing/daily_table_2017 --targetSize=10 --groupBy=’.*/hourly_table/.*(2017-).*/(\d\d)/.*\.(log)’

Now we have final files named in a different way:

$ hadoop fs -ls s3://my-tables/processing/daily_table_2017/2017-02-22/05/
Found 2 items
-rw-rw-rw-   1   10541944 2017-02-28 05:16 s3://my-tables/processing/daily_table/2017-02-22/05/2017-05log4
-rw-rw-rw-   1   10511516 2017-02-28 05:16 s3://my-tables/processing/daily_table/2017-02-22/05/2017-05log5

As you can see, names of final files consist of concatenation of 3 groups from the regular expression (2017-), (\d\d), (log).

You might find that occasionally you get an error that looks like the following:

$ s3-dist-cp --src /data/incoming/hourly_table --dest s3://my-tables/processing/daily_table_2017 --targetSize=10 --groupBy=’.*/hourly_table/.*(2018-).*/(\d\d)/.*\.(log)’
...
17/04/27 15:37:45 INFO S3DistCp.S3DistCp: Created 0 files to copy 0 files
... 
Exception in thread “main” java.lang.RuntimeException: Error running job
	at com.amazon.elasticmapreduce.S3DistCp.S3DistCp.run(S3DistCp.java:927)
	at com.amazon.elasticmapreduce.S3DistCp.S3DistCp.run(S3DistCp.java:705)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at com.amazon.elasticmapreduce.S3DistCp.Main.main(Main.java:22)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
…

In this case, the key information is contained in Created 0 files to copy 0 files. S3DistCp didn’t find any files to copy because the regular expression in the --groupBy option doesn’t match any files in the source location.

The reason for this issue varies. For example, it can be a mistake in the specified pattern. In the preceding example, we don’t have any files for the year 2018. Another common reason is incorrect escaping of the pattern when we submit S3DistCp command as a step, which is addressed later later in this post.

6. Upload files larger than 1 TB in size

The default upload chunk size when doing an S3 multipart upload is 128 MB. When files are larger than 1 TB, the total number of parts can reach over 10,000. Such a large number of parts can make the job run for a very long time or even fail.

In this case, you can improve job performance by increasing the size of each part. In S3DistCp, you can do this by using the --multipartUploadChunkSize option.

Let’s test how it works on several files about 200 GB in size. With the default part size, it takes about 84 minutes to copy them to S3 from HDFS.

We can increase the default part size to 1000 MB:

$ time s3-dist-cp --src /data/gb200 --dest s3://my-tables/data/S3DistCp/gb200_2 --multipartUploadChunkSize=1000
...
real    41m1.616s

The maximum part size is 5 GB. Keep in mind that larger parts have a higher chance to fail during upload and don’t necessarily speed up the process. Let’s run the same job with the maximum part size:

time s3-dist-cp --src /data/gb200 --dest s3://my-tables/data/S3DistCp/gb200_2 --multipartUploadChunkSize=5000
...
real    40m17.331s

7. Submit a S3DistCp step to an EMR cluster

You can run the S3DistCp tool in several ways. First, you can SSH to the master node and execute the command in a terminal window as we did in the preceding examples. This approach might be convenient for many use cases, but sometimes you might want to create a cluster that has some data on HDFS. You can do this by submitting a step directly in the AWS Management Console when creating a cluster.

In the console add step dialog box, we can fill the fields in the following way:

  • Step type: Custom JAR
  • Name*: S3DistCp Stepli>
  • JAR location: command-runner.jar
  • Arguments: s3-dist-cp --src s3://my-tables/incoming/hourly_table --dest /data/input/hourly_table --targetSize 10 --groupBy .*/hourly_table/.*(2017-).*/(\d\d)/.*\.(log)
  • Action of failure: Continue

Notice that we didn’t add quotation marks around our pattern. We needed quotation marks when we were using bash in the terminal window, but not here. The console takes care of escaping and transferring our arguments to the command on the cluster.

Another common use case is to run S3DistCp recurrently or on some event. We can always submit a new step to the existing cluster. The syntax here is slightly different than in previous examples. We separate arguments by commas. In the case of a complex pattern, we shield the whole step option with single quotation marks:

aws emr add-steps --cluster-id j-ABC123456789Z --steps 'Name=LoadData,Jar=command-runner.jar,ActionOnFailure=CONTINUE,Type=CUSTOM_JAR,Args=s3-dist-cp,--src,s3://my-tables/incoming/hourly_table,--dest,/data/input/hourly_table,--targetSize,10,--groupBy,.*/hourly_table/.*(2017-).*/(\d\d)/.*\.(log)'

Summary

This post showed you the basics of how S3DistCp works and highlighted some of its most useful features. It covered how you can use S3DistCp to optimize for raw files of different sizes and also selectively copy different files between locations. We also looked at several options for using the tool from SSH, the AWS Management Console, and the AWS CLI.

If you have questions or suggestions, leave a message in the comments.


Next Steps

Take your new knowledge to the next level! Click on the post below and learn the top 10 tips to improve query performance in Amazon Athena.

Top 10 Performance Tuning Tips for Amazon Athena


About the Author

Illya Yalovyy is a Senior Software Development Engineer with Amazon Web Services. He works on cutting-edge features of EMR and is heavily involved in open source projects such as Apache Hive, Apache Zookeeper, Apache Sqoop. His spare time is completely dedicated to his children and family.

 

Getting started with soldering

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/getting-started-soldering/

In our newest resource video, Content and Curriculum Manager Laura Sach introduces viewers to the basics of soldering.

Getting started with soldering

Learn the basics of how to solder components together, and the safety precautions you need to take. Find a transcript of this video in our accompanying learning resource: raspberrypi.org/learning/getting-started-with-soldering/

So sit down, grab your Raspberry Pi Zero, and prepare to be schooled in the best (and warned about the worst) practices in the realm of soldering.

Do I have to?!

Yes. Yes, you do.

If you are planning to use a Raspberry Pi Zero or Zero W, or to build something magnificent using wires, buttons, lights, and more, you’ll want to practice your soldering technique. Those of us inexperienced in soldering have been jumping for joy since the release of the Pimoroni solderless header. However, if you want to your project to progress from the ‘prototyping with a breadboard’ stage to a durable final build, soldering is the best option for connecting all its components together.

soldering raspberry pi gif

Hot glue just won’t cut it this time. Sorry.

I promise it’s not hard to do, and the final result will give you a warm feeling of accomplishment…made warmer still if, like me, you burn yourself due to your inability to pay attention to instructions. (Please pay attention to the instructions.)

Soldering 101

As Laura explains in the video, there are two types of solder to choose from for your project: the lead-free kind that requires a slightly higher temperature to melt, and the lead-containing kind that – surprise, surprise – has lead in it. Although you’ll find other types of solder, one of these two is what you want for tinkering.

soldering raspberry pi

The decision…is yours.

In order to heat your solder and apply it to your project, you’ll need either Kryptonian heat vision* or, on this planet at least, a soldering iron. There is a variety of soldering irons available on the market, and as your making skills improve you will probably upgrade. But for now, try not to break the bank and choose an iron that’s within your budget. You may also want to ask around, as someone you know might be able to lend you theirs and help you out with your first soldering attempt.

Safety first!

Make sure you always solder in a well-ventilated area. Before you start, remove any small people, four-legged friends, and other trip hazards from the space and check you have everything you need close at hand.

soldering raspberry pi

The lab at Pi Towers is well ventilated thanks to this handy ventilation pipe…thingy.

And never forget, things get hot when you heat them! Always allow a moment for cooling before you handle your wonderful soldering efforts. I remember the first time I tried soldering a button to a Raspberry Pi and…let’s just say that I still bear the scars incured because I didn’t follow my own safety advice.

Let’s do this!

Now you’re geared up and ready to solder, follow along with Laura and fit a header to your Raspberry Pi Zero! You can also read a complete transcript of the video in our free Getting started with soldering  resource.

If you use Laura’s video to help you complete a soldering project, make sure to share your final piece with us via social media using the hashtag #ThanksLauraSach.

 

 

*spoiler alert!

The post Getting started with soldering appeared first on Raspberry Pi.