Tag Archives: BASIC

Kodi ‘Trademark Troll’ Has Interesting Views on Co-Opting Other People’s Work

Post Syndicated from Andy original https://torrentfreak.com/kodi-trademark-troll-has-interesting-views-on-co-opting-other-peoples-work-170917/

The Kodi team, operating under the XBMC Foundation, announced last week that a third-party had registered the Kodi trademark in Canada and was using it for their own purposes.

That person was Geoff Gavora, who had previously been in communication with the Kodi team, expressing how important the software was to his sales.

“We had hoped, given the positive nature of his past emails, that perhaps he was doing this for the benefit of the Foundation. We learned, unfortunately, that this was not the case,” XBMC Foundation President Nathan Betzen said.

According to the Kodi team, Gavora began delisting Amazon ads placed by companies selling Kodi-enabled products, based on infringement of Gavora’s trademark rights.

“[O]nly Gavora’s hardware can be sold, unless those companies pay him a fee to stay on the store,” Betzen explained.

Predictably, Gavora’s move is being viewed as highly controversial, not least since he’s effectively claiming licensing rights in Canada over what should be a free and open source piece of software. TF obtained one of the notices Amazon sent to a seller of a Kodi-enabled device in Canada, following a complaint from Gavora.

Take down Kodi from Amazon, or pay Gavora

So who is Geoff Gavora and what makes him tick? Thanks to a 2016 interview with Ali Salman of the Rapid Growth Podcast, we have a lot of information from the horse’s mouth.

It all began in 2011, when Gavora began jailbreaking Apple TVs, loading them with XBMC, and selling them to friends.

“I did it as a joke, for beer money from my friends,” Gavora told Salman.

“I’d do it for $25 to $50 and word of mouth spread that I was doing this so we could load on this media center to watch content and online streams from it.”

Intro to the interview with Ali Salman

Soon, however, word of mouth caused the business to grow wings, Gavora claims.

“So they started telling people and I start telling people it’s $50, and then I got so busy so I start telling people it’s $75. I’m getting too busy with my work and with this. And it got to the point where I was making more jailbreaking these Apple TVs than I was at my career, and I wasn’t very happy at my career at that time.”

Jailbreaking was supposed to be a side thing to tide Gavora over until another job came along, but he had a problem – he didn’t come from a technical background. Nevertheless, what Gavora did have was a background in marketing and with a decent knowledge of how to succeed in customer service, he majored on that front.

Gavora had come to learn that while people wanted his devices, they weren’t very good at operating XBMC (Kodi’s former name) which he’d loaded onto them. With this in mind, he began offering web support and phone support via a toll-free line.

“I started receiving calls from New York, Dallas, and then Australia, Hong Kong. Everyone around the world was calling me and saying ‘we hear there’s some kid in Calgary, some young child, who’s offering tech support for the Apple TV’,” Gavora said.

But with things apparently going well, a wrench was soon thrown into the works when Apple released the third variant of its Apple TV and Gavorra was unable to jailbreak it. This prompted him to market his own Linux-based set-top device and his business, Raw-Media, grew from there.

While it seems likely that so-called ‘Raw Boxes’ were doing reasonably well with consumers, what was the secret of their success? Podcast host Salman asked Gavora for his ‘networking party 10-second pitch’, and the Canadian was happy to oblige.

“I get this all the time actually. I basically tell people that I sell a box that gives them free TV and movies,” he said.

This was met with laughter from the host, to which Gavora added, “That’s sort of the three-second pitch and everyone’s like ‘Oh, tell me more’.”

“Who doesn’t like free TV, come on?” Salman responded. “Yeah exactly,” Gavora said.

The image below, taken from a January 2016 YouTube unboxing video, shows one of the products sold by Gavora’s company.

Raw-Media Kodi Box packaging (note Kodi logo)

Bearing in mind the offer of free movies and TV, the tagline on the box, “Stop paying for things you don’t want to watch, watch more free tv!” initially looks quite provocative. That being said, both the device and Kodi are perfectly capable of playing plenty of legal content from free sources, so there’s no problem there.

What is surprising, however, is that the unboxing video shows the device being booted up, apparently already loaded with infamous third-party Kodi addons including PrimeWire, Genesis, Icefilms, and Navi-X.

The unboxing video showing the Kodi setup

Given that Gavora has registered the Kodi trademark in Canada and prints the official logo on his packaging, this runs counter to the official Kodi team’s aggressive stance towards boxes ready-configured with what they categorize as banned addons. Matters are compounded when one visits the product support site.

As seen in the image below, Raw-Media devices are delivered with a printed card in the packaging informing people where to get the after-sales services Gavora says he built his business upon. The cards advise people to visit No-Issue.ca, a site setup to offer text and video-based support to set-top box buyers.

No-Issue.ca (which is hosted on the same server as raw-media.ca and claimed officially as a sister site here) now redirects to No-Issue.is, as per a 2016 announcement. It has a fairly bland forum but the connected tutorial videos, found on No Issue’s YouTube channel, offer a lot more spice.

Registered under Gavora’s online nickname Gombeek (which is also used on the official Kodi forums), the channel is full of videos detailing how to install and use a wide range of addons.

The No-issue YouTube Channel tutorials

But while supplying tutorial videos is one thing, providing the actual software addons is another. Surprisingly, No-Issue does that too. Filed away under the URL http://solved.no-issue.is/ is a Kodi repository which distributes a wide range of addons, including many that specialize in infringing content, according to the Kodi team.

The No-Issue repository

A source familiar with Raw-Media’s devices informs TF that they’re no longer delivered with addons installed. However, tools hosted on No-Issue.is automate the installation process for the customer, with unlisted YouTube Videos (1,2) providing the instructions.

XBMC Foundation President Nathan Betzen says that situation isn’t ideal.

“If that really is his repo it is disappointing to see that Gavora is charging a fee or outright preventing the sale of boxes with Kodi installed that do not include infringing add-ons, while at the same time he is distributing boxes himself that do include the infringing add-ons like this,” Betzen told TF.

While the legality of this type of service is yet to be properly tested in Canada and may yet emerge as entirely permissible under local law, Gavora himself previously described his business as operating in a gray area.

“If I could go back in time four years, I would’ve been more aggressive in the beginning because there was a lot of uncertainty being in a gray market business about how far I could push it,” he said.

“I really shouldn’t say it’s a gray market because everything I do is completely above board, I just felt it was more gray market so I was a bit scared,” he added.

But, legality aside (which will be determined in due course through various cases 1,2), the situation is still problematic when it comes to the Kodi trademark.

The official Kodi team indicate they don’t want to be associated with any kind of questionable addon or even tutorials for the same. Nevertheless, several of the addons installed by No-Issue (including PrimeWire, cCloud TV, Genesis, Icefilms, MoviesHD, MuchMovies and Navi-X, to name a few), are present on the Kodi team’s official ban list.

The fact remains, however, that Gavora successfully registered the trademark in Canada (one month later it was transferred to a brand new company at the same address), and Kodi now have no control over the situation in the country, short of a settlement or some kind of legal action.

Kodi matters aside, though, we get more insight into Gavora’s attitudes towards intellectual property after learning that he studied gemology and jewelry at school. He’s a long-standing member of jewelry discussion forum Ganoskin.com (his profile links to Gavora.com, a domain Gavora owns, as per information supplied by Amazon).

Things get particularly topical in a 2006 thread titled “When your work gets ripped“. The original poster asked how people feel when their jewelry work gets copied and Gavora made his opinions known.

“I think that what most people forget to remember is that when a piece from Tiffany’s or Cartier is ripped off or copied they don’t usually just copy the work, they will stamp it with their name as well,” Gavora said.

“This is, in fact, fraud and they are deceiving clients into believing they are purchasing genuine Tiffany’s or Cartier pieces. The client is in fact more interested in purchasing from an artist than they are the piece. Laying claim to designs (unless a symbol or name is involved) is outrageous.”

Unless that ‘design’ is called Kodi, of course, then it’s possible to claim it as your own through an administrative process and begin demanding licensing fees from the public. That being said, Gavora does seem to flip back and forth a little, later suggesting that being copied is sometimes ok.

“If someone copies your design and produces it under their own name, I think one should be honored and revel in the fact that your design is successful and has caused others to imitate it and grow from it,” he wrote.

“I look forward to the day I see one of my original designs copied, that is the day I will know my design is a success.”

From their public statements, this opinion isn’t shared by the Kodi team in respect of their product. Despite the Kodi name, software and logo being all their own work, they now find themselves having to claw back rights in Canada, in order to keep the product free in the region. For now, however, that seems like a difficult task.

TorrentFreak wrote to Gavora and asked him why he felt the need to register the Kodi trademark, but we received no response. That means we didn’t get the chance to ask him why he’s taking down Amazon listings for other people’s devices, or about something else that came up in the podcast.

“My biggest weakness, I guess, is that I’m too ethical about how I do my business,” he said, referring to how he deals with customers.

Only time will tell how that philosophy will affect Gavora’s attitudes to trademarks and people’s desire not to be charged for using free, open source software.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Self-Driving Cars Should Be Open Source

Post Syndicated from Bozho original https://techblog.bozho.net/self-driving-cars-open-source/

Self-driving cars are (will be) the pinnacle of consumer products automation – robot vacuum cleaners, smart fridges and TVs are just toys compared to self-driving cars. Both in terms of technology and in terms of impact. We aren’t yet on level 5 self driving cars , but they are behind the corner.

But as software engineers we know how fragile software is. And self-driving cars are basically software, so we can see all the risks involved with putting our lives in the hands anonymous (from our point of view) developers and unknown (to us) processes and quality standards. One may argue that this has been the case for every consumer product ever, but with software is different – software is way more complex than anything else.

So I have an outrageous proposal – self-driving cars should be open source. We have to be able to verify and trust the code that’s navigating our helpless bodies around the highways. Not only that, but we have to be able to verify if it is indeed that code that is currently running in our car, and not something else.

In fact, let me extend that – all cars should be open source. Before you say “but that will ruin the competitive advantage of manufacturers and will be deadly for business”, I don’t actually care how they trained their neural networks, or what their datasets are. That’s actually the secret sauce of the self-driving car and in my view it can remain proprietary and closed. What I’d like to see open-sourced is everything else. (Under what license – I’d be fine to even have it copyrighted and so not “real” open source, but that’s a separate discussion).

Why? This story about remote carjacking using the entertainment system of a Jeep is a scary example. Attackers that reverse engineer the car software can remotely control everything in the car. Why did that happen? Well, I guess it’s complicated and we have to watch the DEFCON talk.

And also read the paper, but a paragraph in wikipedia about the CAN bus used in most cars gives us a hint:

CAN is a low-level protocol and does not support any security features intrinsically. There is also no encryption in standard CAN implementations, which leaves these networks open to man-in-the-middle packet interception. In most implementations, applications are expected to deploy their own security mechanisms; e.g., to authenticate incoming commands or the presence of certain devices on the network. Failure to implement adequate security measures may result in various sorts of attacks if the opponent manages to insert messages on the bus. While passwords exist for some safety-critical functions, such as modifying firmware, programming keys, or controlling antilock brake actuators, these systems are not implemented universally and have a limited number of seed/key pair

I don’t know in what world it makes sense to even have a link between the entertainment system and the low-level network that operates the physical controls. As apparent from the talk, the two systems are supposed to be air-gapped, but in reality they aren’t.

Rookie mistakes were abound – unauthenticated “execute” method, running as root, firmware is not signed, hard-coded passwords, etc. How do we know that there aren’t tons of those in all cars out there right now, and in the self-driving cars of the future (which will likely use the same legacy technologies of the current cars)? Recently I heard a negative comment about the source code of one of the self-driving cars “players”, and I’m pretty sure there are many of those rookie mistakes.

Why this is this even more risky for self-driving cars? I’m not an expert in car programming, but it seems like the attack surface is bigger. I might be completely off target here, but on a typical car you’d have to “just” properly isolate the CAN bus. With self-driving cars the autonomous system that watches the surrounding and makes decisions on what to do next has to be connected to the CAN bus. With Tesla being able to send updates over the wire, the attack surface is even bigger (although that’s actually a good feature – to be able to patch all cars immediately once a vulnerability is discovered).

Of course, one approach would be to introduce legislation that regulates car software. It might work, but it would rely on governments to to proper testing, which won’t always be the case.

The alternative is to open-source it and let all the white-hats find your issues, so that you can close them before the car hits the road. Not only that, but consumers like me will feel safer, and geeks would be able to verify whether the car is really running the software it claims to run by verifying the fingerprints.

Richard Stallman might be seen as a fanatic when he advocates against closed source software, but in cases like … cars, his concerns seem less extreme.

“But the Jeep vulnerability was fixed”, you may say. And that might be seen as being the way things are – vulnerabilities appear, they get fixed, life goes on. No person was injured because of the bug, right? Well, not yet. And “gaining control” is the extreme scenario – there are still pretty bad scenarios, like being able to track a car through its GPS, or cause panic by controlling the entertainment system. It might be over wifi, or over GPRS, or even by physically messing with the car by inserting a flash drive. Is open source immune to those issues? No, but it has proven to be more resilient.

One industry where the problem of proprietary software on a product that the customer bought is … tractors. It turns out farmers are hacking their tractors, because of multiple issues and the inability of the vendor to resolve them in a timely manner. This is likely to happen to cars soon, when only authorized repair shops are allowed to touch anything on the car. And with unauthorized repair shops the attack surface becomes even bigger.

In fact, I’d prefer open source not just for cars, but for all consumer products. The source code of a smart fridge or a security camera is trivial, it would rarely mean sacrificing competitive advantage. But refrigerators get hacked, security cameras are active part of botnets, the “internet of shit” is getting ubiquitous. A huge amount of these issues are dumb, beginner mistakes. We have the right to know what shit we are running – in our frdges, DVRs and ultimatey – cars.

Your fridge may soon by spying on you, your vacuum cleaner may threaten your pet in demand of “ransom”. The terrorists of the future may crash planes without being armed, can crash vans into crowds without being in the van, and can “explode” home equipment without being in the particular home. And that’s not just a hypothetical.

Will open source magically solve the issue? No. But it will definitely make things better and safer, as it has done with operating systems and web servers.

The post Self-Driving Cars Should Be Open Source appeared first on Bozho's tech blog.

Automate Your IT Operations Using AWS Step Functions and Amazon CloudWatch Events

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/automate-your-it-operations-using-aws-step-functions-and-amazon-cloudwatch-events/


Rob Percival, Associate Solutions Architect

Are you interested in reducing the operational overhead of your AWS Cloud infrastructure? One way to achieve this is to automate the response to operational events for resources in your AWS account.

Amazon CloudWatch Events provides a near real-time stream of system events that describe the changes and notifications for your AWS resources. From this stream, you can create rules to route specific events to AWS Step Functions, AWS Lambda, and other AWS services for further processing and automated actions.

In this post, learn how you can use Step Functions to orchestrate serverless IT automation workflows in response to CloudWatch events sourced from AWS Health, a service that monitors and generates events for your AWS resources. As a real-world example, I show automating the response to a scenario where an IAM user access key has been exposed.

Serverless workflows with Step Functions and Lambda

Step Functions makes it easy to develop and orchestrate components of operational response automation using visual workflows. Building automation workflows from individual Lambda functions that perform discrete tasks lets you develop, test, and modify the components of your workflow quickly and seamlessly. As serverless services, Step Functions and Lambda also provide the benefits of more productive development, reduced operational overhead, and no costs incurred outside of when the workflows are actively executing.

Example workflow

As an example, this post focuses on automating the response to an event generated by AWS Health when an IAM access key has been publicly exposed on GitHub. This is a diagram of the automation workflow:

AWS proactively monitors popular code repository sites for IAM access keys that have been publicly exposed. Upon detection of an exposed IAM access key, AWS Health generates an AWS_RISK_CREDENTIALS_EXPOSED event in the AWS account related to the exposed key. A configured CloudWatch Events rule detects this event and invokes a Step Functions state machine. The state machine then orchestrates the automated workflow that deletes the exposed IAM access key, summarizes the recent API activity for the exposed key, and sends the summary message to an Amazon SNS topic to notify the subscribers―in that order.

The corresponding Step Functions state machine diagram of this automation workflow can be seen below:

While this particular example focuses on IT automation workflows in response to the AWS_RISK_CREDENTIALS_EXPOSEDevent sourced from AWS Health, it can be generalized to integrate with other events from these services, other event-generating AWS services, and even run on a time-based schedule.

Walkthrough

To follow along, use the code and resources found in the aws-health-tools GitHub repo. The code and resources include an AWS CloudFormation template, in addition to instructions on how to use it.

Launch Stack into N. Virginia with CloudFormation

The Step Functions state machine execution starts with the exposed keys event details in JSON, a sanitized example of which is provided below:

{
    "version": "0",
    "id": "121345678-1234-1234-1234-123456789012",
    "detail-type": "AWS Health Event",
    "source": "aws.health",
    "account": "123456789012",
    "time": "2016-06-05T06:27:57Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventArn": "arn:aws:health:us-east-1::event/AWS_RISK_CREDENTIALS_EXPOSED_XXXXXXXXXXXXXXXXX",
        "service": "RISK",
        "eventTypeCode": "AWS_RISK_CREDENTIALS_EXPOSED",
        "eventTypeCategory": "issue",
        "startTime": "Sat, 05 Jun 2016 15:10:09 GMT",
        "eventDescription": [
            {
                "language": "en_US",
                "latestDescription": "A description of the event is provided here"
            }
        ],
        "affectedEntities": [
            {
                "entityValue": "ACCESS_KEY_ID_HERE"
            }
        ]
    }
}

After it’s invoked, the state machine execution proceeds as follows.

Step 1: Delete the exposed IAM access key pair

The first thing you want to do when you determine that an IAM access key has been exposed is to delete the key pair so that it can no longer be used to make API calls. This Step Functions task state deletes the exposed access key pair detailed in the incoming event, and retrieves the IAM user associated with the key to look up API activity for the user in the next step. The user name, access key, and other details about the event are passed to the next step as JSON.

This state contains a powerful error-handling feature offered by Step Functions task states called a catch configuration. Catch configurations allow you to reroute and continue state machine invocation at new states depending on potential errors that occur in your task function. In this case, the catch configuration skips to Step 3. It immediately notifies your security team that errors were raised in the task function of this step (Step 1), when attempting to look up the corresponding IAM user for a key or delete the user’s access key.

Note: Step Functions also offers a retry configuration for when you would rather retry a task function that failed due to error, with the option to specify an increasing time interval between attempts and a maximum number of attempts.

Step 2: Summarize recent API activity for key

After you have deleted the access key pair, you’ll want to have some immediate insight into whether it was used for malicious activity in your account. Another task state, this step uses AWS CloudTrail to look up and summarize the most recent API activity for the IAM user associated with the exposed key. The summary is in the form of counts for each API call made and resource type and name affected. This summary information is then passed to the next step as JSON. This step requires information that you obtained in Step 1. Step Functions ensures the successful completion of Step 1 before moving to Step 2.

Step 3: Notify security

The summary information gathered in the last step can provide immediate insight into any malicious activity on your account made by the exposed key. To determine this and further secure your account if necessary, you must notify your security team with the gathered summary information.

This final task state generates an email message providing in-depth detail about the event using the API activity summary, and publishes the message to an SNS topic subscribed to by the members of your security team.

If the catch configuration of the task state in Step 1 was triggered, then the security notification email instead directs your security team to log in to the console and navigate to the Personal Health Dashboard to view more details on the incident.

Lessons learned

When implementing this use case with Step Functions and Lambda, consider the following:

  • One of the most important parts of implementing automation in response to operational events is to ensure visibility into the response and resolution actions is retained. Step Functions and Lambda enable you to orchestrate your granular response and resolution actions that provides direct visibility into the state of the automation workflow.
  • This basic workflow currently executes these steps serially with a catch configuration for error handling. More sophisticated workflows can leverage the parallel execution, branching logic, and time delay functionality provided by Step Functions.
  • Catch and retry configurations for task states allow for orchestrating reliable workflows while maintaining the granularity of each Lambda function. Without leveraging a catch configuration in Step 1, you would have had to duplicate code from the function in Step 3 to ensure that your security team was notified on failure to delete the access key.
  • Step Functions and Lambda are serverless services, so there is no cost for these services when they are not running. Because this IT automation workflow only runs when an IAM access key is exposed for this account (which is hopefully rare!), the total monthly cost for this workflow is essentially $0.

Conclusion

Automating the response to operational events for resources in your AWS account can free up the valuable time of your engineers. Step Functions and Lambda enable granular IT automation workflows to achieve this result while gaining direct visibility into the orchestration and state of the automation.

For more examples of how to use Step Functions to automate the operations of your AWS resources, or if you’d like to see how Step Functions can be used to build and orchestrate serverless applications, visit Getting Started on the Step Functions website.

NSA Spied on Early File-Sharing Networks, Including BitTorrent

Post Syndicated from Andy original https://torrentfreak.com/nsa-spied-on-early-file-sharing-networks-including-bittorrent-170914/

In the early 2000s, when peer-to-peer (P2P) file-sharing was in its infancy, the majority of users had no idea that their activities could be monitored by outsiders. The reality was very different, however.

As few as they were, all of the major networks were completely open, with most operating a ‘shared folder’ type system that allowed any network participant to see exactly what another user was sharing. Nevertheless, with little to no oversight, file-sharing at least felt like a somewhat private affair.

As user volumes began to swell, software such as KaZaA (which utilized the FastTrack network) and eDonkey2000 (eD2k network) attracted attention from record labels, who were desperate to stop the unlicensed sharing of copyrighted content. The same held true for the BitTorrent networks that arrived on the scene a couple of years later.

Through the rise of lawsuits against consumers, the general public began to learn that their activities on P2P networks were not secret and they were being watched for some, if not all, of the time by copyright holders. Little did they know, however, that a much bigger player was also keeping a watchful eye.

According to a fascinating document just released by The Intercept as part of the Edward Snowden leaks, the National Security Agency (NSA) showed a keen interest in trying to penetrate early P2P networks.

Initially published by internal NSA news site SIDToday in June 2005, the document lays out the aims of a program called FAVA – File-Sharing Analysis and Vulnerability Assessment.

“One question that naturally arises after identifying file-sharing traffic is whether or not there is anything of intelligence value in this traffic,” the NSA document begins.

“By searching our collection databases, it is clear that many targets are using popular file sharing applications; but if they are merely sharing the latest release of their favorite pop star, this traffic is of dubious value (no offense to Britney Spears intended).”

Indeed, the vast majority of users of these early networks were only been interested in sharing relatively small music files, which were somewhat easy to manage given the bandwidth limitations of the day. However, the NSA still wanted to know what was happening on a broader scale, so that meant decoding their somewhat limited encryption.

“As many of the applications, such as KaZaA for example, encrypt their traffic, we first had to decrypt the traffic before we could begin to parse the messages. We have developed the capability to decrypt and decode both KaZaA and eDonkey traffic to determine which files are being shared, and what queries are being performed,” the NSA document reveals.

Most progress appears to have been made against KaZaA, with the NSA revealing the use of tools to parse out registry entries on users’ hard drives. This information gave up users’ email addresses, country codes, user names, the location of their stored files, plus a list of recent searches.

This gave the NSA the ability to look deeper into user behavior, which revealed some P2P users going beyond searches for basic run-of-the-mill multimedia content.

“[We] have discovered that our targets are using P2P systems to search for and share files which are at the very least somewhat surprising — not simply harmless music and movie files. With more widespread adoption, these tools will allow us to regularly assimilate data which previously had been passed over; giving us a more complete picture of our targets and their activities,” the document adds.

Today, more than 12 years later, with KaZaA long dead and eDonkey barely alive, scanning early pirate activities might seem a distant act. However, there’s little doubt that similar programs remain active today. Even in 2005, the FAVA program had lofty ambitions, targeting other networks and protocols including DirectConnect, Freenet, Gnutella, Gnutella2, JoltID, MSN Messenger, Windows Messenger and……BitTorrent.

“If you have a target using any of these applications or using some other application which might fall into the P2P category, please contact us,” the NSA document urges staff. “We would be more than happy to help.”

Confirming the continued interest in BitTorrent, The Intercept has published a couple of further documents which deal with the protocol directly.

The first details an NSA program called GRIMPLATE, which aimed to study how Department of Defense employees were using BitTorrent and whether that constituted a risk.

The second relates to P2P research carried out by Britain’s GCHQ spy agency. It details DIRTY RAT, a web application which gave the government to “the capability to identify users sharing/downloading files of interest on the eMule (Kademlia) and BitTorrent networks.”

The SIDToday document detailing the FAVA program can be viewed here

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Five Must-Watch Software Engineering Talks

Post Syndicated from Bozho original https://techblog.bozho.net/five-must-watch-software-engineering-talks/

We’ve all watched dozens of talks online. And we probably don’t remember many of them. But some do stick in our heads and we eventually watch them again (and again) because we know they are good and we want to remember the things that were said there. So I decided to compile a small list of talks that I find very insightful, useful and that have, in a way, shaped my software engineering practice or expanded my understanding of the software world.

1. How To Design A Good API and Why it Matters by Joshua Bloch – this is a must-watch (well, obviously all are). And don’t skip it because “you are not writing APIs” – everyone is writing APIs. Maybe not used by hundreds of other developers, but used by at least several, and that’s a good enough reason. Having watched this talk I ended up buying and reading one of the few software books that I have actually read end-to-end – “Effective Java” (the talk uses Java as an example, but the principles aren’t limited to Java)

2. How to write clean, testable code by Miško Hevery. Maybe there are tons of talks about testing code, maybe Uncle Bob has a more popular one, but I found this one particularly practical and the the point – that writing testable code is a skill, and that testable code is good code. (By the way, the speaker then wrote AngularJS)

3. Back to basics: the mess we’ve made of our fundamental data types by Jon Skeet. The title says it all, and it’s nice to be reminded of how fragile even the basics of programming languages are.

4. The Danger of Software Patents by Richard Stallman. That goes a little bit away from writing software, but puts software in legal context – how do legislation loopholes affect code reuse and business practices related it. It’s a bit long, but I think worth it.

5. Does my ESB look big in this? by Martin Fowler and Jim Webber. It’s about bloated enterprise architecture and how to actually do enterprise architecture without complex and expensive middleware. (Unfortunately it’s not on YouTube, so no embedding).

Although this is not a “ranking”, I’d like to add a few honourable mentions: The famous “WAT” lightning talk, showing some quirks of ruby and javascript, “The future of programming” by Bret Victor, “You suck at Excel” by Joel Spolsky, which isn’t really about creating software, but it’s cool. And a tiny shameless plug with my “Common sense driven development talk”

I hope the compilation is useful and enlightening. Enjoy.

The post Five Must-Watch Software Engineering Talks appeared first on Bozho's tech blog.

Wikto Scanner Download – Web Server Security Tool

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/09/wikto-scanner-download-web-server-security-tool/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

Wikto Scanner Download – Web Server Security Tool

Wikto is an Open Source (GPL) web server scanner which performs comprehensive tests against web servers for multiple items, including over 3500 potentially dangerous files/CGIs, versions on over 900 servers, and version specific problems on over 250 servers.

It’s Nikto for Windows basically with some extra features written in C# and requires the .NET framework.

What is Wikto

Wikto is not a web application scanner. It is totally unaware of the application (if any) that’s running on the web site.

Read the rest of Wikto Scanner Download – Web Server Security Tool now! Only available at Darknet.

How Much Does ‘Free’ Premier League Piracy Cost These Days?

Post Syndicated from Andy original https://torrentfreak.com/how-much-does-free-premier-league-piracy-cost-these-days-170902/

Right now, the English Premier League is engaged in perhaps the most aggressively innovative anti-piracy operation the Internet has ever seen. After obtaining a new High Court order, it now has the ability to block ‘pirate’ streams of matches, in real-time, with no immediate legal oversight.

If the Premier League believes a server is streaming one of its matches, it can ask ISPs in the UK to block it, immediately. That’s unprecedented anywhere on the planet.

As previously reported, this campaign caused a lot of problems for people trying to access free and premium streams at the start of the season. Many IPTV services were blocked in the UK within minutes of matches starting, with free streams also dropping like flies. According to information obtained by TF, more than 600 illicit streams were blocked during that weekend.

While some IPTV providers and free streams continued without problems, it seems likely that it’s only a matter of time before the EPL begins to pick off more and more suppliers. To be clear, the EPL isn’t taking services or streams down, it’s only blocking them, which means that people using circumvention technologies like VPNs can get around the problem.

However, this raises the big issue again – that of continuously increasing costs. While piracy is often painted as free, it is not, and as setups get fancier, costs increase too.

Below, we take a very general view of a handful of the many ‘pirate’ configurations currently available, to work out how much ‘free’ piracy costs these days. The list is not comprehensive by any means (and excludes more obscure methods such as streaming torrents, which are always free and rarely blocked), but it gives an idea of costs and how the balance of power might eventually tip.

Basic beginner setup

On a base level, people who pirate online need at least some equipment. That could be an Android smartphone and easily installed free software such as Mobdro or Kodi. An Internet connection is a necessity and if the EPL blocks those all important streams, a VPN provider is required to circumvent the bans.

Assuming people already have a phone and the Internet, a VPN can be bought for less than £5 per month. This basic setup is certainly cheap but overall it’s an entry level experience that provides quality equal to the effort and money expended.

Equipment: Phone, tablet, PC
Comms: Fast Internet connection, decent VPN provider
Overal performance: Low quality, unpredictable, often unreliable
Cost: £5pm approx for VPN, plus Internet costs

Big screen, basic

For those who like their matches on the big screen, stepping up the chain costs more money. People need a TV with an HDMI input and a fast Internet connection as a minimum, alongside some kind of set-top device to run the necessary software.

Android devices are the most popular and are roughly split into two groups – the small standalone box type and the plug-in ‘stick’ variant such as Amazon’s Firestick.

A cheap Android set-top box

These cost upwards of £30 to £40 but the software to install on them is free. Like the phone, Mobdro is an option, but most people look to a Kodi setup with third-party addons. That said, all streams received on these setups are now vulnerable to EPL blocking so in the long-term, users will need to run a paid VPN.

The problem here is that some devices (including the 1st gen Firestick) aren’t ideal for running a VPN on top of a stream, so people will need to dump their old device and buy something more capable. That could cost another £30 to £40 and more, depending on requirements.

Importantly, none of this investment guarantees a decent stream – that’s down to what’s available on the day – but invariably the quality is low and/or intermittent, at best.

Equipment: TV, decent Android set-top box or equivalent
Comms: Fast Internet connection, decent VPN provider
Overall performance: Low to acceptable quality, unpredictable, often unreliable
Cost: £30 to £50 for set-top box, £5pm approx for VPN, plus Internet

Premium IPTV – PC or Android based

At this point, premium IPTV services come into play. People have a choice of spending varying amounts of money, depending on the quality of experience they require.

First of all, a monthly IPTV subscription with an established provider that isn’t going to disappear overnight is required, which can be a challenge to find in itself. We’re not here to review or recommend services but needless to say, like official TV packages they come in different flavors to suit varying wallet sizes. Some stick around, many don’t.

A decent one with a Sky-like EPG costs between £7 and £15 per month, depending on the quality and depth of streams, and how far in front users are prepared to commit.

Fairly typical IPTV with EPG (VOD shown)

Paying for a year in advance tends to yield better prices but with providers regularly disappearing and faltering in their service levels, people are often reluctant to do so. That said, some providers experience few problems so it’s a bit like gambling – research can improve the odds but there’s never a guarantee.

However, even when a provider, price, and payment period is decided upon, the process of paying for an IPTV service can be less than straightforward.

While some providers are happy to accept PayPal, many will only deal in credit cards, bitcoin, or other obscure payment methods. That sets up more barriers to entry that might deter the less determined customer. And, if time is indeed money, fussing around with new payment processors can be pricey, at least to begin with.

Once subscribed though, watching these streams is pretty straightforward. On a base level, people can use a phone, tablet, or set-top device to receive them, using software such as Perfect Player IPTV, for example. Currently available in free (ad supported) and premium (£2) variants, this software can be setup in a few clicks and will provide a decent user experience, complete with EPG.

Perfect Player IPTV

Those wanting to go down the PC route have more options but by far the most popular is receiving IPTV via a Kodi setup. For the complete novice, it’s not always easy to setup but some IPTV providers supply their own free addons, which streamline the process massively. These can also be used on Android-based Kodi setups, of course.

Nevertheless, if the EPL blocks the provider, a VPN is still going to be needed to access the IPTV service.

An Android tablet running Kodi

So, even if we ignore the cost of the PC and Internet connection, users could still find themselves paying between £10 and £20 per month for an IPTV service and a decent VPN. While more channels than simply football will be available from most providers, this is getting dangerously close to the £18 Sky are asking for its latest football package.

Equipment: TV, PC, or decent Android set-top box or equivalent
Comms: Fast Internet connection, IPTV subscription, decent VPN provider
Overal performance: High quality, mostly reliable, user-friendly (once setup)
Cost: PC or £30/£50 for set-top box, IPTV subscription £7 to £15pm, £5pm approx for VPN, plus Internet, plus time and patience for obscure payment methods.
Note: There are zero refunds when IPTV providers disappoint or disappear

Premium IPTV – Deluxe setup

Moving up to the top of the range, things get even more costly. Those looking to give themselves the full home entertainment-like experience will often move away from the PC and into the living room in front of the TV, armed with a dedicated set-top box. Weapon of choice: the Mag254.

Like Amazon’s FireStick, PC or Android tablet, the Mag254 is an entirely legal, content agnostic device. However, enter the credentials provided by many illicit IPTV suppliers and users are presented with a slick Sky-like experience, far removed from anything available elsewhere. The device is operated by remote control and integrates seamlessly with any HDMI-capable TV.

Mag254 IPTV box

Something like this costs around £70 in the UK, plus the cost of a WiFi adaptor on top, if needed. The cost of the IPTV provider needs to be figured in too, plus a VPN subscription if the provider gets blocked by EPL, which is likely. However, in this respect the Mag254 has a problem – it can’t run a VPN natively. This means that if streams get blocked and people need to use a VPN, they’ll need to find an external solution.

Needless to say, this costs more money. People can either do all the necessary research and buy a VPN-capable router/modem that’s also compatible with their provider (this can stretch to a couple of hundred pounds) or they’ll need to invest in a small ‘travel’ router with VPN client features built in.

‘Travel’ router (with tablet running Mobdro for scale)

These devices are available on Amazon for around £25 and sit in between the Mag254 (or indeed any other wireless device) and the user’s own regular router. Once the details of the VPN subscription are entered into the router, all traffic passing through is encrypted and will tunnel through web blocking measures. They usually solve the problem (ymmv) but of course, this is another cost.

Equipment: Mag254 or similar, with WiFi
Comms: Fast Internet connection, IPTV subscription, decent VPN provider
Overall performance: High quality, mostly reliable, very user-friendly
Cost: Mag254 around £75 with WiFi, IPTV subscription £7 to £15pm, £5pm for VPN (plus £25 for mini router), plus Internet, plus patience for obscure payment methods.
Note: There are zero refunds when IPTV providers disappoint or disappear

Conclusion

On the whole, people who want a reliable and high-quality Premier League streaming experience cannot get one for free, no matter where they source the content. There are many costs involved, some of which cannot be avoided.

If people aren’t screwing around with annoying and unreliable Kodi streams, they’ll be paying for an IPTV provider, VPN and other equipment. Or, if they want an easy life, they’ll be paying Sky, BT or Virgin Media. That might sound harsh to many pirates but it’s the only truly reliable solution.

However, for those looking for something that’s merely adequate, costs drop significantly. Indeed, if people don’t mind the hassle of wondering whether a sub-VHS quality stream will appear before the big match and stay on throughout, it can all be done on a shoestring.

But perhaps the most important thing to note in respect of costs is the recent changes to the pricing of Premier League content in the UK. As mentioned earlier, Sky now delivers a sports package for £18pm, which sounds like the best deal offered to football fans in recent years. It will be tempting for sure and has all the hallmarks of a price point carefully calculated by Sky.

The big question is whether it will be low enough to tip significant numbers of people away from piracy. The reality is that if another couple of thousand streams get hit hard again this weekend – and the next – and the next – many pirating fans will be watching the season drift away for yet another month, unviewed. That’s got to be frustrating.

The bottom line is that high-quality streaming piracy is becoming a little bit pricey just for football so if it becomes unreliable too – and that’s the Premier League’s goal – the balance of power could tip. At this point, the EPL will need to treat its new customers with respect, in order to keep them feeling both entertained and unexploited.

Fail on those counts – especially the latter – and the cycle will start again.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

AWS Hot Startups – August 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-august-2017/

There’s no doubt about it – Artificial Intelligence is changing the world and how it operates. Across industries, organizations from startups to Fortune 500s are embracing AI to develop new products, services, and opportunities that are more efficient and accessible for their consumers. From driverless cars to better preventative healthcare to smart home devices, AI is driving innovation at a fast rate and will continue to play a more important role in our everyday lives.

This month we’d like to highlight startups using AI solutions to help companies grow. We are pleased to feature:

  • SignalBox – a simple and accessible deep learning platform to help businesses get started with AI.
  • Valossa – an AI video recognition platform for the media and entertainment industry.
  • Kaliber – innovative applications for businesses using facial recognition, deep learning, and big data.

SignalBox (UK)

In 2016, SignalBox founder Alain Richardt was hearing the same comments being made by developers, data scientists, and business leaders. They wanted to get into deep learning but didn’t know where to start. Alain saw an opportunity to commodify and apply deep learning by providing a platform that does the heavy lifting with an easy-to-use web interface, blueprints for common tasks, and just a single-click to productize the models. With SignalBox, companies can start building deep learning models with no coding at all – they just select a data set, choose a network architecture, and go. SignalBox also offers step-by-step tutorials, tips and tricks from industry experts, and consulting services for customers that want an end-to-end AI solution.

SignalBox offers a variety of solutions that are being used across many industries for energy modeling, fraud detection, customer segmentation, insurance risk modeling, inventory prediction, real estate prediction, and more. Existing data science teams are using SignalBox to accelerate their innovation cycle. One innovative UK startup, Energi Mine, recently worked with SignalBox to develop deep networks that predict anomalous energy consumption patterns and do time series predictions on energy usage for businesses with hundreds of sites.

SignalBox uses a variety of AWS services including Amazon EC2, Amazon VPC, Amazon Elastic Block Store, and Amazon S3. The ability to rapidly provision EC2 GPU instances has been a critical factor in their success – both in terms of keeping their operational expenses low, as well as speed to market. The Amazon API Gateway has allowed for operational automation, giving SignalBox the ability to control its infrastructure.

To learn more about SignalBox, visit here.

Valossa (Finland)

As students at the University of Oulu in Finland, the Valossa founders spent years doing research in the computer science and AI labs. During that time, the team witnessed how the world was moving beyond text, with video playing a greater role in day-to-day communication. This spawned an idea to use technology to automatically understand what an audience is viewing and share that information with a global network of content producers. Since 2015, Valossa has been building next generation AI applications to benefit the media and entertainment industry and is moving beyond the capabilities of traditional visual recognition systems.

Valossa’s AI is capable of analyzing any video stream. The AI studies a vast array of data within videos and converts that information into descriptive tags, categories, and overviews automatically. Basically, it sees, hears, and understands videos like a human does. The Valossa AI can detect people, visual and auditory concepts, key speech elements, and labels explicit content to make moderating and filtering content simpler. Valossa’s solutions are designed to provide value for the content production workflow, from media asset management to end-user applications for content discovery. AI-annotated content allows online viewers to jump directly to their favorite scenes or search specific topics and actors within a video.

Valossa leverages AWS to deliver the industry’s first complete AI video recognition platform. Using Amazon EC2 GPU instances, Valossa can easily scale their computation capacity based on customer activity. High-volume video processing with GPU instances provides the necessary speed for time-sensitive workflows. The geo-located Availability Zones in EC2 allow Valossa to bring resources close to their customers to minimize network delays. Valossa also uses Amazon S3 for video ingestion and to provide end-user video analytics, which makes managing and accessing media data easy and highly scalable.

To see how Valossa works, check out www.WhatIsMyMovie.com or enable the Alexa Skill, Valossa Movie Finder. To try the Valossa AI, sign up for free at www.valossa.com.

Kaliber (San Francisco, CA)

Serial entrepreneurs Ray Rahman and Risto Haukioja founded Kaliber in 2016. The pair had previously worked in startups building smart cities and online privacy tools, and teamed up to bring AI to the workplace and change the hospitality industry. Our world is designed to appeal to our senses – stores and warehouses have clearly marked aisles, products are colorfully packaged, and we use these designs to differentiate one thing from another. We tell each other apart by our faces, and previously that was something only humans could measure or act upon. Kaliber is using facial recognition, deep learning, and big data to create solutions for business use. Markets and companies that aren’t typically associated with cutting-edge technology will be able to use their existing camera infrastructure in a whole new way, making them more efficient and better able to serve their customers.

Computer video processing is rapidly expanding, and Kaliber believes that video recognition will extend to far more than security cameras and robots. Using the clients’ network of in-house cameras, Kaliber’s platform extracts key data points and maps them to actionable insights using their machine learning (ML) algorithm. Dashboards connect users to the client’s BI tools via the Kaliber enterprise APIs, and managers can view these analytics to improve their real-world processes, taking immediate corrective action with real-time alerts. Kaliber’s Real Metrics are aimed at combining the power of image recognition with ML to ultimately provide a more meaningful experience for all.

Kaliber uses many AWS services, including Amazon Rekognition, Amazon Kinesis, AWS Lambda, Amazon EC2 GPU instances, and Amazon S3. These services have been instrumental in helping Kaliber meet the needs of enterprise customers in record time.

Learn more about Kaliber here.

Thanks for reading and we’ll see you next month!

-Tina

 

Kim Dotcom Wants K.im to Trigger a “Copyright Revolution”

Post Syndicated from Ernesto original https://torrentfreak.com/kim-dotcom-wants-k-im-to-trigger-a-copyright-revolution-170831/

For many people Kim Dotcom is synonymous with Megaupload, the file-sharing giant that was taken down by the U.S. Government early 2012.

While Megaupload is no more, the New Zealand Internet entrepreneur is working on a new file-sharing site. Initially dubbed Megaupload 2, the new service will be called K.im, and it will be quite different from its predecessor.

This week Dotcom, who’s officially the chief “evangelist” of the service, showed a demo to a few thousand people revealing more about what it’s going to offer.

K.im is not a central hosting service, quite the contrary. It will allow users to upload content and distribute it to dozens of other services, including Dropbox, Google, Reddit, Storj, and even torrent sites.

The files are distributed across the Internet where they can be accessed freely. However, there is a catch. The uploaders set a price for each download and people who want a copy can only unlock it through the K.im app or browser addon, after they’ve paid.

Pick your price

K.im, paired with Bitcache, is basically a micropayment solution. It allows creators to charge the public for everything they upload. Every download is tied to a Bitcoin transaction, turning files into their own “stores.”

Kim Dotcom tells TorrentFreak that he sees the service as a copyright revolution. It should be a win-win solution for independent creators, rightsholders, and people who are used to pirating stuff.

“I’m working for both sides. For the copyright holders and also for the people who what to pay for content but have been geo-blocked and then are forced to download for free,” Dotcom says.

Like any other site that allows user uploaded content, K.im can also be used by pirates who want to charge a small fee for spreading infringing content. This is something Dotcom is aware of, but he has a solution in mind.

Much like YouTube, which allows rightsholders to “monetize” videos that use their work, K.im will provide an option to claim pirated content. Rightsholders can then change the price and all revenue will go to them.

So, if someone uploads a pirated copy of the Game of Thrones season finale through K.im, HBO can claim that file, charge an appropriate fee, and profit from it. The uploader, meanwhile, maintains his privacy.

“It is the holy grail of copyright enforcement. It is my gift to Hollywood, the movie studios, and everyone else,” Dotcom says.

Dotcom believes that piracy is in large part caused by an availability problem. People can often not find the content they’re looking for so it’s K.im’s goal to distribute files as widely as possible. This includes several torrent sites, which are currently featured in the demo.

Torrent uploads?

Interestingly, it will be hard to upload content to sites such as YTS, EZTV, KickassTorrents, and RARBG, as they’ve been shut down or don’t allow user uploads. However, Dotcom stresses that the names are just examples, and that they are still working on partnering with various sites.

Whether torrent sites will be eager to cooperate has yet to be seen. It’s possible that the encrypted files, which can’t be opened without paying, will be seen as “spam” by traditional torrent sites.

Also, from a user perspective, one has to wonder how many people are willing to pay for something if they set out to pirate it. After all, there will always be plenty of free options for those who refuse to or can’t pay.

Dotcom, however, is convinced that K.im can create a “copyright revolution.” He stresses that site owners and uploaders can greatly benefit from it as they receive affiliate fees, even after a pirated file is claimed by a rightsholder.

In addition, he says it will revolutionize copyright enforcement, as copyright holders can monetize the work of pirates. That is, if they are willing to work with the service.

“Rightsholders can turn piracy traffic into revenue and users can access the content on any platform. Since every file is a store, it doesn’t matter where it ends up,” Dotcom says.

Dotcom does have a very valid point here. Many people have simply grown used to pirating because it’s much more convenient than using a dozen different services. In Dotcom’s vision, people can just use one site to access everything.

The ideas don’t stop at sharing files either. In the future, Dotcom also wants to use the micropayment option to offer YouTubers and media organizations to accept payments from the public, BBC notes.

There’s still a long way to go before K.im and Bitcache go public though. The expected launch date is not final yet, but the services are expected to go live in mid-to-late 2018.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Weekly roundup: Let’s get physical

Post Syndicated from Eevee original https://eev.ee/dev/2017/08/29/weekly-roundup-lets-get-physical/

  • cc: I did a lot of physics and a lot of tearing my hair out. I changed ground detection to be based on collision, which opened the door to making slopes work, and clumsily hacked sliding into working. Then I wasted two days banging my head against a wall and getting nowhere.

    (But spoilers for next week: I did get slopes working perfectly on Sunday.)

  • fox flux: Focusing just on player sprites since the game is fundamentally not playable without them. I can finally see the light at the end of the tunnel; almost all of the walking sequences are done, and about half of the forms are more or less done. So, like, 60% of the way there maybe?

  • veekun: Loaded and fixed a bunch of little things that were missing, and now the site is basically functional. Had to fix some more form ordering problems, little obscure connections we forget about with every game, and some issues with evolutions. But I do have a website, and that’s nice. Ideally I’ll have something worth publishing by the end of the month.

And, hm, that’s all. Looks like most of my week went to CC physics, which has me in a mood to work on my book again… horf, so much to do.

How to Configure an LDAPS Endpoint for Simple AD

Post Syndicated from Cameron Worrell original https://aws.amazon.com/blogs/security/how-to-configure-an-ldaps-endpoint-for-simple-ad/

Simple AD, which is powered by Samba  4, supports basic Active Directory (AD) authentication features such as users, groups, and the ability to join domains. Simple AD also includes an integrated Lightweight Directory Access Protocol (LDAP) server. LDAP is a standard application protocol for the access and management of directory information. You can use the BIND operation from Simple AD to authenticate LDAP client sessions. This makes LDAP a common choice for centralized authentication and authorization for services such as Secure Shell (SSH), client-based virtual private networks (VPNs), and many other applications. Authentication, the process of confirming the identity of a principal, typically involves the transmission of highly sensitive information such as user names and passwords. To protect this information in transit over untrusted networks, companies often require encryption as part of their information security strategy.

In this blog post, we show you how to configure an LDAPS (LDAP over SSL/TLS) encrypted endpoint for Simple AD so that you can extend Simple AD over untrusted networks. Our solution uses Elastic Load Balancing (ELB) to send decrypted LDAP traffic to HAProxy running on Amazon EC2, which then sends the traffic to Simple AD. ELB offers integrated certificate management, SSL/TLS termination, and the ability to use a scalable EC2 backend to process decrypted traffic. ELB also tightly integrates with Amazon Route 53, enabling you to use a custom domain for the LDAPS endpoint. The solution needs the intermediate HAProxy layer because ELB can direct traffic only to EC2 instances. To simplify testing and deployment, we have provided an AWS CloudFormation template to provision the ELB and HAProxy layers.

This post assumes that you have an understanding of concepts such as Amazon Virtual Private Cloud (VPC) and its components, including subnets, routing, Internet and network address translation (NAT) gateways, DNS, and security groups. You should also be familiar with launching EC2 instances and logging in to them with SSH. If needed, you should familiarize yourself with these concepts and review the solution overview and prerequisites in the next section before proceeding with the deployment.

Note: This solution is intended for use by clients requiring an LDAPS endpoint only. If your requirements extend beyond this, you should consider accessing the Simple AD servers directly or by using AWS Directory Service for Microsoft AD.

Solution overview

The following diagram and description illustrates and explains the Simple AD LDAPS environment. The CloudFormation template creates the items designated by the bracket (internal ELB load balancer and two HAProxy nodes configured in an Auto Scaling group).

Diagram of the the Simple AD LDAPS environment

Here is how the solution works, as shown in the preceding numbered diagram:

  1. The LDAP client sends an LDAPS request to ELB on TCP port 636.
  2. ELB terminates the SSL/TLS session and decrypts the traffic using a certificate. ELB sends the decrypted LDAP traffic to the EC2 instances running HAProxy on TCP port 389.
  3. The HAProxy servers forward the LDAP request to the Simple AD servers listening on TCP port 389 in a fixed Auto Scaling group configuration.
  4. The Simple AD servers send an LDAP response through the HAProxy layer to ELB. ELB encrypts the response and sends it to the client.

Note: Amazon VPC prevents a third party from intercepting traffic within the VPC. Because of this, the VPC protects the decrypted traffic between ELB and HAProxy and between HAProxy and Simple AD. The ELB encryption provides an additional layer of security for client connections and protects traffic coming from hosts outside the VPC.

Prerequisites

  1. Our approach requires an Amazon VPC with two public and two private subnets. The previous diagram illustrates the environment’s VPC requirements. If you do not yet have these components in place, follow these guidelines for setting up a sample environment:
    1. Identify a region that supports Simple AD, ELB, and NAT gateways. The NAT gateways are used with an Internet gateway to allow the HAProxy instances to access the internet to perform their required configuration. You also need to identify the two Availability Zones in that region for use by Simple AD. You will supply these Availability Zones as parameters to the CloudFormation template later in this process.
    2. Create or choose an Amazon VPC in the region you chose. In order to use Route 53 to resolve the LDAPS endpoint, make sure you enable DNS support within your VPC. Create an Internet gateway and attach it to the VPC, which will be used by the NAT gateways to access the internet.
    3. Create a route table with a default route to the Internet gateway. Create two NAT gateways, one per Availability Zone in your public subnets to provide additional resiliency across the Availability Zones. Together, the routing table, the NAT gateways, and the Internet gateway enable the HAProxy instances to access the internet.
    4. Create two private routing tables, one per Availability Zone. Create two private subnets, one per Availability Zone. The dual routing tables and subnets allow for a higher level of redundancy. Add each subnet to the routing table in the same Availability Zone. Add a default route in each routing table to the NAT gateway in the same Availability Zone. The Simple AD servers use subnets that you create.
    5. The LDAP service requires a DNS domain that resolves within your VPC and from your LDAP clients. If you do not have an existing DNS domain, follow the steps to create a private hosted zone and associate it with your VPC. To avoid encryption protocol errors, you must ensure that the DNS domain name is consistent across your Route 53 zone and in the SSL/TLS certificate (see Step 2 in the “Solution deployment” section).
  2. Make sure you have completed the Simple AD Prerequisites.
  3. We will use a self-signed certificate for ELB to perform SSL/TLS decryption. You can use a certificate issued by your preferred certificate authority or a certificate issued by AWS Certificate Manager (ACM).
    Note: To prevent unauthorized connections directly to your Simple AD servers, you can modify the Simple AD security group on port 389 to block traffic from locations outside of the Simple AD VPC. You can find the security group in the EC2 console by creating a search filter for your Simple AD directory ID. It is also important to allow the Simple AD servers to communicate with each other as shown on Simple AD Prerequisites.

Solution deployment

This solution includes five main parts:

  1. Create a Simple AD directory.
  2. Create a certificate.
  3. Create the ELB and HAProxy layers by using the supplied CloudFormation template.
  4. Create a Route 53 record.
  5. Test LDAPS access using an Amazon Linux client.

1. Create a Simple AD directory

With the prerequisites completed, you will create a Simple AD directory in your private VPC subnets:

  1. In the Directory Service console navigation pane, choose Directories and then choose Set up directory.
  2. Choose Simple AD.
    Screenshot of choosing "Simple AD"
  3. Provide the following information:
    • Directory DNS – The fully qualified domain name (FQDN) of the directory, such as corp.example.com. You will use the FQDN as part of the testing procedure.
    • NetBIOS name – The short name for the directory, such as CORP.
    • Administrator password – The password for the directory administrator. The directory creation process creates an administrator account with the user name Administrator and this password. Do not lose this password because it is nonrecoverable. You also need this password for testing LDAPS access in a later step.
    • Description – An optional description for the directory.
    • Directory Size – The size of the directory.
      Screenshot of the directory details to provide
  4. Provide the following information in the VPC Details section, and then choose Next Step:
    • VPC – Specify the VPC in which to install the directory.
    • Subnets – Choose two private subnets for the directory servers. The two subnets must be in different Availability Zones. Make a note of the VPC and subnet IDs for use as CloudFormation input parameters. In the following example, the Availability Zones are us-east-1a and us-east-1c.
      Screenshot of the VPC details to provide
  5. Review the directory information and make any necessary changes. When the information is correct, choose Create Simple AD.

It takes several minutes to create the directory. From the AWS Directory Service console , refresh the screen periodically and wait until the directory Status value changes to Active before continuing. Choose your Simple AD directory and note the two IP addresses in the DNS address section. You will enter them when you run the CloudFormation template later.

Note: Full administration of your Simple AD implementation is out of scope for this blog post. See the documentation to add users, groups, or instances to your directory. Also see the previous blog post, How to Manage Identities in Simple AD Directories.

2. Create a certificate

In the previous step, you created the Simple AD directory. Next, you will generate a self-signed SSL/TLS certificate using OpenSSL. You will use the certificate with ELB to secure the LDAPS endpoint. OpenSSL is a standard, open source library that supports a wide range of cryptographic functions, including the creation and signing of x509 certificates. You then import the certificate into ACM that is integrated with ELB.

  1. You must have a system with OpenSSL installed to complete this step. If you do not have OpenSSL, you can install it on Amazon Linux by running the command, sudo yum install openssl. If you do not have access to an Amazon Linux instance you can create one with SSH access enabled to proceed with this step. Run the command, openssl version, at the command line to see if you already have OpenSSL installed.
    [[email protected] ~]$ openssl version
    OpenSSL 1.0.1k-fips 8 Jan 2015

  2. Create a private key using the command, openssl genrsa command.
    [[email protected] tmp]$ openssl genrsa 2048 > privatekey.pem
    Generating RSA private key, 2048 bit long modulus
    ......................................................................................................................................................................+++
    ..........................+++
    e is 65537 (0x10001)

  3. Generate a certificate signing request (CSR) using the openssl req command. Provide the requested information for each field. The Common Name is the FQDN for your LDAPS endpoint (for example, ldap.corp.example.com). The Common Name must use the domain name you will later register in Route 53. You will encounter certificate errors if the names do not match.
    [[email protected] tmp]$ openssl req -new -key privatekey.pem -out server.csr
    You are about to be asked to enter information that will be incorporated into your certificate request.

  4. Use the openssl x509 command to sign the certificate. The following example uses the private key from the previous step (privatekey.pem) and the signing request (server.csr) to create a public certificate named server.crt that is valid for 365 days. This certificate must be updated within 365 days to avoid disruption of LDAPS functionality.
    [[email protected] tmp]$ openssl x509 -req -sha256 -days 365 -in server.csr -signkey privatekey.pem -out server.crt
    Signature ok
    subject=/C=XX/L=Default City/O=Default Company Ltd/CN=ldap.corp.example.com
    Getting Private key

  5. You should see three files: privatekey.pem, server.crt, and server.csr.
    [[email protected] tmp]$ ls
    privatekey.pem server.crt server.csr

    Restrict access to the private key.

    [[email protected] tmp]$ chmod 600 privatekey.pem

    Keep the private key and public certificate for later use. You can discard the signing request because you are using a self-signed certificate and not using a Certificate Authority. Always store the private key in a secure location and avoid adding it to your source code.

  6. In the ACM console, choose Import a certificate.
  7. Using your favorite Linux text editor, paste the contents of your server.crt file in the Certificate body box.
  8. Using your favorite Linux text editor, paste the contents of your privatekey.pem file in the Certificate private key box. For a self-signed certificate, you can leave the Certificate chain box blank.
  9. Choose Review and import. Confirm the information and choose Import.

3. Create the ELB and HAProxy layers by using the supplied CloudFormation template

Now that you have created your Simple AD directory and SSL/TLS certificate, you are ready to use the CloudFormation template to create the ELB and HAProxy layers.

  1. Load the supplied CloudFormation template to deploy an internal ELB and two HAProxy EC2 instances into a fixed Auto Scaling group. After you load the template, provide the following input parameters. Note: You can find the parameters relating to your Simple AD from the directory details page by choosing your Simple AD in the Directory Service console.
Input parameter Input parameter description
HAProxyInstanceSize The EC2 instance size for HAProxy servers. The default size is t2.micro and can scale up for large Simple AD environments.
MyKeyPair The SSH key pair for EC2 instances. If you do not have an existing key pair, you must create one.
VPCId The target VPC for this solution. Must be in the VPC where you deployed Simple AD and is available in your Simple AD directory details page.
SubnetId1 The Simple AD primary subnet. This information is available in your Simple AD directory details page.
SubnetId2 The Simple AD secondary subnet. This information is available in your Simple AD directory details page.
MyTrustedNetwork Trusted network Classless Inter-Domain Routing (CIDR) to allow connections to the LDAPS endpoint. For example, use the VPC CIDR to allow clients in the VPC to connect.
SimpleADPriIP The primary Simple AD Server IP. This information is available in your Simple AD directory details page.
SimpleADSecIP The secondary Simple AD Server IP. This information is available in your Simple AD directory details page.
LDAPSCertificateARN The Amazon Resource Name (ARN) for the SSL certificate. This information is available in the ACM console.
  1. Enter the input parameters and choose Next.
  2. On the Options page, accept the defaults and choose Next.
  3. On the Review page, confirm the details and choose Create. The stack will be created in approximately 5 minutes.

4. Create a Route 53 record

The next step is to create a Route 53 record in your private hosted zone so that clients can resolve your LDAPS endpoint.

  1. If you do not have an existing DNS domain for use with LDAP, create a private hosted zone and associate it with your VPC. The hosted zone name should be consistent with your Simple AD (for example, corp.example.com).
  2. When the CloudFormation stack is in CREATE_COMPLETE status, locate the value of the LDAPSURL on the Outputs tab of the stack. Copy this value for use in the next step.
  3. On the Route 53 console, choose Hosted Zones and then choose the zone you used for the Common Name box for your self-signed certificate. Choose Create Record Set and enter the following information:
    1. Name – The label of the record (such as ldap).
    2. Type – Leave as A – IPv4 address.
    3. Alias – Choose Yes.
    4. Alias Target – Paste the value of the LDAPSURL on the Outputs tab of the stack.
  4. Leave the defaults for Routing Policy and Evaluate Target Health, and choose Create.
    Screenshot of finishing the creation of the Route 53 record

5. Test LDAPS access using an Amazon Linux client

At this point, you have configured your LDAPS endpoint and now you can test it from an Amazon Linux client.

  1. Create an Amazon Linux instance with SSH access enabled to test the solution. Launch the instance into one of the public subnets in your VPC. Make sure the IP assigned to the instance is in the trusted IP range you specified in the CloudFormation parameter MyTrustedNetwork in Step 3.b.
  2. SSH into the instance and complete the following steps to verify access.
    1. Install the openldap-clients package and any required dependencies:
      sudo yum install -y openldap-clients.
    2. Add the server.crt file to the /etc/openldap/certs/ directory so that the LDAPS client will trust your SSL/TLS certificate. You can copy the file using Secure Copy (SCP) or create it using a text editor.
    3. Edit the /etc/openldap/ldap.conf file and define the environment variables BASE, URI, and TLS_CACERT.
      • The value for BASE should match the configuration of the Simple AD directory name.
      • The value for URI should match your DNS alias.
      • The value for TLS_CACERT is the path to your public certificate.

Here is an example of the contents of the file.

BASE dc=corp,dc=example,dc=com
URI ldaps://ldap.corp.example.com
TLS_CACERT /etc/openldap/certs/server.crt

To test the solution, query the directory through the LDAPS endpoint, as shown in the following command. Replace corp.example.com with your domain name and use the Administrator password that you configured with the Simple AD directory

$ ldapsearch -D "[email protected]corp.example.com" -W sAMAccountName=Administrator

You should see a response similar to the following response, which provides the directory information in LDAP Data Interchange Format (LDIF) for the administrator distinguished name (DN) from your Simple AD LDAP server.

# extended LDIF
#
# LDAPv3
# base <dc=corp,dc=example,dc=com> (default) with scope subtree
# filter: sAMAccountName=Administrator
# requesting: ALL
#

# Administrator, Users, corp.example.com
dn: CN=Administrator,CN=Users,DC=corp,DC=example,DC=com
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
description: Built-in account for administering the computer/domain
instanceType: 4
whenCreated: 20170721123204.0Z
uSNCreated: 3223
name: Administrator
objectGUID:: l3h0HIiKO0a/ShL4yVK/vw==
userAccountControl: 512
…

You can now use the LDAPS endpoint for directory operations and authentication within your environment. If you would like to learn more about how to interact with your LDAPS endpoint within a Linux environment, here are a few resources to get started:

Troubleshooting

If you receive an error such as the following error when issuing the ldapsearch command, there are a few things you can do to help identify issues.

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)
  • You might be able to obtain additional error details by adding the -d1 debug flag to the ldapsearch command in the previous section.
    $ ldapsearch -D "[email protected]" -W sAMAccountName=Administrator –d1

  • Verify that the parameters in ldap.conf match your configured LDAPS URI endpoint and that all parameters can be resolved by DNS. You can use the following dig command, substituting your configured endpoint DNS name.
    $ dig ldap.corp.example.com

  • Confirm that the client instance from which you are connecting is in the CIDR range of the CloudFormation parameter, MyTrustedNetwork.
  • Confirm that the path to your public SSL/TLS certificate configured in ldap.conf as TLS_CAERT is correct. You configured this in Step 5.b.3. You can check your SSL/TLS connection with the command, substituting your configured endpoint DNS name for the string after –connect.
    $ echo -n | openssl s_client -connect ldap.corp.example.com:636

  • Verify that your HAProxy instances have the status InService in the EC2 console: Choose Load Balancers under Load Balancing in the navigation pane, highlight your LDAPS load balancer, and then choose the Instances

Conclusion

You can use ELB and HAProxy to provide an LDAPS endpoint for Simple AD and transport sensitive authentication information over untrusted networks. You can explore using LDAPS to authenticate SSH users or integrate with other software solutions that support LDAP authentication. This solution’s CloudFormation template is available on GitHub.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, start a new thread on the Directory Service forum.

– Cameron and Jeff

Analyzing Salesforce Data with Amazon QuickSight

Post Syndicated from David McAmis original https://aws.amazon.com/blogs/big-data/analyzing-salesforce-data-with-amazon-quicksight/

Salesforce Sales Cloud is a powerful platform for managing customer data. One of the key functions that the platform provides is the ability to track customer opportunities. Opportunities in Salesforce are used to track revenue, sales pipelines, and other activities from the very first contact with a potential customer to a closed sale.

Amazon QuickSight is a rich data visualization tool that provides the ability to connect to Salesforce data and use it as a data source for creating analyses, stories, and dashboards  and easily share them with others in the organization. This post focuses on how to connect to Salesforce as a data source and create a useful opportunity dashboard, incorporating Amazon QuickSight features like relative date filters, Key Performance Indicator (KPI) charts, and more.

Walkthrough

In this post, you walk through the following tasks:

  • Creating a new data set based on Salesforce data
  • Creating your analysis and adding visuals
  • Creating an Amazon QuickSight dashboard
  • Working with filters

Note: For this walkthrough, I am using my own Salesforce.com Developer Edition account. You can sign up for your own free developer account at https://developer.salesforce.com/.

Creating a new Amazon QuickSight data set based on Salesforce data

To start, you need to create a new Amazon QuickSight data set. Sign in to Amazon QuickSight at https://quicksight.aws using the link from the home page. Enter your Amazon QuickSight account name and choose Continue. Next, enter your Email address or user name and password, then choose Sign In.

On the Amazon QuickSight start page, choose Manage Data, which takes you to a list of your data sets. Choose New Data Set, and choose Salesforce as your data source. Enter a data source name—in this example, I called mine “SFDC Opportunity.” Choose Create Data Source to open the Salesforce authentication page, where you can enter your Salesforce user name and password.

After you are authenticated to Salesforce, you are presented with a drop-down list that lets you select data from Reports or Objects. For this tutorial, choose Object. Scroll down in the list to choose the Opportunity object, and then choose Select.

To finish creating your data set, choose Visualize to go to where you can create a new Amazon QuickSight analysis from this data.

Creating your analysis and adding visuals

Now that you have acquired your data, it’s time to start working with your analysis. In Amazon Quicksight, an analysis is a container for a set of related visual stories. When you chose Visualize, a new analysis was created for you. This is where you start to create the visuals (charts, graphs, etc.) that will be the building blocks for your dashboard.

In Amazon QuickSight, Salesforce objects look like database tables. In the analysis that you just created, you can see the columns in the Fields list for the Opportunity object.

The Opportunity object in Salesforce has a number of default fields. Salesforce administrators can extend this object by adding other custom fields as required—these custom fields are usually marked with a “_c” at the end.

In the Fields List, you can see that Amazon QuickSight has divided the fields into Dimensions and Measures.  You use these to create your visualizations and dashboard. For this particular dashboard, you create five different visuals to display the data in a few different ways.

Opportunity by Stage

For the first visualization, you create a horizontal bar chart showing “Opportunity by Stage”. In the Fields List, choose the StageName dimension and the ExpectedRevenue measure. By default, this should create a horizontal bar chart for you, as shown in the following image.

Notice that this chart includes the Closed Won category, which we aren’t interested in showing. Choose the bar for Closed Won, and in the pop-up menu, choose Exclude Closed Won. This filters the chart to show only opportunities that are in progress.

It’s important to note that for this dashboard, we only want to show the opportunities that are not Closed Won. So in the menu bar on the left side, choose Filter.

By default, the filter that you just created was only applied to a single visualization. To change this, choose the filter, and then choose All Visuals from the drop-down list. This applies the filter to all visuals in the analysis.

To finish, select the chart title and rename the chart to Opportunity by Stage.

Opportunity by Month

Next, you need to create a new visual to show “Opportunity by Month.” You use a vertical bar chart to display the data. On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. For this visual, choose CloseDate from the dimensions and ExpectedRevenue from the measures.

Using the Visual Types menu, change the chart type to a Vertical Bar Chart. By default, the chart displays the revenue by year, but we want to break it down a bit further. Choose Field Wells, and using the CloseDate drop-down menu, change the Aggregate to Month.

With the change to a monthly aggregate, your chart should look something like the following:

Select the chart title and rename the chart to Opportunity by Month.

Expected Revenue

When working with Salesforce opportunities, there are two measures that are important to most sales managers—the first is the total amount associated with the opportunity, and the second is what the actual expected revenue will be. For the next visual, you use the KPI chart to display these measures.

Choose Add on the Amazon QuickSight toolbar, and then choose Add visual. From the measures, choose ExpectedRevenue, and then Amount. To change your visualization, go to the Visual Types menu and choose the Key Performance Indicator (KPI). Your visualization should change and be similar to the following:

Select the chart title and rename the chart to Expected Revenue.

Opportunity by Lead Source

Next, you need to look at where the opportunity actually came from. This helps your dashboard users understand where the leads are being generated from and their value to the business. For this visual, you use a Horizontal Bar Chart.

On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. From the measures, choose Amount, and for the dimensions, choose LeadSource. To change your visualization, go to the Visual Types menu and choose the Horizontal Bar Chart. Your visualization should change and be similar to the following:

Note: If you can’t read the chart labels for the bars, grab the axis line and drag to resize.

Select the chart title and rename the chart to Opportunity by Lead Source.

Expected Revenue vs. Opportunity Amount

For the last visual, you look at the individual opportunities and how they contribute to the total pipeline. A tree map is a specialized chart type that lets your dashboard users see how each opportunity amount contributes to the whole.  Additionally, you can highlight if there is a difference between the Expected Revenue and the Amount by sizing the marks by the Amount and coloring them by the Expected Amount.

On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. From the measures, choose ExpectedRevenue and Amount. From the dimensions, choose Name. To change your visualization, go to the Visual Types menu and choose the Tree Map. Your visualization should change and be similar to the following:

Select the chart title and rename the chart to Expected Revenue vs Opportunity Amount.

Creating an Amazon QuickSight dashboard

Now that your visuals are created, it’s time to do the fun part—actually putting your Amazon QuickSight dashboard together. To create a dashboard, resize and position your visuals on the page, using the following layout:

To resize a visual, grab the handle in the lower-right corner and drag it to the height and width that you want.

To move your visual, use the grab bar at the top of the visual, as shown here:

When you are done resizing your visuals, your canvas should look something like this:

To create a dashboard, choose Share in the Amazon QuickSight toolbar. Then choose Create Dashboard. For this dashboard, give it a name of SFDC Opportunity Dashboard, and choose Create Dashboard. You are prompted to enter the email address or user name of the users you want to share this dashboard with.

Because we are just concentrating on the design at the moment, you can choose Cancel and share your dashboard later using the Share button on the dashboard toolbar.

Working with filters

There is one more feature that you can use when viewing your dashboard to make it even more useful. Earlier, when you were working with the Analysis, you added a filter to remove any opportunities that were tagged as Closed Won. Now, as you are viewing the dashboard, you add a filter that you can use to filter on a relative date.

This feature in Amazon QuickSight allows you to choose a time period (years, quarters, months, weeks, etc.) and then select from a list of relative time periods. For example, if you choose Year, you could set the filter options to Previous Year, This Year, Year to Date, or Last N Years.

This is especially handy for a Salesforce Opportunity dashboard, as you might want to filter the data using the Close Date field to see when the opportunity is actually set to close.

To create a relative date filter, choose Filter on the toolbar. Choose the filter icon, and then choose CloseDate, as shown in the following image:

At the top of the Edit Filter pane, change the drop-down list to apply the filter to All Visuals. The default filter type is Time Range, so use the drop-down list to change the filter type to Relative Dates.  For the time period, choose Quarters. To view all the current opportunities in your dashboard, choose the option for This Quarter, and choose Apply.

With the date filter in place, you have the final component for your dashboard, which should look something like the following example:

It’s important to note that at this point, you have added the filter when viewing the dashboard. If you think this is something that other users might want to do, you can go back to your Amazon QuickSight Analysis and add the filter there—that way it will be available for all dashboard users.

Summary

In this post, you learned how to connect to Salesforce data and create a basic dashboard. You can apply the same techniques to create analyses and dashboards from all different types of Salesforce data and objects. Whether you want to analyze your Salesforce account demographics or where your leads are coming from, or evaluate any other data stored in Salesforce, Amazon QuickSight helps you quickly connect to and visualize your data with only a few clicks.

 


Additional Reading

Learn how to visualize Amazon S3 analytics data with Amazon QuickSight!


About the Author

David McAmis is a Big Data & Analytics Consultant with Amazon Web Services. He works with customers to develop scalable platforms to gather, process and analyze data on AWS.

 

 

 

 

Announcing the Reputation Dashboard

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/announcing-the-reputation-dashboard/

The Amazon SES team is pleased to announce the addition of a reputation dashboard to the Amazon SES console. This new feature helps you track issues that could impact the sender reputation of your Amazon SES account.

What information does the reputation dashboard provide?

Amazon SES users must maintain bounce and complaint rates below a certain threshold. We put these rules in place to protect the sender reputations of all Amazon SES users, and to prevent Amazon SES from being used to deliver spam or malicious content. Users with very high rates of bounces or complaints may be put on probation. If the bounce or complaint rates are not within acceptable limits by the end of the probation period, these accounts may be shut down completely.

Previous versions of Amazon SES provided basic sending metrics, including information about bounces and complaints. However, the bounce and complaint metrics in this dashboard only included information for the past few days of email sent from your account, as opposed to an overall rate.

The new reputation dashboard provides overall bounce and complaint rates for your entire account. This enables you to more closely monitor the health of your account and adjust your email sending practices as needed.

Can’t I just calculate these values myself?

Because each Amazon SES account sends different volumes of email at different rates, we do not calculate bounce and complaint rates based on a fixed time period. Instead, we use a representative volume of email. This representative volume is the basis for the bounce and complaint rate calculations.

Why do we use representative volume in our calculations? Let’s imagine that you sent 1,000 emails one week, and 5 of them bounced. If we only considered a week of email sending, your metrics look good. Now imagine that the next week you only sent 5 emails, and one of them bounced. Suddenly, your bounce rate jumps from half a percent to 20%, and your account is automatically placed on probation. This example may be an extreme case, but it illustrates the reason that we don’t use fixed time periods when calculating bounce and complaint rates.

When you open the new reputation dashboard, you will see bounce and complaint rates calculated using the representative volume for your account. We automatically recalculate these rates every time you send email through Amazon SES.

What else can I do with these metrics?

The Bounce and Complaint Rate metrics in the reputation dashboard are automatically sent to Amazon CloudWatch. You can use CloudWatch to create dashboards that track your bounce and complaint rates over time, and to create alarms that send you notifications when these metrics cross certain thresholds. To learn more, see Creating Reputation Monitoring Alarms Using CloudWatch in the Amazon SES Developer Guide.

How can I see the reputation dashboard?

The reputation dashboard is now available to all Amazon SES users. To view the reputation dashboard, sign in to the Amazon SES console. On the left navigation menu, choose Reputation Dashboard. For more information, see Monitoring Your Sender Reputation in the Amazon SES Developer Guide.

We hope you find the information in the reputation dashboard to be useful in managing your email sending programs and campaigns. If you have any questions or comments, please leave a comment on this post, or let us know in the Amazon SES forum.

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360o-view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault tolerant data lake infrastructure at scale and economically.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.

A number of AWS customers stream data from various sources into a S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and time to value to deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences maybe correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts maybe performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help assess deciding between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure that optimal request rates to support PUT requests from the incoming stream in addition to certain queries from large Amazon Redshift clusters that could send a large number of parallel GET requests.
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represents user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 

The query part highlighted in red retrieves clickstream data in S3, and joins the data with the time and customer dimension tables in Amazon Redshift through the part highlighted in blue. The query returns the total ad revenue for three customers over the last three months, along with info on their respective market segment.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there’s a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes info around what processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.

Partitioning

Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months

You can use partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), the partition key “customer” should be used instead:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even out-perform a medium size Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading this object size. However, the number of files operating on a query is directly correlated with the parallelism achievable by a query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance, and the amount of parallelism achievable on a particular query. Large files are best for large scans as the query likely operates on sufficiently large number of files. For queries that are more selective and for which fewer files are operating, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans, and high attribute selectivity. Again, there are trade-offs as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compared that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded, and distributed evenly across all nodes (even distribution style) with optimal column compression encodings prescribed by the Amazon Redshift’s ANALYZE command.

The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, and in effect, is the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and out-performs others by ~60X. 

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum. 

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer while your query rolls up the data to the month level per customer. In the earlier query that uses the day/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum as indicated by the query plan:

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and which involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment Github page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

 

 

Weekly roundup: Games, mostly

Post Syndicated from Eevee original https://eev.ee/dev/2017/08/22/weekly-roundup-games-mostly/

  • cc: I fixed an obscure timing issue and… well, that’s all, how exciting.

    I should really talk about this game more, but it’s big and I’m not the one designing it and I don’t have a good sense of how much we want to keep under wraps yet?

  • blog: I wrote a stream of consciousness about how Nazis are bad.

  • potluck: What? Yes! I worked on potluck a bit, believe it or not. I’ve decided to try procedurally generating the whole game — something I’ve wanted to do for a while, and a decision that has piqued my interest in potluck considerably. Step one was to clean up all my map code, which was entangled with parsing Tiled’s JSON format, to make it actually possible to generate a map. I finally did that and made an extremely basic proof of concept that just varies the floor height.

  • fox flux: The usual brief work on player sprites. The game has a lot of them.

  • gamedev: I made a video game with glip again! It was a birthday present for two of our friends, and it’s extremely specific to them and basically incomprehensible to anyone else, so I haven’t decided yet whether it’d be appropriate to release publicly. But we made something pretty coherent on a whim in two and a half days and that’s nice.

I’m currently working on veekun, which has finally progressed to the point that it has data appearing within the website! Hallelujah. I expect there’ll be plenty of stuff to clean up, but this is a tremendous leap forwards. I’ll be so glad to have this off my plate at last, argh.

An Invitation for CrashPlan Customers: Try Backblaze

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/crashplan-alternative-backup-solution/

Welcome CrashPlan Users
With news coming out this morning of CrashPlan exiting the consumer market, we know some of you may be considering which backup provider to call home. We welcome you to try us.

For over a decade, Backblaze has provided unlimited cloud backup for Windows and Macintosh computers at $5 per month (or $50 per year).

Backblaze is excellent if you’re looking for the cheapest online backup option that still offers serious file protection.” — Dann Berg, Tom’s Guide.

That’s it. Ready to make sure your data is safe? Try Backblaze for free — it’ll take you less than a minute and you don’t need a credit card to start protecting your data.

Our customers don’t have to choose between competing feature sets or hard to understand fine print. There are no extra charges and no limits on the size of your files — no matter how many videos you want to back up. And when we say unlimited, we mean unlimited; there are no restrictions on files, gigabytes, or restores. Customers also love the choices they have for getting their data back — web, mobile apps, and our free Restore by Mail option. We’re also the fastest to back up your data. While other services throttle your upload speeds, we want to get you protected as quickly as possible.

Backblaze vs. Carbonite

We know that CrashPlan is encouraging customers to look at Carbonite as an alternative. We would like to offer you another option: Backblaze. We cost less, we offer more, we store over 350 Petabytes of data, we have restored over 20 billion files, and customers in over 120 countries around the world trust us with their data.

Backblaze Carbonite Basic Carbonite Prime
Price per Computer $50/year $59.99/year $149.99/year
Back Up All User Data By Default – No Picking And Choosing Yes No No
Automatically Back Up Files Of Any Size, Including Videos Yes No Yes1
Back Up Multiple USB External Hard Drives Yes No No
Restore by Mail for Free Yes No No
Locate Computer Yes No No
Manage Families & Teams Yes No No
Protect Accounts Via Two Factor VerificationSMS & Authenticator Apps Yes No No
Protect Data Via Private Encryption Key Yes No No2
(1) All videos and files over 4GB require manual selection.  (2) Available on Windows Only

To get just some of the features offered by Backblaze for $50/year, you would need to purchase Carbonite Prime at $149.99/year.

Reminder: Sync is Not Backup

“Backblaze is my favorite online backup service, mostly because everything about it is so simple, especially its pricing and software.“ Tim Fisher — Lifewire: 22 Online Backup Services Reviewed

Of course, there are plenty of options in the marketplace. We encourage you to choose one to make sure you stay backed up. One thing we tell our own friends and family: sync is not backup.

If you’re considering using a sync service — Dropbox, Google Drive, OneDrive, iCloud, etc. — you should know that these services are not designed to back up all your data. Typically, they only sync data from a specific directory or folder. If the service detects a file was deleted from your sync folder, it also will delete it from their server, and you’re out of luck. In addition, most don’t support external drives and have tiered pricing that gets quite expensive.

Backblaze is the Simple, Reliable, and Affordable Choice for Unlimited Backup of All Your Data
People have trusted Backblaze to protect their digital photos, music, movies, and documents for the past 10 years. We look forward to doing the same for your valuable data.

Your CrashPlan service may not be getting shut off today. But there’s no reason to wait until your data is at risk. Try Backblaze for FREE today — all you need to do is pick an email/password and click download.

The post An Invitation for CrashPlan Customers: Try Backblaze appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Announcement: IPS code

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/08/announcement-ips-code.html

So after 20 years, IBM is killing off my BlackICE code created in April 1998. So it’s time that I rewrite it.

BlackICE was the first “inline” intrusion-detection system, aka. an “intrusion prevention system” or IPS. ISS purchased my company in 2001 and replaced their RealSecure engine with it, and later renamed it Proventia. Then IBM purchased ISS in 2006. Now, they are formally canceling the project and moving customers onto Cisco’s products, which are based on Snort.

So now is a good time to write a replacement. The reason is that BlackICE worked fundamentally differently than Snort, using protocol analysis rather than pattern-matching. In this way, it worked more like Bro than Snort. The biggest benefit of protocol-analysis is speed, making it many times faster than Snort. The second benefit is better detection ability, as I describe in this post on Heartbleed.

So my plan is to create a new project. I’ll be checking in the starter bits into GitHub starting a couple weeks from now. I need to figure out a new name for the project, so I don’t have to rip off a name from William Gibson like I did last time :).

Some notes:

  • Yes, it’ll be GNU open source. I’m a capitalist, so I’ll earn money like snort/nmap dual-licensing it, charging companies who don’t want to open-source their addons. All capitalists GNU license their code.
  • C, not Rust. Sorry, I’m going for extreme scalability. We’ll re-visit this decision later when looking at building protocol parsers.
  • It’ll be 95% compatible with Snort signatures. Their language definition leaves so much ambiguous it’ll be hard to be 100% compatible.
  • It’ll support Snort output as well, though really, Snort’s events suck.
  • Protocol parsers in Lua, so you can use it as a replacement for Bro, writing parsers to extract data you are interested in.
  • Protocol state machine parsers in C, like you see in my Masscan project for X.509.
  • First version IDS only. These days, “inline” means also being able to MitM the SSL stack, so I’m gong to have to think harder on that.
  • Mutli-core worker threads off PF_RING/DPDK/netmap receive queues. Should handle 10gbps, tracking 10 million concurrent connections, with quad-core CPU.
So if you want to contribute to the project, here’s what I need:
  • Requirements from people who work daily with IDS/IPS today. I need you to write up what your products do well that you really like. I need to you write up what they suck at that needs to be fixed. These need to be in some detail.
  • Testing environment to play with. This means having a small server plugged into a real-world link running at a minimum of several gigabits-per-second available for the next year. I’ll sign NDAs related to the data I might see on the network.
  • Coders. I’ll be doing the basic architecture, but protocol parsers, output plugins, etc. will need work. Code will be in C and Lua for the near term. Unfortunately, since I’m going to dual-license, I’ll need waivers before accepting pull requests.
Anyway, follow me on Twitter @erratarob if you want to contribute.

Announcing Dedicated IP Pools

Post Syndicated from Brent Meyer original https://aws.amazon.com/blogs/ses/announcing-dedicated-ip-pools/

The Amazon SES team is pleased to announce that you can now create groups of dedicated IP addresses, called dedicated IP pools, for your email sending activities.

Prior to the availability of this feature, if you leased several dedicated IP addresses to use with Amazon SES, there was no way to specify which dedicated IP address to use for a specific email. Dedicated IP pools solve this problem by allowing you to send emails from specific IP addresses.

This post includes information and procedures related to dedicated IP pools.

What are dedicated IP pools?

In order to understand dedicated IP pools, you should first be familiar with the concept of dedicated IP addresses. Customers who send large volumes of email will typically lease one or more dedicated IP addresses to use when sending mail from Amazon SES. To learn more, see our blog post about dedicated IP addresses.

If you lease several dedicated IP addresses for use with Amazon SES, you can organize these addresses into groups, called pools. You can then associate each pool with a configuration set. When you send an email that specifies a configuration set, that email will be sent from the IP addresses in the associated pool.

When should I use dedicated IP pools?

Dedicated IP pools are especially useful for customers who send several different types of email using Amazon SES. For example, if you use Amazon SES to send both marketing emails and transactional emails, you can create a pool for marketing emails and another for transactional emails.

By using dedicated IP pools, you can isolate the sender reputations for each of these types of communications. Using dedicated IP pools gives you complete control over the sender reputations of the dedicated IP addresses you lease from Amazon SES.

How do I create and use dedicated IP pools?

There are two basic steps for creating and using dedicated IP pools. First, create a dedicated IP pool in the Amazon SES console and associate it with a configuration set. Next, when you send email, be sure to specify the configuration set associated with the IP pool you want to use.

For step-by-step procedures, see Creating Dedicated IP Pools in the Amazon SES Developer Guide.

Will my email sending process change?

If you do not use dedicated IP addresses with Amazon SES, then your email sending process will not change.

If you use dedicated IP pools, your email sending process may change slightly. In most cases, you will need to specify a configuration set in the emails you send. To learn more about using configuration sets, see Specifying a Configuration Set When You Send Email in the Amazon SES Developer Guide.

Any dedicated IP addresses that you lease that are not part of a dedicated IP pool will automatically be added to a default pool. If you send email without specifying a configuration set that is associated with a pool, then that email will be sent from one of the addresses in the default pool.

Dedicated IP pools are now available in the following AWS Regions: us-west-2 (Oregon), us-east-1 (Virginia), and eu-west-1 (Ireland).

We hope you enjoy this feature. If you have any questions or comments, please leave a comment on this post, or let us know in the Amazon SES Forum.

What’s the Diff: Programs, Processes, and Threads

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/

let's talk about Threads

How often have you heard the term threading in relation to a computer program, but you weren’t exactly sure what it meant? How about processes? You likely understand that a thread is somehow closely related to a program and a process, but if you’re not a computer science major, maybe that’s as far as your understanding goes.

Knowing what these terms mean is absolutely essential if you are a programmer, but an understanding of them also can be useful to the average computer user. Being able to look at and understand the Activity Monitor on the Macintosh, the Task Manager on Windows, or Top on Linux can help you troubleshoot which programs are causing problems on your computer, or whether you might need to install more memory to make your system run better.

Let’s take a few minutes to delve into the world of computer programs and sort out what these terms mean. We’ll simplify and generalize some of the ideas, but the general concepts we cover should help clarify the difference between the terms.

Programs

First of all, you probably are aware that a program is the code that is stored on your computer that is intended to fulfill a certain task. There are many types of programs, including programs that help your computer function and are part of the operating system, and other programs that fulfill a particular job. These task-specific programs are also known as “applications,” and can include programs such as word processing, web browsing, or emailing a message to another computer.

Program

Programs are typically stored on disk or in non-volatile memory in a form that can be executed by your computer. Prior to that, they are created using a programming language such as C, Lisp, Pascal, or many others using instructions that involve logic, data and device manipulation, recurrence, and user interaction. The end result is a text file of code that is compiled into binary form (1’s and 0’s) in order to run on the computer. Another type of program is called “interpreted,” and instead of being compiled in advance in order to run, is interpreted into executable code at the time it is run. Some common, typically interpreted programming languages, are Python, PHP, JavaScript, and Ruby.

The end result is the same, however, in that when a program is run, it is loaded into memory in binary form. The computer’s CPU (Central Processing Unit) understands only binary instructions, so that’s the form the program needs to be in when it runs.

Perhaps you’ve heard the programmer’s joke, “There are only 10 types of people in the world, those who understand binary, and those who don’t.”

Binary is the native language of computers because an electrical circuit at its basic level has two states, on or off, represented by a one or a zero. In the common numbering system we use every day, base 10, each digit position can be anything from 0 to 9. In base 2 (or binary), each position is either a 0 or a 1. (In a future blog post we might cover quantum computing, which goes beyond the concept of just 1’s and 0’s in computing.)

Decimal—Base 10 Binary—Base 2
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001

How Processes Work

The program has been loaded into the computer’s memory in binary form. Now what?

An executing program needs more than just the binary code that tells the computer what to do. The program needs memory and various operating system resources that it needs in order to run. A “process” is what we call a program that has been loaded into memory along with all the resources it needs to operate. The “operating system” is the brains behind allocating all these resources, and comes in different flavors such as macOS, iOS, Microsoft Windows, Linux, and Android. The OS handles the task of managing the resources needed to turn your program into a running process.

Some essential resources every process needs are registers, a program counter, and a stack. The “registers” are data holding places that are part of the computer processor (CPU). A register may hold an instruction, a storage address, or other kind of data needed by the process. The “program counter,” also called the “instruction pointer,” keeps track of where a computer is in its program sequence. The “stack” is a data structure that stores information about the active subroutines of a computer program and is used as scratch space for the process. It is distinguished from dynamically allocated memory for the process that is known as “the heap.”

diagram of how processes work

There can be multiple instances of a single program, and each instance of that running program is a process. Each process has a separate memory address space, which means that a process runs independently and is isolated from other processes. It cannot directly access shared data in other processes. Switching from one process to another requires some time (relatively) for saving and loading registers, memory maps, and other resources.

This independence of processes is valuable because the operating system tries its best to isolate processes so that a problem with one process doesn’t corrupt or cause havoc with another process. You’ve undoubtedly run into the situation in which one application on your computer freezes or has a problem and you’ve been able to quit that program without affecting others.

How Threads Work

So, are you still with us? We finally made it to threads!

A thread is the unit of execution within a process. A process can have anywhere from just one thread to many threads.

Process vs. Thread

diagram of threads in a process over time

When a process starts, it is assigned memory and resources. Each thread in the process shares that memory and resources. In single-threaded processes, the process contains one thread. The process and the thread are one and the same, and there is only one thing happening.

In multithreaded processes, the process contains more than one thread, and the process is accomplishing a number of things at the same time (technically, it’s almost at the same time—read more on that in the “What about Parallelism and Concurrency?” section below).

diagram of single and multi-treaded process

We talked about the two types of memory available to a process or a thread, the stack and the heap. It is important to distinguish between these two types of process memory because each thread will have its own stack, but all the threads in a process will share the heap.

Threads are sometimes called lightweight processes because they have their own stack but can access shared data. Because threads share the same address space as the process and other threads within the process, the operational cost of communication between the threads is low, which is an advantage. The disadvantage is that a problem with one thread in a process will certainly affect other threads and the viability of the process itself.

Threads vs. Processes

So to review:

  1. The program starts out as a text file of programming code,
  2. The program is compiled or interpreted into binary form,
  3. The program is loaded into memory,
  4. The program becomes one or more running processes.
  5. Processes are typically independent of each other,
  6. While threads exist as the subset of a process.
  7. Threads can communicate with each other more easily than processes can,
  8. But threads are more vulnerable to problems caused by other threads in the same process.

Processes vs. Threads — Advantages and Disadvantages

Process Thread
Processes are heavyweight operations Threads are lighter weight operations
Each process has its own memory space Threads use the memory of the process they belong to
Inter-process communication is slow as processes have different memory addresses Inter-thread communication can be faster than inter-process communication because threads of the same process share memory with the process they belong to
Context switching between processes is more expensive Context switching between threads of the same process is less expensive
Processes don’t share memory with other processes Threads share memory with other threads of the same process

What about Concurrency and Parallelism?

A question you might ask is whether processes or threads can run at the same time. The answer is: it depends. On a system with multiple processors or CPU cores (as is common with modern processors), multiple processes or threads can be executed in parallel. On a single processor, though, it is not possible to have processes or threads truly executing at the same time. In this case, the CPU is shared among running processes or threads using a process scheduling algorithm that divides the CPU’s time and yields the illusion of parallel execution. The time given to each task is called a “time slice.” The switching back and forth between tasks happens so fast it is usually not perceptible. The terms parallelism (true operation at the same time) and concurrency (simulated operation at the same time), distinguish between the two type of real or approximate simultaneous operation.

diagram of concurrency and parallelism

Why Choose Process over Thread, or Thread over Process?

So, how would a programmer choose between a process and a thread when creating a program in which she wants to execute multiple tasks at the same time? We’ve covered some of the differences above, but let’s look at a real world example with a program that many of us use, Google Chrome.

When Google was designing the Chrome browser, they needed to decide how to handle the many different tasks that needed computer, communications, and network resources at the same time. Each browser window or tab communicates with multiple servers on the internet to retrieve text, programs, graphics, audio, video, and other resources, and renders that data for display and interaction with the user. In addition, the browser can open many windows, each with many tasks.

Google had to decide how to handle that separation of tasks. They chose to run each browser window in Chrome as a separate process rather than a thread or many threads, as is common with other browsers. Doing that brought Google a number of benefits. Running each window as a process protects the overall application from bugs and glitches in the rendering engine and restricts access from each rendering engine process to others and to the rest of the system. Isolating JavaScript programs in a process prevents them from running away with too much CPU time and memory, and making the entire browser non-responsive.

Google made the calculated trade-off with a multi-processing design as starting a new process for each browser window has a higher fixed cost in memory and resources than using threads. They were betting that their approach would end up with less memory bloat overall.

Using processes instead of threads provides better memory usage when memory gets low. An inactive window is treated as a lower priority by the operating system and becomes eligible to be swapped to disk when memory is needed for other processes, helping to keep the user-visible windows more responsive. If the windows were threaded, it would be more difficult to separate the used and unused memory as cleanly, wasting both memory and performance.

You can read more about Google’s design decisions on Google’s Chromium Blog or on the Chrome Introduction Comic.

The screen capture below shows the Google Chrome processes running on a MacBook Air with many tabs open. Some Chrome processes are using a fair amount of CPU time and resources, and some are using very little. You can see that each process also has many threads running as well.

activity monitor of Google Chrome

The Activity Monitor or Task Manager on your system can be a valuable ally in helping fine-tune your computer or troubleshooting problems. If your computer is running slowly, or a program or browser window isn’t responding for a while, you can check its status using the system monitor. Sometimes you’ll see a process marked as “Not Responding.” Try quitting that process and see if your system runs better. If an application is a memory hog, you might consider choosing a different application that will accomplish the same task.

Windows Task Manager view

Made it This Far?

We hope this Tron-like dive into the fascinating world of computer programs, processes, and threads has helped clear up some questions you might have had.

The next time your computer is running slowly or an application is acting up, you know your assignment. Fire up the system monitor and take a look under the hood to see what’s going on. You’re in charge now.

We love to hear from you

Are you still confused? Have questions? If so, please let us know in the comments. And feel free to suggest topics for future blog posts.

The post What’s the Diff: Programs, Processes, and Threads appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.