Tag Archives: groups

Welcome Victoria — Sales Development Representative

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-victoria-sales-development-representative/

Ever since we introduced our Groups feature, Backblaze for Business has been growing at a rapid rate! We’ve been staffing up in order to support the product and the newest addition to the sales team, Victoria, joins us as a Sales Development Representative! Let’s learn a bit more about Victoria, shall we?

What is your Backblaze Title?
Sales Development Representative.

Where are you originally from?
Harrisburg, North Carolina.

What attracted you to Backblaze?
The leaders and family-style culture.

What do you expect to learn while being at Backblaze?
How to sell, sell, sell!

Where else have you worked?
The North Carolina Autism Society, an ophthalmologist’s office, home health care, and another tech startup.

Where did you go to school?
The University of North Carolina Chapel Hill and Duke University’s Fuqua School of Business.

What’s your dream job?
Fighter pilot, professional snowboarder or killer whale trainer.

Favorite place you’ve traveled?
Hawaii and Banff.

Favorite hobby?
Basketball and cars.

Of what achievement are you most proud?
Missionary work and helping patients feel better.

Star Trek or Star Wars?
Neither, but probably Star Wars.

Coke or Pepsi?
Neither, bubble tea.

Favorite food?
Snow crab legs.

Why do you like certain things?
Because God made me that way.

Anything else you’d like to tell us?
I’m a germophobe, drink a lot of water and unfortunately, am introverted.

Being on the phones all day is a good way to build up those extroversion skills! Welcome to the team and we hope you enjoy learning how to sell, sell, sell!

The post Welcome Victoria — Sales Development Representative appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Facebook Privacy Fiasco Sees Congress Urged on Anti-Piracy Action

Post Syndicated from Andy original https://torrentfreak.com/facebook-privacy-fiasco-sees-congress-urged-on-anti-piracy-action-180420/

It has been a tumultuous few weeks for Facebook, and some would say quite rightly so. The company is a notorious harvester of personal information but last month’s Cambridge Analytica scandal really brought things to a head.

With Facebook co-founder and Chief Executive Officer Mark Zuckerberg in the midst of a PR nightmare, last Tuesday the entrepreneur appeared before the Senate. A day later he faced a grilling from lawmakers, answering questions concerning the social networking giant’s problems with user privacy and how it responds to breaches.

What practical measures Zuckerberg and his team will take to calm the storm are yet to unfold but the opportunity to broaden the attack on both Facebook and others in the user-generated content field is now being seized upon. Yes, privacy is the number one controversy at the moment but Facebook and others of its ilk need to step up and take responsibility for everything posted on their platforms.

That’s the argument presented by the American Federation of Musicians, the Content Creators Coalition, CreativeFuture, and the Independent Film & Television Alliance, who together represent more than 650 entertainment industry companies and 240,000 members. CreativeFuture alone represents more than 500 companies, including all the big Hollywood studios and major players in the music industry.

In letters sent to the Senate Committee on the Judiciary; the Senate Committee on Commerce, Science, and Transportation; and the House Energy and Commerce Committee, the coalitions urge Congress to not only ensure that Facebook gets its house in order, but that Google, Twitter, and similar platforms do so too.

The letters begin with calls to protect user data and tackle the menace of fake news but given the nature of the coalitions and their entertainment industry members, it’s no surprise to see where this is heading.

“In last week’s hearing, Mr. Zuckerberg stressed several times that Facebook must ‘take a broader view of our responsibility,’ acknowledging that it is ‘responsible for the content’ that appears on its service and must ‘take a more active view in policing the ecosystem’ it created,” the letter reads.

“While most content on Facebook is not produced by Facebook, they are the publisher and distributor of immense amounts of content to billions around the world. It is worth noting that a lot of that content is posted without the consent of the people who created it, including those in the creative industries we represent.”

The letter recalls Zuckerberg as characterizing Facebook’s failure to take a broader view of its responsibilities as a “big mistake” while noting he’s also promised change.

However, the entertainment groups contend that the way the company has conducted itself – and the manner in which many Silicon Valley companies conduct themselves – is supported and encouraged by safe harbors and legal immunities that absolve internet platforms of accountability.

“We agree that change needs to happen – but we must ask ourselves whether we can expect to see real change as long as these companies are allowed to continue to operate in a policy framework that prioritizes the growth of the internet over accountability and protects those that fail to act responsibly. We believe this question must be at the center of any action Congress takes in response to the recent failures,” the groups write.

But while the Facebook fiasco has provided the opportunity for criticism, CreativeFuture and its colleagues see the problem from a much broader perspective. They draw in companies like Google, which is also criticized for shirking its responsibilities, largely because the law doesn’t compel it to act any differently.

“Google, another major global platform that has long resisted meaningful accountability, also needs to step forward and endorse the broader view of responsibility expressed by Mr. Zuckerberg – as do many others,” they continue.

“The real problem is not Facebook, or Mark Zuckerberg, regardless of how sincerely he seeks to own the ‘mistakes’ that led to the hearing last week. The problem is endemic in a system that applies a different set of rules to the internet and fails to impose ordinary norms of accountability on businesses that are built around monetizing other people’s personal information and content.”

Noting that Congress has encouraged technology companies to prosper by using a “light hand” for the past several decades, the groups say their level of success now calls for a fresh approach and a heavier touch.

“Facebook and Google are grown-ups – and it is time they behaved that way. If they will not act, then it is up to you and your colleagues in the House to take action and not let these platforms’ abuses continue to pile up,” they conclude.

But with all that said, there is an interesting conflict that develops when presenting the solution to piracy in the context of a user privacy fiasco.

In the EU, many of the companies involved in the coalitions above are calling for pre-emptive filters to prevent allegedly infringing content being uploaded to Facebook and YouTube. That means that all user uploads to such platforms will have to be opened and scanned to see what they contain before they’re allowed online.

So, user privacy or pro-active anti-piracy filters? It might not be easy or even legal to achieve both.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Cloudflare Kicks Out Torrent Site For Abuse Reporting Interference

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-kicks-out-torrent-site-for-abuse-reporting-interference-180420/

As one of the leading CDN and DDoS protection services, Cloudflare is used by millions of websites across the globe.

The company’s clients include billion dollar companies and national governments, but also personal blogs, and even pirate sites.

Copyright holders are not happy with the latter category and are pressuring Cloudflare to cut their ties with sites like The Pirate Bay, both in and out of court.

Cloudflare, however, maintains that it’s a neutral service provider. They forward copyright infringement notices to their customers, for example, but deny any liability for these sites.

Generally speaking, the company only disconnects a customer in response to a court order, as it did with Sci-Hub earlier this year. That’s why it came as a surprise when the anime torrent site NYAA.si was disconnected this week.

The site, which is a replacement for the original NYAA, has millions of users and is particularly popular in Japan. Without prior warning, it became unavailable for several hours this week, after Cloudflare removed it from its services. So what happened?

TorrentFreak spoke to the operator who said that the exact reason for the termination remains a mystery to him. He reached out to Cloudflare looking for answers, but the company simply stated that it’s about “avoiding measures taken to avoid abuse complaints,” as can be seen below.

One of Cloudflare’s messages

The operator says he hasn’t done anything out of the ordinary and showed his willingness to resolve any possible issues. However, that hasn’t changed Cloudflare’s stance.

“We asked multiple times for clarification. We also expressed that we were willing to attempt to work with them on whatever the problem actually was, if they would explain what they even mean.

“Naturally, I have been stonewalled by them at every stage. I’ve contacted numerous persons at Cloudflare and nobody will talk about this,” NYAA’s operator adds.

TorrentFreak asked Cloudflare for more details and the company confirmed that the matter was related to interference with its abuse reporting systems, without providing further detail.

“We determined that the customer had taken steps specifically intended to interfere with and thwart the operation of our abuse reporting systems,” Cloudflare’s General Counsel Doug Kramer informed us.

Cloudflare’s statement suggests that the site took active steps to interfere with the abuse process. The company added that it can’t go into detail, but says that the reason for the termination was shared with the website owner.

The website owner, on the other hand, informs us that he has no clue what the exact problem is. NYAA.si occasionally swaps IP addresses and has recently set up some mirror domains, but these were all under the same account. So, he has no idea why that would interfere with any abuse reports.

“I’m honestly unsure of what we could have done that ‘circumvents’ their abuse system,” NYAA’s operator says, adding that the only abuse reports received were copyright related.

It’s unlikely, however, that copyright takedown notices alone would warrant account termination, as most of the largest torrent sites use Cloudflare.

NYAA’s operator says he can do little more than speculate at this point. Some have hinted at a secret court order while Japan’s recent crackdown on manga and anime piracy also came to mind, all without a grain of evidence of course.

Whatever the reason, NYAA.si now has to move on without Cloudflare, while the mystery remains.

“Frankly, this whole thing is a joke. I don’t understand why they would willingly host much bigger sites like ThePirateBay without any issue, or even ISIS, or the various hacking groups that have used them over time,” the operator says.

If more information about the abuse process interference becomes available, we’ll definitely follow up.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Pirates Taunt Amazon Over New “Turd Sandwich” Prime Video Quality

Post Syndicated from Andy original https://torrentfreak.com/pirates-taunt-amazon-over-new-turd-sandwich-prime-video-quality-180419/

Even though they generally aren’t paying for the content they consume, don’t fall into the trap of believing that all pirates are eternally grateful for even poor quality media.

Without a doubt, some of the most quality-sensitive individuals are to be found in pirate communities and they aren’t scared to make their voices known when release groups fail to come up with the best possible goods.

This week there’s been a sustained chorus of disapproval over the quality of pirate video releases sourced from Amazon Prime. The anger is usually directed at piracy groups who fail to capture content in the correct manner but according to a number of observers, the problem is actually at Amazon’s end.

Discussions on Reddit, for example, report that episodes in a single TV series have been declining in filesize and bitrate, from 1.56 GB in 720p at a 3073 kb/s video bitrate for episode 1, down to 907 MB in 720p at just 1514 kb/s video bitrate for episode 10.
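As a sanity check on those reports, average bitrate follows directly from file size and runtime. Here is a minimal Python sketch; the 60-minute runtime and 384 kb/s audio figure are illustrative assumptions, not numbers from the Reddit threads:

# Rough arithmetic only: estimate average video bitrate from file size.
# Runtime and audio bitrate below are assumptions for illustration.
def avg_video_bitrate_kbps(size_bytes, duration_s, audio_kbps=384):
    total_kbps = size_bytes * 8 / duration_s / 1000
    return total_kbps - audio_kbps

print(avg_video_bitrate_kbps(1.56e9, 3600))  # episode 1: ~3083 kb/s
print(avg_video_bitrate_kbps(907e6, 3600))   # episode 10: ~1632 kb/s

Whatever the exact runtimes, a drop in file size of that magnitude translates almost directly into a comparable drop in video bitrate.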

Numerous theories as to why this may be the case are being floated around, including that Amazon is trying to save on bandwidth expenses. While this is a possibility, the company hasn’t made any announcements to that end.

Indeed, one legitimate customer reported that he’d raised the quality issue with Amazon and they’d said that the problem was “probably on his end”.

“I have Amazon Prime Video and I noticed the quality was always great for their exclusive shows, so I decided to try buying the shows on Amazon instead of iTunes this year. I paid for season pass subscriptions for Legion, Billions and Homeland this year,” he wrote.

“Just this past weekend, I have noticed a significant drop in details compared to weeks before! So naturally I assumed it was an issue on my end. I started trying different devices, calling support, etc, but nothing really helped.

“Billions continued to look like a blurry mess, almost like I was watching a standard definition DVD instead of the crystal clear HD I paid for and have experienced in the past! And when I check the previous episodes, sure enough, they look fantastic again. What the heck??”

With Amazon distancing itself from the issues, piracy groups have already begun to dig in the knife. Release group DEFLATE has been particularly critical.

“Amazon, in their infinite wisdom, have decided to start fucking with the quality of their encodes. They’re now reaching Netflix’s subpar 1080p.H264 levels, and their H265 encodes aren’t even close to what Netflix produces,” the group said in a file attached to S02E07 of The Good Fight released on Sunday.

“Netflix is able to produce drastic visual improvements with their H265 encodes compared to H264 across every original. In comparison, Amazon can’t decide whether H265 or H264 is going to produce better results, and as a result we suffer for it.”

Arrr! The quality be fallin’

So what’s happening exactly?

A TorrentFreak source (who tells us he’s been working in the BluRay/DCP authoring business for the last 10 years) was kind enough to give us two opinions, one aimed at the techies and another at us mere mortals.

“In technical terms, it appears [Amazon has] increased the CRF [Constant Rate Factor] value they use when encoding for both the HEVC [H265] and H264 streams. Previously, their H264 streams were using CRF 18 and a max bitrate of 15Mbit/s, which usually resulted in file sizes of roughly 3GB, or around 10Mbit/s. Similarly with their HEVC streams, they were using CRF 20 and resulting in streams which were around the same size,” he explained.

“In the past week, the H264 streams have decreased by up to 50% for some streams. While there are no longer any x264 headers embedded in the H264 streams, the HEVC streams still retain those headers and the CRF value used has been increased, so it does appear this change has been done on purpose.”

In layman’s terms, our source believes that Amazon had previously been using an encoding profile that was “right on the edge of relatively good quality” which kept bitrates relatively low but high enough to ensure no perceivable loss of quality.

“H264 streams encoded with CRF 18 could provide an acceptable compromise between quality and file size, where the loss of detail is often negligible when watched at regular viewing distances, at a desk, or in a lounge room on a larger TV,” he explained.

“Recently, it appears these values have been intentionally changed in order to lower the bitrate and file sizes for reasons unknown. As a result, the quality of some streams has been reduced by up to 50% of their previous values. This has introduced a visual loss of quality, comparable to that of viewing something in standard definition versus high definition.”
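For readers who want to experiment, CRF encoding is exposed in common tools. The following is a hedged sketch of an encode using the values our source quotes; the filenames and audio handling are illustrative assumptions, not Amazon’s actual pipeline:

import subprocess

# Illustrative H264 encode at CRF 18 capped at 15 Mbit/s, matching the
# settings our source says Amazon previously used. Not Amazon's pipeline.
subprocess.run([
    "ffmpeg", "-i", "source.mkv",
    "-c:v", "libx264", "-crf", "18",
    "-maxrate", "15M", "-bufsize", "30M",
    "-c:a", "copy",
    "encode.mkv",
], check=True)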

With the situation failing to improve during the week, by the time piracy group DEFLATE released S03E14 of Supergirl on Tuesday their original criticism had transformed into flat-out insults.

“These are only being done in H265 because Amazon have shit the bed, and it’s a choice between a turd sandwich and a giant douche,” they wrote, offering these images as illustrative of the problem and these indicating what should be achievable.

With DEFLATE advising customers to start complaining to Amazon, the memes have already begun, with unfavorable references to now-defunct group YIFY (which was often chastised for its low quality rips) and even a spin on one of the most well known anti-piracy campaigns.

You wouldn’t download stream….

TorrentFreak contacted Amazon Prime for comment on both the recent changes and growing customer complaints but at the time of publication we were yet to receive a response.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

How Pirates Use New Technologies for Old Sharing Habits

Post Syndicated from Ernesto original https://torrentfreak.com/how-pirates-use-new-technologies-for-old-sharing-habits-180415/

While piracy today is more widespread than ever, the urge to share content online has been around for several decades.

The first generation used relatively primitive tools, such as bulletin board systems (BBS), newsgroups or IRC. Nothing too fancy, but they worked well for those who got over the initial learning curve.

When Napster came along things started to change. More content became available and with just a few clicks anyone could get an MP3 transferred from one corner of the world to another. The same was true for Kazaa and Limewire, which further popularized online piracy.

After this initial boom of piracy applications, BitTorrent came along, shaking up the sharing landscape even further. As torrent sites are web-based, pirated media became even more public and easy to find.

At the same time, BitTorrent brought back the smaller and more organized sharing culture of the early days through private trackers.

These communities often focused on a specific type of content and put strict rules and guidelines in place. They promoted sharing and avoided the spam that plagued their public counterparts.

That was fifteen years ago.

Today the piracy landscape is more diverse than ever. Private torrent trackers are still around and so are IRC and newsgroups. However, most piracy today takes place in public. Streaming sites and devices are booming, with central hosting platforms offering the majority of the underlying content.

That said, there is still an urge for some pirates to band together and some use newer technologies to do so.

This week The Outline ran an interesting piece on the use of Telegram channels to share pirated media. These groups use the encrypted communication platform to share copies of movies, TV shows, and a wide range of other material.

Telegram allows users to upload files up to 1.5GB in size, but larger ones can be split, in common with the good old newsgroups.

These types of sharing groups are not new. On social media platforms such as Facebook and VK, there are hundreds or thousands of dedicated communities that do the same. Both public and private. And Reddit has similar groups, relying on external links.

According to an administrator of a piracy-focused Telegram channel, the appeal of the platform is that the groups are not shut down so easily. While that may be the case with hyper-private groups, Telegram will still pull the plug if it receives enough complaints about a channel.

The same is true for Discord, another application that can be used to share content in ‘private’ communities. Discord is particularly popular among gamers, but pirates have also found their way to the platform.

While smaller communities are able to thrive, once the word gets out to copyright holders, the party can soon be over. This is also what the /r/piracy subreddit community found out a few days ago when its Discord server was pulled offline.

This triggered a discussion about possible alternatives. Telegram was mentioned by some, although not everyone liked the idea of connecting their phone number to a pirate group. Others mentioned Slack, Weechat, Hexchat and Riot.im.

None of these tools are revolutionary. At least, not for the intended use by this group. Some may be harder to take down than others, but they are all means to share files, directly or through external links.

What really caught our eye, however, were several mentions of an ancient application layer protocol that, apparently, hasn’t lost its use to pirates.

“I’ll make an IRC server and host that,” one user said, with others suggesting the same.

And so we have come full circle…

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

WHOIS Limits Under GDPR Will Make Pirates Harder to Catch, Groups Fear

Post Syndicated from Andy original https://torrentfreak.com/whois-limits-under-gdpr-will-make-pirates-harder-to-catch-groups-fear-180413/

The General Data Protection Regulation (GDPR) is a regulation in EU law covering data protection and privacy for all individuals within the European Union.

As more and more personal data is gathered, stored and (ab)used online, the aim of the GDPR is to protect EU citizens from breaches of privacy. The regulation applies to all companies processing the personal data of subjects residing in the Union, no matter where in the world the company is located.

Penalties for non-compliance can be severe. While there is a tiered approach according to severity, organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater. Needless to say, the regulations will need to be taken seriously.

Among those affected are domain name registries and registrars who publish the personal details of domain name owners in the public WHOIS database. In a full entry, a person or organization’s name, address, telephone numbers and email addresses can often be found.

This raises a serious issue. While registries and registrars are instructed and contractually obliged to publish data in the WHOIS database by global domain name authority ICANN, in millions of cases this conflicts with the requirements of the GDPR, which prevents the details of private individuals being made freely available on the Internet.

As explained in detail by the EFF, ICANN has been trying to resolve this clash. Its proposed interim model for GDPR compliance (pdf) envisions registrars continuing to collect full WHOIS data but not necessarily publishing it, to “allow the existing data to be preserved while the community discussions continue on the next generation of WHOIS.”

But the proposed changes that will inevitably restrict free access to WHOIS information have plenty of people spooked, including thousands of companies belonging to entertainment industry groups such as the MPAA, IFPI, RIAA and the Copyright Alliance.

In a letter sent to Vice President Andrus Ansip of the European Commission, these groups and dozens of others warn that restricted access to WHOIS will have a serious effect on their ability to protect their intellectual property rights from “cybercriminals” which pose a threat to their businesses.

Signed by 50 organizations involved in IP protection and other areas of online security, the letter expresses concern that in attempting to comply with the GDPR, ICANN is on a course to “over-correct” while disregarding proportionality, accountability and transparency.

A small sample of the groups calling on ICANN

“We strongly assert that this model does not properly account for the critical public and legitimate interests served by maintaining a sufficient amount of data publicly available while respecting privacy interests of registrants by instituting a tiered or layered access system for the vast majority of personal data as defined by the GDPR,” the groups write.

The letter focuses on two aspects of “over-correction”, the first being ICANN’s proposal that no personal data whatsoever of a domain name registrant will be made available “without appropriate consideration or balancing of the countervailing interests in public disclosure of a limited amount of such data.”

In response to ICANN’s proposal that only the province/state and country of a domain name registrant be made publicly available, the groups advise the organization that publishing “a natural person registrant’s e-mail address” in a publicly accessible WHOIS directory will not constitute a breach of the GDPR.

“[W]e strongly believe that the continued public availability of the registrant’s e-mail address – specifically the e-mail address that the registrant supplies to the registrar at the time the domain name is purchased and which e-mail address the registrar is required to validate – is critical for several reasons,” the groups write.

“First, it is the data element that is typically the most important to have readily available for law enforcement, consumer protection, particularly child protection, intellectual property enforcement and cybersecurity/anti-malware purposes.

“Second, the public accessibility of the registrant’s e-mail address permits a broad array of threats and illegal activities to be addressed quickly and the damage from such threats mitigated and contained in a timely manner, particularly where the abusive/illegal activity may be spawned from a variety of different domain names on different generic Top Level Domains,” they add.

The groups also argue that since making email addresses available is effectively required in light of Article 5.1(c) ECD, “there is no legitimate justification to discontinue public availability of the registrant’s e-mail address in the WHOIS directory and especially not in light of other legitimate purposes.”

The EFF, on the other hand, says that being able to contact a domain owner wouldn’t necessarily require an email address to be made public.

“There are other cases in which it makes sense to allow members of the public to contact the owner of a domain, without having to obtain a court order,” EFF writes.

“But this could be achieved very simply if ICANN were simply to provide something like a CAPTCHA-protected contact form, which would deliver email to the appropriate contact point with no need to reveal the registrant’s actual email address.”

The groups’ second main concern is that ICANN reportedly makes no distinction between name registrants that are “natural persons versus those that are legal entities” and intends to treat them all as if they are subject to the GDPR, despite the fact that the regulation only applies to data associated with an “identified or identifiable natural person”.

They say it is imperative that EU Data Protection Authorities are made to understand that when registrants obtain a domain for illegal purposes, they often register it as a “natural person” when registering as a legal person (legal entity) would be more appropriate, even though doing so would grant them less privacy.

“Consequently, the test for differentiating between a legal and natural person should not merely be the legal status of the registrant, but also whether the registrant is, in fact, acting as a legal or natural person vis a vis the use of the domain name,” the groups note.

“We therefore urge that ICANN be given appropriate guidance as to the importance of maintaining a distinction between natural person and legal person registrants and keeping as much data about legal person domain name registrants as publicly accessible as possible,” they conclude.

What will happen with WHOIS on May 25 still isn’t clear. It wasn’t until October 2017 that ICANN finally determined that it would be affected by the GDPR, meaning that it’s been scrambling ever since to meet the compliance date. And it still is, according to the latest available documentation (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each table, you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.
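For a single table, the replica approach usually amounts to something like the following sketch. It assumes psycopg2 connectivity; stl_query and its starttime column follow Redshift’s documented schema, while the host, credentials, and replica table name are placeholders:

import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="mydb", user="admin", password="...")
with conn, conn.cursor() as cur:
    # One-time setup: an empty replica with stl_query's layout.
    # cur.execute("CREATE TABLE stl_query_history AS "
    #             "SELECT * FROM stl_query WHERE 1 = 0;")
    # Scheduled refresh: copy only rows newer than the last load.
    cur.execute("""
        INSERT INTO stl_query_history
        SELECT * FROM stl_query
        WHERE starttime > (SELECT COALESCE(MAX(starttime),
                                           '1900-01-01'::timestamp)
                           FROM stl_query_history);
    """)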

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It looks up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one. (Steps 1 and 2 are sketched in the example following this list.)
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.
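A minimal sketch of what steps 1 and 2 might look like inside that Lambda function. The parameter name and the job-naming pattern follow the conventions described later in this post; treat the code as an assumption about the design, not the deployed implementation:

import boto3

ssm = boto3.client("ssm")
glue = boto3.client("glue")

def handler(event, context):
    # Step 1: look up the clusters for which system table export is enabled.
    clusters = ssm.get_parameter(
        Name="redshift_query_logs.global.enabled_cluster_list"
    )["Parameter"]["Value"].split(",")

    # Step 2: invoke the AWS Glue ETL job designated for each cluster.
    for cluster in clusters:
        glue.start_job_run(JobName="{}_extract_rs_query_logs".format(cluster))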

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Global parameters (defined once and applied to all jobs):

redshift_query_logs.global.s3_prefix (String): The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir (String): The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role (String): The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list (StringList): A comma-separated list of cluster names for which system tables’ data export is enabled. This gives a user the flexibility to exclude certain clusters.

Cluster-specific parameters (repeated for each cluster specified in the enabled_cluster_list parameter):

redshift_query_logs.<<cluster_name>>.connection (String): The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
redshift_query_logs.<<cluster_name>>.user (String): The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password (SecureString): The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted with the AWS KMS key mentioned earlier.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

S3Bucket (default: mybucket): The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, mybucket/extract_rs_logs/data is used for storing all the exported query logs for each system table, partitioned by cluster; mybucket/extract_rs_logs/temp/ is used for temporarily staging the unloaded data from Amazon Redshift; and mybucket/extract_rs_logs/code is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters (requires input): A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups (requires input): A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the ExportEnabledRedshiftClusters parameter. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.
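If you prefer to script that credential update rather than use the console, here is a boto3 sketch; the cluster name and values are placeholders:

import boto3

ssm = boto3.client("ssm")
# Overwrite the placeholder credentials created by the stack.
ssm.put_parameter(Name="redshift_query_logs.product-warehouse.user",
                  Value="glue_etl_user", Type="String", Overwrite=True)
ssm.put_parameter(Name="redshift_query_logs.product-warehouse.password",
                  Value="the-actual-password", Type="SecureString",
                  Overwrite=True)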

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that location, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_ddltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.
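The screenshot’s query isn’t reproduced here, but a cross-cluster alert query would look something like the following sketch. The cluster and date partition column names depend on what the crawler generates, so treat them as assumptions:

import boto3

athena = boto3.client("athena")
# Count query alerts per cluster and day across all exported clusters.
athena.start_query_execution(
    QueryString="""
        SELECT cluster, "date", COUNT(*) AS alert_count
        FROM redshift_query_logs_db.stl_alert_event_log
        GROUP BY cluster, "date"
        ORDER BY alert_count DESC;
    """,
    QueryExecutionContext={"Database": "redshift_query_logs_db"},
    ResultConfiguration={"OutputLocation": "s3://mybucket/athena-results/"},
)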

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.
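For example, a hedged read-modify-write of that parameter in boto3 (new-cluster is a placeholder name):

import boto3

ssm = boto3.client("ssm")
# Append the new cluster to the comma-separated enabled list.
current = ssm.get_parameter(
    Name="redshift_query_logs.global.enabled_cluster_list"
)["Parameter"]["Value"]
ssm.put_parameter(
    Name="redshift_query_logs.global.enabled_cluster_list",
    Value=current + ",new-cluster",
    Type="StringList",
    Overwrite=True,
)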

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_ddltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry="mydb.sales_2000", job_configs=job_configs)

  2. Run the custom query with the time stamp.

returnDF = functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format(salesLastProcessTSValue), tableName="mydb.sales_2000", job_configs=job_configs)

  3. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF, s3Prefix=s3Prefix, tableName="mydb.sales_2000", partitionColumns=["sale_date"], job_configs=job_configs)

  4. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal = functions.getMaxValue(returnDF, "sale_timestamp", job_configs)

  5. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue("mydb.sales_2000", latestTimestampVal[0], job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.

The Digital Security Exchange Is Live

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/the_digital_sec.html

Last year I wrote about the Digital Security Exchange. The project is live:

The DSX works to strengthen the digital resilience of U.S. civil society groups by improving their understanding and mitigation of online threats.

We do this by pairing civil society and social sector organizations with credible and trustworthy digital security experts and trainers who can help them keep their data and networks safe from exposure, exploitation, and attack. We are committed to working with community-based organizations, legal and journalistic organizations, civil rights advocates, local and national organizers, and public and high-profile figures who are working to advance social, racial, political, and economic justice in our communities and our world.

If you are either an organization who needs help, or an expert who can provide help, visit their website.

Note: I am on their advisory committee.

Spooky Torrent Warns EZTV Users About “Huge Security Risk”

Post Syndicated from Ernesto original https://torrentfreak.com/spooky-torrent-warns-eztv-users-about-huge-security-risk-180408/

For more than a decade EZTV has been a widely recognized brand among many BitTorrent users and known as one of the main TV-distribution groups.

While the original EZTV shut down following a hostile takeover, the people who took over are still serving torrents to millions of people every month.

Generally speaking, EZTV takes releases from outside encoders which they then distribute with their own nametag. It’s been like this for years and has never caused any real problems.

Last week, however, a disturbing release was added to the site, sending a message to EZTV users. What appeared to be a regular release of Lucifer S03E19 turned into something darker.

Ten minutes into the episode, a red warning appears, telling viewers that EZTV.ag is a huge security risk.

Huge Security Risk

Throughout the rest of the episode, a few dozen IP-addresses appear plastered across the screen. Needless to say, this makes the program rather unwatchable.

According to the earlier message, these IP-addresses are “used on EZTV.ag.” This seems to suggest that the website has a leak somewhere, unless it refers to the IP-addresses of downloaders, which are public anyway.

IP-addresses

It is hard to grasp what’s really going on here and there is no direct evidence that the site has been breached in any way. Not directly at least.

At the end of the episode, a final message appears, adding to the intrigue. The message comes from the encoder DeXoX and offers up a complete IP-address database, email addresses of registered EZTV users, and more.

DeXoX

Again, we have not been able to verify the validity of these claims but it’s certainly not good PR for EZTV. The spooky torrent has been downloaded by thousands of people already and is still listed on the site several days after first appearing.

We are not familiar with DeXoX, but it appears that the person behind the handle is not a fan of EZTV.ag, to say the least.

It remains unclear how the torrent was added to the site. It could be that the EZTV site has indeed been breached in some way, or that DeXoX has access to the site where EZTV sources its material. In any event, neither the release page nor the site itself contains any warnings; the message appears only in the video.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

New – Machine Learning Inference at the Edge Using AWS Greengrass

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-machine-learning-inference-at-the-edge-using-aws-greengrass/

What happens when you combine the Internet of Things, Machine Learning, and Edge Computing? Before I tell you, let’s review each one and discuss what AWS has to offer.

Internet of Things (IoT) – Devices that connect the physical world and the digital one. The devices, often equipped with one or more types of sensors, can be found in factories, vehicles, mines, fields, homes, and so forth. Important AWS services include AWS IoT Core, AWS IoT Analytics, AWS IoT Device Management, and Amazon FreeRTOS, along with others that you can find on the AWS IoT page.

Machine Learning (ML) – Systems that can be trained using an at-scale dataset and statistical algorithms, and used to make inferences from fresh data. At Amazon we use machine learning to drive the recommendations that you see when you shop, to optimize the paths in our fulfillment centers, fly drones, and much more. We support leading open source machine learning frameworks such as TensorFlow and MXNet, and make ML accessible and easy to use through Amazon SageMaker. We also provide Amazon Rekognition for images and for video, Amazon Lex for chatbots, and a wide array of language services for text analysis, translation, speech recognition, and text to speech.

Edge Computing – The power to have compute resources and decision-making capabilities in disparate locations, often with intermittent or no connectivity to the cloud. AWS Greengrass builds on AWS IoT, giving you the ability to run Lambda functions and keep device state in sync even when not connected to the Internet.

ML Inference at the Edge
Today I would like to toss all three of these important new technologies into a blender! You can now perform Machine Learning inference at the edge using AWS Greengrass. This allows you to use the power of the AWS cloud (including fast, powerful instances equipped with GPUs) to build, train, and test your ML models before deploying them to small, low-powered, intermittently-connected IoT devices running in those factories, vehicles, mines, fields, and homes that I mentioned.

Here are a few of the many ways that you can put Greengrass ML Inference to use:

Precision Farming – With an ever-growing world population and unpredictable weather that can affect crop yields, the opportunity to use technology to increase yields is immense. Intelligent devices that are literally in the field can process images of soil, plants, pests, and crops, taking local corrective action and sending status reports to the cloud.

Physical Security – Smart devices (including the AWS DeepLens) can process images and scenes locally, looking for objects, watching for changes, and even detecting faces. When something of interest or concern arises, the device can pass the image or the video to the cloud and use Amazon Rekognition to take a closer look.

Industrial Maintenance – Smart, local monitoring can increase operational efficiency and reduce unplanned downtime. The monitors can run inference operations on power consumption, noise levels, and vibration to flag anomalies, predict failures, and detect faulty equipment.

Greengrass ML Inference Overview
There are several different aspects to this new AWS feature. Let’s take a look at each one:

Machine Learning Models – Precompiled TensorFlow and MXNet libraries, optimized for production use on the NVIDIA Jetson TX2 and Intel Atom devices, and development use on 32-bit Raspberry Pi devices. The optimized libraries can take advantage of GPU and FPGA hardware accelerators at the edge in order to provide fast, local inferences.
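To give a flavor of the pattern, here is a heavily hedged sketch of an inference function that a Greengrass group might run on-device, using MXNet and the Greengrass SDK. The model path, input shape, and MQTT topic are assumptions, not values from the documentation:

import greengrasssdk
import mxnet as mx

iot = greengrasssdk.client("iot-data")

# Assumed local path where the Greengrass ML resource is mounted.
sym, arg_params, aux_params = mx.model.load_checkpoint("/ml_model/resnet", 0)
mod = mx.mod.Module(symbol=sym, label_names=None)
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

def classify(image_nd):
    # Run a single inference locally and publish the top class to the cloud.
    mod.forward(mx.io.DataBatch([image_nd]), is_train=False)
    probs = mod.get_outputs()[0].asnumpy()
    iot.publish(topic="camera/predictions", payload=str(probs.argmax()))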

Model Building and Training – The ability to use Amazon SageMaker and other cloud-based ML tools to build, train, and test your models before deploying them to your IoT devices. To learn more about SageMaker, read Amazon SageMaker – Accelerated Machine Learning.

Model Deployment – SageMaker models can (if you give them the proper IAM permissions) be referenced directly from your Greengrass groups. You can also make use of models stored in S3 buckets. You can add a new machine learning resource to a group with a couple of clicks:

These new features are available now and you can start using them today! To learn more read Perform Machine Learning Inference.

Jeff;

The Effects of Media Consolidation: Scripted

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/04/04/sinclair-2/

Sinclair Broadcast Group is the largest TV group in the US, both by number of outlets and by coverage. It is known for news content and programming that promote conservative political positions and support the Republican Party.

When Trump says the following –

– here is what follows across Sinclair’s outlets: a cascade of identical, scripted deliveries, as the video shows:

“Some media outlets use their platforms to push their own personal bias. This is extremely dangerous to our democracy.”

The video was also published by the NYT:

It shows the effects of media consolidation on the right to information. It also shows how this administration labels criticism as fake.

+ an op-ed from yesterday

Police Assisted By MPAA Shut Down Pirate TV Box Sellers

Post Syndicated from Andy original https://torrentfreak.com/police-assisted-by-mpaa-shut-down-pirate-tv-box-sellers-180404/

Piracy-configured set-top boxes are the next big thing today. Millions have been sold around the world and anti-piracy groups are scrambling to rein them in.

Many strategies are being tested, from pressurizing developers of allegedly infringing addons to filing aggressive lawsuits against sites such as TVAddons, a Kodi addon repository now facing civil action in both the United States and Canada.

Also under fire are companies that sell set-top boxes that come ready configured for piracy. Both Tickbox TV and Dragon Media Inc are being sued by the Alliance for Creativity and Entertainment (ACE) in the US. At this stage, neither case looks promising for the defendants.

However, civil action isn’t the only way to deal with defendants in the United States, as a man and woman team from Tampa, Florida, have just discovered after being arrested by local police.

Mickael Cantrell and Nancy Major were allegedly the brains behind NBEETV, a company promising to supply set-top boxes that deliver “every movie, every tv show that’s ever been made, plus live sports with no blackouts” with “no monthly fees ever.”

As similar cases have shown, this kind of marketing spiel rarely ends well for defendants but the people behind NBEE TV (also known as FreeTVForLife Inc.) were either oblivious or simply didn’t care about the consequences.

A company press release dated April 2017 advertising the company’s NBPro 3+ box and tracked down by TF this week reveals the extent of the boasts.

“NBPRO 3+ is a TV box that offers instant access to watch every episode of any TV show without paying any monthly bill. One just must attach the loaded box to his TV and stream whatever they want, with no commercials,” the company wrote.

But while “Free TV for Life” was the slogan, that wasn’t the reality at the outset.

NBEETV’s Kodi-powered Android boxes were hellishly expensive with the NBPRO 1, NBPRO 3, NBPRO 5 costing $199.00, $279.00 and $359.00 respectively. This, however, was presented as a bargain alongside a claim that the “average [monthly] cable bill across the country is approximately $198.00” per month.

On top of the base product, NBEETV offered an 800 number for customer support and from their physical premises, they ran “training classes every Tuesday and Thursdays at 11:00” for people to better understand their products.

The location of that building isn’t mentioned in local media but a WHOIS on the company’s FreeTVForLife domain yields a confirmed address. It’s one that’s also been complained about in the past by an unhappy customer.

“Free TV for LIFE [redacted]..(next to K-Mart) Hudson, Fl.. 34667. We bought the Little black box costing $277.00. The pictures were not clear,” Rita S. wrote.

“The screen froze up on us all the time, even after hooking straight into the router. When we took the unit back they kept $80 of our money….were very rude, using the ************* word and we will not get the remainder of our money for 14-28 days according to the employee at the store. Buyers beware and I am telling everyone!!!”

While this customer was clearly unhappy, NBEETV claimed to be a “movement which is spreading across the country.” Unfortunately, that movement reached the eyes of the police, who didn’t think that the content being offered on the devices should have been presented for free.

“We saw [the boxes] had Black Panther, The Shape Of Water, Jumanji was on there as well,” said Detective Darren Hill.

“This is someone blatantly on the side of the road just selling them, with signage, a store front; advertising on the internet with a website.”

Detective Hill worked on the case with the MPAA but even from TorrentFreak’s limited investigations this week, the couple were incredibly easy to identify.

Aside from providing accurate and non-hidden address data in WHOIS records, Mickael Cantrell (also known as Michael Cantrell) put in his real name too. The listed email address is also easily traced back to a company called Nanny Bees Corporation which was operated by Cantrell and partner Nancy Major, who was also arrested in the NBEETV case.

Unfortunately for the couple, the blundering didn’t stop there. Their company YouTube channel, which is packed with tutorials, is also in Cantrell’s real name. Indeed, the photograph supplied to YouTube even matches the mugshot published by ABC Action News.

The publication reports that the Sheriff’s Office found the couple with around 50 ‘pirate’ boxes. The store operated by the couple has also been shut down.

Finally, another curious aspect of NBEETV’s self-promotion comes via a blog post/press release dated August 2017, in which Cantrell suddenly ups the ante by becoming Michael W. Cantrell, Ph.D., alongside some bold and unusual claims.

“Dr. Cantrell unleashes his latest innovation, a Smart TV Box that literally updates every ten minutes. Not only does the content (what you can view) but the whole platform updates automatically. If the Company changes an icon you receive the change in real time,” the release reads.

“Thanks to the Overlay Processor that Dr. Cantrell created, this processor named B-D.A.D (Binary Data Acceleration Dump) which enhances an Android unit’s operating power 5 times than the original bench test, has set a new industry standard around the world.”

Sounds epic… perhaps it powered the following video clip.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Tag Amazon EBS Snapshots on Creation and Implement Stronger Security Policies

Post Syndicated from Woo Kim original https://aws.amazon.com/blogs/compute/tag-amazon-ebs-snapshots-on-creation-and-implement-stronger-security-policies/

This blog was contributed by Rucha Nene, Sr. Product Manager for Amazon EBS

AWS customers use tags to track ownership of resources, implement compliance protocols, control access to resources via IAM policies, and drive their cost accounting processes. Last year, we made tagging for Amazon EC2 instances and Amazon EBS volumes easier by adding the ability to tag these resources upon creation. We are now extending this capability to EBS snapshots.

Earlier, you could tag your EBS snapshots only after the resource had been created, and you sometimes ended up with EBS snapshots in an untagged state if tagging failed. You also could not control the actions that users and groups could take over specific snapshots, or enforce tighter security policies.

To address these issues, we are making tagging for EBS snapshots more flexible and giving customers more control over EBS snapshots by introducing two new capabilities:

  • Tag on creation for EBS snapshots – You can now specify tags for EBS snapshots as part of the API call that creates the resource or via the Amazon EC2 Console when creating an EBS snapshot.
  • Resource-level permission and enforced tag usage – The CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute API actions now support IAM resource-level permissions. You can now write IAM policies that mandate the use of specific tags when taking actions on EBS snapshots.

Tag on creation

You can now specify tags for EBS snapshots as part of the API call that creates the resources. Resource creation and tagging are performed atomically: both must succeed for the CreateSnapshot operation to succeed. You no longer need to build tagging scripts that run after EBS snapshots have been created.

Here’s how you specify tags when you create an EBS snapshot, using the console:

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
  2. In the navigation pane, choose Snapshots, Create Snapshot.
  3. On the Create Snapshot page, select the volume for which to create a snapshot.
  4. (Optional) Choose Add tags to your snapshot. For each tag, provide a tag key and a tag value.
  5. Choose Create Snapshot.

Using the AWS CLI:

aws ec2 create-snapshot --volume-id vol-0c0e757e277111f3c --description 'Prod_Backup' \
    --tag-specifications 'ResourceType=snapshot,Tags=[{Key=costcenter,Value=115},{Key=IsProd,Value=Yes}]'

To learn more, see Using Tags.
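
The same atomic tag-on-create behavior is available through the AWS SDKs. Below is a minimal sketch using the AWS SDK for Python (boto3), reusing the volume ID from the CLI example above; the region and configured credentials are assumptions:

import boto3

# Assumes credentials are already configured; the region is an assumption.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the snapshot and apply the tags in one atomic call,
# mirroring the CLI example above.
response = ec2.create_snapshot(
    VolumeId="vol-0c0e757e277111f3c",
    Description="Prod_Backup",
    TagSpecifications=[
        {
            "ResourceType": "snapshot",
            "Tags": [
                {"Key": "costcenter", "Value": "115"},
                {"Key": "IsProd", "Value": "Yes"},
            ],
        }
    ],
)
print("Created", response["SnapshotId"])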

Resource-level permissions and enforced tag usage

CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute now support resource-level permissions, which allow you to exercise more control over EBS snapshots. You can write IAM policies that give you precise control over access to resources and let you specify which users are able to create snapshots for a given set of volumes. You can also enforce the use of specific tags to help track resources and achieve more accurate cost allocation reporting.

For example, here’s a statement that requires that the costcenter tag (with a value of “115”) be present on the volume from which snapshots are being created, and that the same tag be applied to all newly created snapshots. In addition, it requires that each created snapshot be tagged with a User key whose value is the calling IAM user’s name.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":"ec2:CreateSnapshot",
         "Resource":"arn:aws:ec2:us-east-1:123456789012:volume/*",
         "Condition":{
            "StringEquals":{
               "ec2:ResourceTag/costcenter":"115"
            }
         }
      },
      {
         "Sid":"AllowCreateTaggedSnapshots",
         "Effect":"Allow",
         "Action":"ec2:CreateSnapshot",
         "Resource":"arn:aws:ec2:us-east-1::snapshot/*",
         "Condition":{
            "StringEquals":{
               "aws:RequestTag/costcenter":"115",
               "aws:RequestTag/User":"${aws:username}"
            },
            "ForAllValues:StringEquals":{
               "aws:TagKeys":[
                  "costcenter",
                  "User"
               ]
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":"ec2:CreateTags",
         "Resource":"arn:aws:ec2:us-east-1::snapshot/*",
         "Condition":{
            "StringEquals":{
               "ec2:CreateAction":"CreateSnapshot"
            }
         }
      }
   ]
}
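
To put a policy like this into effect, attach it to the users or groups who create snapshots. Here’s a hedged sketch with boto3; the user name and inline policy name are placeholders, and only the first statement is repeated for brevity:

import json
import boto3

iam = boto3.client("iam")  # assumes credentials with IAM write access

# Only the volume-tag statement is shown here; in practice,
# paste in the full policy document from above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:CreateSnapshot",
            "Resource": "arn:aws:ec2:us-east-1:123456789012:volume/*",
            "Condition": {
                "StringEquals": {"ec2:ResourceTag/costcenter": "115"}
            },
        }
    ],
}

# Attach the policy inline to a hypothetical IAM user.
iam.put_user_policy(
    UserName="example-user",
    PolicyName="RequireSnapshotTags",
    PolicyDocument=json.dumps(policy),
)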

To implement stronger compliance and security policies, you could also restrict access to DeleteSnapshot when the resource is not tagged with the user’s name. Here’s a statement that allows a snapshot to be deleted only if it carries a User tag whose value matches the calling IAM user’s name.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":"ec2:DeleteSnapshot",
         "Resource":"arn:aws:ec2:us-east-1::snapshot/*",
         "Condition":{
            "StringEquals":{
               "ec2:ResourceTag/User":"${aws:username}"
            }
         }
      }
   ]
}
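
One way to sanity-check statements like these is to attempt a call that violates the tag conditions and confirm it is denied. A minimal sketch with boto3 follows; the snapshot ID is hypothetical, and the policy above is assumed to be attached to the calling identity:

import boto3
from botocore.exceptions import ClientError

ec2 = boto3.client("ec2", region_name="us-east-1")

try:
    # Hypothetical snapshot ID; the call should fail unless the
    # snapshot carries a User tag matching the caller's IAM name.
    ec2.delete_snapshot(SnapshotId="snap-0123456789abcdef0")
    print("Deleted; the tag condition did not block the call.")
except ClientError as err:
    if err.response["Error"]["Code"] == "UnauthorizedOperation":
        print("Denied as expected; the tag condition is enforced.")
    else:
        raise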

To learn more and to see some sample policies, see IAM Policies for Amazon EC2 and Working with Snapshots.

Available Now

These new features are available now in all AWS Regions. You can start using them today from the Amazon EC2 Console, the AWS Command Line Interface (CLI), or the AWS APIs.

How to migrate a Hue database from an existing Amazon EMR cluster

Post Syndicated from Anvesh Ragi original https://aws.amazon.com/blogs/big-data/how-to-migrate-a-hue-database-from-an-existing-amazon-emr-cluster/

Hadoop User Experience (Hue) is an open-source, web-based, graphical user interface for use with Amazon EMR and Apache Hadoop. The Hue database stores things like users, groups, authorization permissions, Apache Hive queries, Apache Oozie workflows, and so on.

There might come a time when you want to migrate your Hue database to a new EMR cluster. For example, you might want to upgrade from an older version of the Amazon EMR AMI (Amazon Machine Image), but your Hue application and its database have had a lot of customization. You can avoid re-creating these user entities and retain query/workflow histories in Hue by migrating the existing Hue database, or the remote database in Amazon RDS, to a new cluster.

By default, Hue user information and query histories are stored in a local MySQL database on the EMR cluster’s master node. However, you can create one or more Hue-enabled clusters using a configuration stored in Amazon S3 and a remote MySQL database in Amazon RDS. This allows you to preserve user information and query history that Hue creates without keeping your Amazon EMR cluster running.

This post describes the step-by-step process for migrating the Hue database from an existing EMR cluster.

Note: Amazon EMR supports different Hue versions across different AMI releases. Keep in mind the compatibility of Hue versions between the old and new clusters in this migration activity. Currently, Hue 3.x.x versions are not compatible with Hue 4.x.x versions, and therefore a migration between these two Hue versions might create issues. In addition, Hue 3.10.0 is not backward compatible with its previous 3.x.x versions.

Before you begin

First, let’s create a new testUser in Hue on the existing EMR cluster.

You will use these credentials later to log in to Hue on the new EMR cluster and validate whether you have successfully migrated the Hue database.

Let’s get started!

Migration how-to

Follow these steps to migrate your database to a new EMR cluster and then validate the migration process.

1.) Make a backup of the existing Hue database.

Use SSH to connect to the master node of the old cluster, as shown following (if you are using Linux/Unix/macOS), and dump the Hue database to a JSON file.

$ ssh -i ~/key.pem hadoop@<old-cluster-master-public-DNS>
$ /usr/lib/hue/build/env/bin/hue dumpdata > ./hue-mysql.json

Edit the hue-mysql.json output file by removing all JSON objects that have useradmin.userprofile in the model field, and save the file. For example, remove the objects as shown following:

{
  "pk": 1,
  "model": "useradmin.userprofile",
  "fields": {
    "last_activity": "2018-01-10T11:41:04",
    "creation_method": "HUE",
    "first_login": false,
    "user": 1,
    "home_directory": "/user/hue_admin"
  }
},
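
If the dump is large, editing it by hand is error-prone. A short script can strip those entries instead; here’s a minimal sketch in Python that operates on the hue-mysql.json file produced above:

import json

# Load the dump produced by the hue dumpdata command above.
with open("hue-mysql.json") as f:
    objects = json.load(f)

# Drop every object whose model is useradmin.userprofile; these
# rows are re-created automatically on the new cluster.
objects = [o for o in objects if o.get("model") != "useradmin.userprofile"]

with open("hue-mysql.json", "w") as f:
    json.dump(objects, f, indent=2)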

2.) Store the hue-mysql.json file on persistent storage like Amazon S3.

You can copy the file from the old EMR cluster to Amazon S3 using the AWS CLI or Secure Copy (SCP) client. For example, the following uses the AWS CLI:

$ aws s3 cp ./hue-mysql.json s3://YourBucketName/folder/

3.) Recover/reload the backed-up Hue database into the new EMR cluster.

a.) Use SSH to connect to the master node of the new EMR cluster, and stop the Hue service that is already running.

$ ssh -i ~/key.pem hadoop@<new-cluster-master-public-DNS>
$ sudo stop hue
hue stop/waiting

b.) Using the mysql client, connect to the Hue database for your cluster, which is either the local MySQL database or the remote database in Amazon RDS, as shown following.

$ mysql -h HOST -u USER -pPASSWORD

For a local MySQL database, you can find the hostname, user name, and password for connecting to the database in the /etc/hue/conf/hue.ini file on the master node.

[[database]]
    engine = mysql
    name = huedb
    case_insensitive_collation = utf8_unicode_ci
    test_charset = utf8
    test_collation = utf8_bin
    host = ip-172-31-37-133.us-west-2.compute.internal
    user = hue
    test_name = test_huedb
    password = QdWbL3Ai6GcBqk26
    port = 3306

Based on the preceding example configuration, the sample command is as follows. (Replace the host, user, and password details based on your EMR cluster settings.)

$ mysql -h ip-172-31-37-133.us-west-2.compute.internal -u hue -pQdWbL3Ai6GcBqk26

c.) Drop the existing Hue database with the name huedb from the MySQL server.

mysql> DROP DATABASE IF EXISTS huedb;

d.) Create a new empty database with the same name huedb.

mysql> CREATE DATABASE huedb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE=utf8_bin;

e.) Now, synchronize Hue with its database huedb.

$ sudo /usr/lib/hue/build/env/bin/hue syncdb --noinput
$ sudo /usr/lib/hue/build/env/bin/hue migrate

(This populates the new huedb with all Hue tables that are required.)

f.) Log in to MySQL again, and drop the foreign key to clean tables.

mysql> SHOW CREATE TABLE huedb.auth_permission;

In the following example, replace <id value> with the actual value from the preceding output.

mysql> ALTER TABLE huedb.auth_permission DROP FOREIGN KEY
content_type_id_refs_id_<id value>;

g.) Delete the contents of the django_content_type table.

mysql> DELETE FROM huedb.django_content_type;

h.) Download the backed-up Hue database dump from Amazon S3 to the new EMR cluster, and load it into Hue.

$ aws s3 cp s3://YourBucketName/folder/hue-mysql.json ./
$ sudo /usr/lib/hue/build/env/bin/hue loaddata ./hue-mysql.json

i.) In MySQL, add the foreign key content_type_id back to the auth_permission table.

mysql> use huedb;
mysql> ALTER TABLE huedb.auth_permission ADD FOREIGN KEY (`content_type_id`) REFERENCES `django_content_type` (`id`);

j.) Start the Hue service again.

$ sudo start hue
hue start/running, process XXXX
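
Before opening a browser, you can confirm from the master node that Hue is serving requests again. A quick reachability check in Python, assuming Hue’s default port of 8888:

import urllib.request

# Hue's web UI listens on port 8888 by default on the EMR master
# node; run this on the master node itself after restarting Hue.
try:
    with urllib.request.urlopen("http://localhost:8888/", timeout=10) as resp:
        print("Hue responded with HTTP", resp.status)
except Exception as exc:
    print("Hue is not reachable yet:", exc)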

That’s it! Now, verify whether you can successfully access the Hue UI, and sign in using your existing testUser credentials.

After a successful sign-in to Hue on the new EMR cluster, the homepage should show testUser as the signed-in user.

Conclusion

You have now learned how to migrate an existing Hue database to a new Amazon EMR cluster and validate the migration process. If you have any similar Amazon EMR administration topics that you want to see covered in a future post, please let us know in the comments below.


Additional Reading

If you found this post useful, be sure to check out Anomaly Detection Using PySpark, Hive, and Hue on Amazon EMR and Dynamically Create Friendly URLs for Your Amazon EMR Web Interfaces.


About the Author


Anvesh Ragi is a Big Data Support Engineer with Amazon Web Services. He works closely with AWS customers to provide them architectural and engineering assistance for their data processing workflows. In his free time, he enjoys traveling and going for hikes.

[$] An audit container ID proposal

Post Syndicated from corbet original https://lwn.net/Articles/750313/rss

The kernel development community has consistently resisted adding any
formal notion of what a “container” is to the kernel. While the needed
building blocks (namespaces, control groups, etc.) are provided, it is up
to user space to assemble the pieces into the sort of container
implementation it needs. This approach maximizes flexibility and makes it
possible to implement a number of different container abstractions, but it
also can make it hard to associate events in the kernel with the container
that caused them. Audit container IDs are an attempt to fix that problem
for one specific use case; they have not been universally well received in
the past, but work on this mechanism continues regardless.