
E-Mail Tracking

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/e-mail_tracking_1.html

Good article on the history and practice of e-mail tracking:

The tech is pretty simple. Tracking clients embed a line of code in the body of an email — usually in a 1×1 pixel image, so tiny it’s invisible, but also in elements like hyperlinks and custom fonts. When a recipient opens the email, the tracking client recognizes that pixel has been downloaded, as well as where and on what device. Newsletter services, marketers, and advertisers have used the technique for years, to collect data about their open rates; major tech companies like Facebook and Twitter followed suit in their ongoing quest to profile and predict our behavior online.

But lately, a surprising — and growing — number of tracked emails are being sent not from corporations, but acquaintances. “We have been in touch with users that were tracked by their spouses, business partners, competitors,” says Florian Seroussi, the founder of OMC. “It’s the wild, wild west out there.”

According to OMC’s data, a full 19 percent of all “conversational” email is now tracked. That’s one in five of the emails you get from your friends. And you probably never noticed.
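To make the mechanics concrete, here is a minimal sketch, in Python, of what a tracked message looks like from the sender’s side. The tracker domain and token are hypothetical; the point is simply that the “tracking code” is nothing more than a remote image reference that phones home when it is fetched:

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import uuid

token = uuid.uuid4().hex  # unique per recipient, so an open can be attributed

html = f"""
<p>Hi! Here is that article I mentioned.</p>
<!-- invisible 1x1 tracking pixel: fetching this image tells the sender
     when the message was opened, and from which IP address and device -->
<img src="https://tracker.example.com/open?id={token}" width="1" height="1" alt="">
"""

msg = MIMEMultipart("alternative")
msg["Subject"] = "Hello"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg.attach(MIMEText(html, "html"))

# Actually sending the message (e.g., via smtplib) is omitted here.
print(msg.as_string())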

I admit it’s enticing. I would very much like the statistics that adding trackers to Crypto-Gram would give me. But I still don’t do it.

Now Open – AWS China (Ningxia) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-china-ningxia-region/

Today we launched our 17th Region globally, and the second in China. The AWS China (Ningxia) Region, operated by Ningxia Western Cloud Data Technology Co. Ltd. (NWCD), is generally available now and provides customers another option to run applications and store data on AWS in China.

The Details
At launch, the new China (Ningxia) Region, operated by NWCD, supports Auto Scaling, AWS Config, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, CloudWatch Events, Amazon CloudWatch Logs, AWS CodeDeploy, AWS Direct Connect, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), Amazon Elastic Block Store (EBS), Amazon EC2 Systems Manager, AWS Elastic Beanstalk, Amazon ElastiCache, Amazon Elasticsearch Service, Elastic Load Balancing, Amazon EMR, Amazon Glacier, AWS Identity and Access Management (IAM), Amazon Kinesis Streams, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), AWS Support API, AWS Trusted Advisor, Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, and VM Import. Visit the AWS China Products page for additional information on these services.

The Region supports all sizes of C4, D2, M4, T2, R4, I3, and X1 instances.

Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

Operating Partner
To comply with China’s legal and regulatory requirements, AWS has formed a strategic technology collaboration with NWCD to operate and provide services from the AWS China (Ningxia) Region. Founded in 2015, NWCD is a licensed datacenter and cloud services provider, based in Ningxia, China. NWCD joins Sinnet, the operator of the AWS China (Beijing) Region, as an AWS operating partner in China. Through these relationships, AWS provides its industry-leading technology, guidance, and expertise to NWCD and Sinnet, while NWCD and Sinnet operate and provide AWS cloud services to local customers. While the cloud services offered in both AWS China Regions are the same as those available in other AWS Regions, the AWS China Regions are different in that they are isolated from all other AWS Regions and operated by AWS’s Chinese partners separately from them. Customers using the AWS China Regions enter into customer agreements with Sinnet and NWCD, rather than with AWS.

Use it Today
The AWS China (Ningxia) Region, operated by NWCD, is open for business, and you can start using it now! Starting today, Chinese developers, startups, and enterprises, as well as government, education, and non-profit organizations, can leverage AWS to run their applications and store their data in the new AWS China (Ningxia) Region, operated by NWCD. Customers already using the AWS China (Beijing) Region, operated by Sinnet, can select the AWS China (Ningxia) Region directly from the AWS Management Console, while new customers can request an account at www.amazonaws.cn to begin using both AWS China Regions.

Jeff;

 

 

Hollywood and Netflix Ask Court to Seize Tickbox Streaming Devices

Post Syndicated from Ernesto original https://torrentfreak.com/hollywood-and-netflix-ask-court-to-seize-tickbox-streaming-devices-171209/

More and more people are starting to use Kodi-powered set-top boxes to stream video content to their TVs.

While Kodi itself is a neutral platform, sellers who ship devices with unauthorized add-ons give it a bad reputation.

According to the Alliance for Creativity and Entertainment (ACE), an anti-piracy partnership between Hollywood studios, Netflix, Amazon, and more than two dozen other companies, Tickbox TV is one of these bad actors.

Earlier this year, ACE filed a lawsuit against the Georgia-based company, which sells set-top boxes that allow users to stream a variety of popular media. The Tickbox devices use the Kodi media player and come with instructions on how to add various add-ons.

According to ACE, these devices are nothing more than pirate tools, allowing buyers to stream copyright-infringing content. “TickBox promotes and distributes TickBox TV for infringing use, and that is exactly the result of its use,” they told the court this week.

After the complaint was filed in October, Tickbox made some cosmetic changes to the site, removing some of the allegedly inducing language. The streaming devices remain on sale, however, though not for long if the media giants get their way.

This week ACE submitted a request for a preliminary injunction to the court, hoping to stop Tickbox’s sales activities.

“TickBox is intentionally inducing infringement, pure and simple. Plaintiffs respectfully request that the Court enter a preliminary injunction that requires TickBox to halt its flagrantly illegal conduct immediately,” they write in their application.

The companies explain that, since Tickbox is causing irreparable harm, all existing devices should be impounded.

“[A]ll TickBox TV devices in the possession of TickBox and all of its officers, directors, agents, servants, and employees, and all persons in active concert or participation or in privity with any of them are to be impounded and shall be retained by Defendant until further order of the Court,” the proposed order reads.

In addition, Tickbox should push out a software update which removes all infringing add-ons from devices that were previously sold.

“TickBox shall, via software update, remove from all distributed TickBox TV devices all Kodi ‘Themes,’ ‘Builds,’ ‘Addons,’ or any other software that facilitates the infringing public performances of Plaintiffs’ Copyrighted Works.”

Among others, the list of allegedly infringing add-ons and themes includes Spinz, Lodi Black, Stream on Fire, Wookie, Aqua, CMM, Spanish Quasar, Paradox, Covenant, Elysium, UK Turk, Gurzil, Maverick, and Poseidon.

The filing shows that ACE is serious about its efforts to stop the sale of this type of streaming device. Tickbox has yet to reply to the original complaint or the injunction request.

While this is the first US lawsuit of its kind, the anti-piracy conglomerate has been rather active in recent weeks. The group has successfully pressured several addon developers to quit and has been involved in enforcement actions around the globe.

A copy of the proposed preliminary injunction is available here (pdf).


CrimeStoppers Campaign Targets Pirate Set-Top Boxes & Their Users

Post Syndicated from Andy original https://torrentfreak.com/crimestoppers-campaign-targets-pirate-set-top-boxes-their-users-171209/

While many people might believe CrimeStoppers to be an official extension of the police in the UK, the truth is a little more subtle.

CrimeStoppers is a charity that operates a service through which members of the public can report crime anonymously, either using a dedicated phone line or via a website. Callers are not required to give their name, meaning that for those concerned about reprisals or becoming involved in a case for other sensitive reasons, it’s the perfect buffer between them and the authorities.

The people at CrimeStoppers deal with all kinds of crime but perhaps a little surprisingly, they’ve just got involved in the set-top box controversy in the UK.

“Advances in technology have allowed us to enjoy on-screen entertainment in more ways than ever before, with ever increasing amounts of exciting and original content,” the CrimeStoppers campaign begins.

“However, some people are avoiding paying for this content by using modified streaming hardware devices, like a set-top box or stick, in conjunction with software such as illegal apps or add-ons, or illegal mobile apps which allow them to watch new movie releases, TV that hasn’t yet aired, and subscription sports channels for free.”

The campaign has been launched in partnership with the Intellectual Property Office and unnamed “industry partners”. Who these companies are isn’t revealed but given the standard messages being portrayed by the likes of ACE, Premier League and Federation Against Copyright Theft lately, it wouldn’t be a surprise if some or all of them were involved.

Those messages are revealed in a series of four video ads, each taking a different approach towards discouraging the public from using devices loaded with pirate software.

The first video clearly targets the consumer, dispelling the myth that watching pirate video isn’t against the law. It is, that’s not in any doubt, but from the constant tone of the video, one could be forgiven for thinking it’s an extremely serious crime rather than something likely to be a civil matter, if anything at all.

It also warns people who are configuring and selling pirate devices that they are breaking the law. Again, this is absolutely true, but this activity is clearly several orders of magnitude more serious than simply viewing. The video blurs the boundaries for what appears to be dramatic effect, however.

Selling and watching is illegal

The second video is all about demonizing the people and groups who may offer set-top boxes to the public.

Instead of portraying the hundreds of “cottage industry” suppliers behind many set-top box sales in the UK, the CrimeStoppers video paints a picture of dark organized crime being the main driver. By buying from these people, the charity warns, criminals are being welcomed in.

“It is illegal. You could also be helping to fund organized crime and bringing it into your community,” the video warns.

Are you funding organized crime?

The third video takes another approach, warning that set-top boxes have few if any parental controls. This could lead to children being exposed to inappropriate content, the charity warns.

“What are your children watching? Does it worry you?” the video asks.

Of course, the same can be said about the Internet, period. Web browsers don’t filter what content children have access to unless parents take pro-active steps to configure special services or software for the purpose.

There’s always the option to supervise children, of course, but Netflix is probably a safer option for parents who prefer a more hands-off approach. It’s also considerably more expensive, a fact that won’t have escaped users of these devices.

Got kids? Take care….

Finally, video four picks up a theme that’s becoming increasingly common in anti-piracy campaigns – malware and identity theft.

“Why risk having your identity stolen or your bank account or home network hacked. If you access entertainment or sports using dodgy streaming devices or apps, or illegal addons for Kodi, you are increasing the risks,” the ad warns.

Danger….Danger….

Perhaps of most interest is that this entire campaign, which almost certainly has Big Media behind the scenes in advisory and financial capacities, barely mentions the entertainment industries at all.

Indeed, the success of the whole campaign hinges on people worrying about the supposed ill effects of illicit streaming on them personally and then feeling persuaded to inform on suppliers and others involved in the chain.

“Know of someone supplying or promoting these dodgy devices or software? It is illegal. Call us now and help stop crime in your community,” the videos warn.

That CrimeStoppers has taken on this campaign at all is a bit of a head-scratcher, given the bigger crime picture. Struggling with severe budget cuts, police in the UK are already de-prioritizing a number of crimes, leading to something called “screening out”, a process through which victims are given a crime number but no investigation is carried out.

This means that in 2016, 45% of all reported crimes in Greater Manchester weren’t investigated and a staggering 57% of all recorded domestic burglaries weren’t followed up by the police. But it gets worse.

“More than 62pc of criminal damage and arson offenses were not investigated, along with one in three reported shoplifting incidents,” MEN reports.

Given this backdrop, how will police suddenly find the resources to follow up lots of leads from the public and then subsequently prosecute people who sell pirate boxes? Even if they do, will that be at the expense of yet more “screening out” of other public-focused offenses?

No one is saying that selling pirate devices isn’t a crime or at least worthy of being followed up, but is this niche likely to be important to the public when they’re being told that nothing will be done when their homes are emptied by intruders? “NO” says a comment on one of the CrimeStoppers videos on YouTube.

“This crime affects multi-million dollar corporations, I’d rather see tax payers money invested on videos raising awareness of crimes committed against the people rather than the 0.001%,” it concludes.


Is blockchain a security topic? (Opensource.com)

Post Syndicated from jake original https://lwn.net/Articles/740929/rss

At Opensource.com, Mike Bursell looks at blockchain security from the angle of trust. Unlike cryptocurrencies, which are typically pseudonymous, other kinds of blockchains will require mapping users to real-life identities; that raises the trust issue.

What’s really interesting is that, if you’re thinking about moving to a permissioned blockchain or distributed ledger with permissioned actors, then you’re going to have to spend some time thinking about trust. You’re unlikely to be using a proof-of-work system for making blocks—there’s little point in a permissioned system—so who decides what comprises a “valid” block that the rest of the system should agree on? Well, you can rotate around some (or all) of the entities, or you can have a random choice, or you can elect a small number of über-trusted entities. Combinations of these schemes may also work.

If these entities all exist within one trust domain, which you control, then fine, but what if they’re distributors, or customers, or partners, or other banks, or manufacturers, or semi-autonomous drones, or vehicles in a commercial fleet? You really need to ensure that the trust relationships that you’re encoding into your implementation/deployment truly reflect the legal and IRL [in real life] trust relationships that you have with the entities that are being represented in your system.

And the problem is that, once you’ve deployed that system, it’s likely to be very difficult to backtrack, adjust, or reset the trust relationships that you’ve designed.

Coalition Against Piracy Wants Singapore to Block Streaming Piracy Software

Post Syndicated from Andy original https://torrentfreak.com/coalition-against-piracy-wants-singapore-to-block-streaming-piracy-software-171204/

Earlier this year, major industry players including Disney, HBO, Netflix, Amazon and NBCUniversal formed the Alliance for Creativity and Entertainment (ACE), a huge coalition set to tackle piracy on a global scale.

Shortly after, the Coalition Against Piracy (CAP) was announced. With a focus on Asia and backed by CASBAA, CAP counts Disney, Fox, HBO Asia, NBCUniversal, Premier League, Turner Asia-Pacific, A&E Networks, BBC Worldwide, National Basketball Association, Viacom International, and others among its members.

In several recent reports, CAP has homed in on the piracy situation in Singapore. Describing the phenomenon as “rampant”, the group says that around 40% of locals engage in the practice, many of them through unlicensed streaming. Now CAP, in line with its anti-streaming stance, wants the government to do more – much more.

Since a large proportion of illicit streaming takes place through set-top devices, CAP’s 21 members want the authorities to block the software inside them that enables piracy, Straits Times reports.

“Within the Asia-Pacific region, Singapore is the worst in terms of availability of illicit streaming devices,” said CAP General Manager Neil Gane.

“They have access to hundreds of illicit broadcasts of channels and video-on-demand content.”

There are no precise details on CAP’s demands, and it is far from clear how any government could effectively block software.

Blocking access to the software package itself would prove all but impossible, so that would leave blocking the infrastructure the software uses. While that would be relatively straightforward technically, the job would be large and fast-moving, particularly when dozens of apps and addons would need to be targeted.

However, CAP is also calling on the authorities to block pirate streams from entering Singapore. The country already has legislation in place that can be used for site-blocking, so that is not out of the question. It’s notable that the English Premier League is part of the CAP coalition and following legal action taken in the UK earlier this year, now has plenty of experience in blocking streams, particularly of live broadcasts.

While that is a game of cat-and-mouse, TorrentFreak sources that have been monitoring the Premier League’s actions over the past several months report that the soccer outfit has become more effective over time. Its blocks can still be evaded but it can be hard work for those involved. That kind of expertise could prove invaluable to CAP.

“The Premier League is currently engaged in its most comprehensive global anti-piracy programme,” a spokesperson told ST. “This includes supporting our broadcast partners in South-east Asia with their efforts to prevent the sale of illicit streaming devices.”

In common with other countries around the world, the legality of using ‘pirate’ streaming boxes is somewhat unclear in Singapore. A Bloomberg report cites a local salesman who reports sales of 10 to 20 boxes on a typical weekend, rising to 300 a day during electronic fairs. He believes the devices are legal, since they don’t download full copies of programs.

While that point is yet to be argued in court (previously an Intellectual Property Office of Singapore spokesperson said that copyright owners could potentially go after viewers), it seems unlikely that those selling the devices will be allowed to continue completely unhindered. The big question is how current legislation can be successfully applied.


NSA "Red Disk" Data Leak

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/nsa_red_disk_da.html

ZDNet is reporting about another data leak, this one from the US Army’s Intelligence and Security Command (INSCOM), which is also within the NSA.

The disk image, when unpacked and loaded, is a snapshot of a hard drive dating back to May 2013 from a Linux-based server that forms part of a cloud-based intelligence sharing system, known as Red Disk. The project, developed by INSCOM’s Futures Directorate, was slated to complement the Army’s so-called distributed common ground system (DCGS), a legacy platform for processing and sharing intelligence, surveillance, and reconnaissance information.

[…]

Red Disk was envisioned as a highly customizable cloud system that could meet the demands of large, complex military operations. The hope was that Red Disk could provide a consistent picture from the Pentagon to deployed soldiers in the Afghan battlefield, including satellite images and video feeds from drones trained on terrorists and enemy fighters, according to a Foreign Policy report.

[…]

Red Disk was a modular, customizable, and scalable system for sharing intelligence across the battlefield, like electronic intercepts, drone footage and satellite imagery, and classified reports, for troops to access with laptops and tablets on the battlefield. Markings on files found in several directories imply the disk is “top secret,” and restricted from being shared with foreign intelligence partners.

A couple of points. One, this isn’t particularly sensitive. It’s an intelligence distribution system under development. It’s not raw intelligence. Two, this doesn’t seem to be classified data. Even the article hedges, using the unofficial term “highly sensitive.” Three, it doesn’t seem that Chris Vickery, the researcher who discovered the data, has published it.

Chris Vickery, director of cyber risk research at security firm UpGuard, found the data and informed the government of the breach in October. The storage server was subsequently secured, though its owner remains unknown.

This doesn’t feel like a big deal to me.

Slashdot thread.

Glenn’s Take on re:Invent 2017 Part 1

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-1/

GREETINGS FROM LAS VEGAS

Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We have a lot of exciting announcements this week. I’m going to post to the AWS Architecture blog each day with my take on what’s interesting about some of the announcements from a cloud architectural perspective.

Why not start at the beginning? At the Midnight Madness launch on Sunday night, we announced Amazon Sumerian, our platform for VR, AR, and mixed reality. The hype around VR/AR has existed for many years, though for me, it is a perfect example of how a working end-to-end solution often requires innovation from multiple sources. For AR/VR to be successful, we need many components to come together in a coherent manner to provide a great experience.

First, we need lightweight, high-definition goggles with motion tracking that are comfortable to wear. Second, we need to track movement of our body and hands in a 3-D space so that we can interact with virtual objects in the virtual world. Third, we need to build the virtual world itself and populate it with assets and define how the interactions will work and connect with various other systems.

There has been rapid development of the physical devices for AR/VR, ranging from iOS devices to Oculus Rift and HTC Vive, which provide excellent capabilities for the first and second components defined above. With the launch of Amazon Sumerian we are solving for the third area, which will help developers easily build their own virtual worlds and start experimenting and innovating with how to apply AR/VR in new ways.

Already, within 48 hours of Amazon Sumerian being announced, I have had multiple discussions with customers and partners around some cool use cases where VR can help in training simulations, remote-operator controls, or with new ideas around interacting with complex visual data sets, which starts bringing concepts straight out of sci-fi movies into the real (virtual) world. I am really excited to see how Sumerian will unlock the creative potential of developers and where this will lead.

Amazon MQ
I am a huge fan of distributed architectures where asynchronous messaging is the backbone of connecting the discrete components together. Amazon Simple Queue Service (Amazon SQS) is one of my favorite services due to its simplicity, scalability, performance, and the incredible flexibility of how you can use Amazon SQS in so many different ways to solve complex queuing scenarios.

While Amazon SQS is easy to use when building cloud-native applications on AWS, many of our customers running existing applications on-premises required support for different messaging protocols such as Java Message Service (JMS), .Net Messaging Service (NMS), Advanced Message Queuing Protocol (AMQP), MQ Telemetry Transport (MQTT), Simple (or Streaming) Text Oriented Messaging Protocol (STOMP), and WebSockets. One of the most popular on-premises message brokers is Apache ActiveMQ. With the release of Amazon MQ, you can now run Apache ActiveMQ on AWS as a managed service, similar to what we did with Amazon ElastiCache back in 2012. For me, there are two compelling, major benefits that Amazon MQ provides:

  • Integrate existing applications with cloud-native applications without having to change a line of application code if using one of the supported messaging protocols. This removes one of the biggest blockers for integration between the old and the new.
  • Remove the complexity of configuring Multi-AZ resilient message broker services as Amazon MQ provides out-of-the-box redundancy by always storing messages redundantly across Availability Zones. Protection is provided against failure of a broker through to complete failure of an Availability Zone.

I believe that Amazon MQ is a major component in the toolkit required to help you migrate your existing applications to AWS. Having set up cross-data center Apache ActiveMQ clusters myself in the past, and having tested them to ensure they worked as expected during critical failure scenarios, I know that technical staff working on migrations to AWS will appreciate being able to deploy a fully redundant, managed Apache ActiveMQ cluster within minutes.
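As a rough illustration of how little application code needs to change, here is a minimal sketch of publishing a message to an Amazon MQ broker over STOMP using Python and the stomp.py library. The broker endpoint, credentials, and queue name are placeholders; the same client code works against any ActiveMQ broker, on-premises or managed:

import stomp

# Placeholder endpoint and credentials -- use your broker's stomp+ssl endpoint
BROKER = "b-1234abcd-1.mq.us-east-1.amazonaws.com"
PORT = 61614  # Amazon MQ exposes STOMP over TLS on this port

conn = stomp.Connection(host_and_ports=[(BROKER, PORT)])
conn.set_ssl(for_hosts=[(BROKER, PORT)])   # Amazon MQ requires TLS connections
conn.connect("mq_user", "mq_password", wait=True)

# Publish exactly as you would against an existing on-premises ActiveMQ broker
conn.send(destination="/queue/orders", body="hello from Amazon MQ")
conn.disconnect()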

Who would have thought I would have been so excited to revisit Apache ActiveMQ in 2017 after using SQS for many, many years? Choice is a wonderful thing.

Amazon GuardDuty
Maintaining application and information security in the modern world is increasingly complex and is constantly evolving and changing as new threats emerge. This is due to the scale, variety, and distribution of services required in a competitive online world.

At Amazon, security is our number one priority. Thus, we are always looking at how we can increase security detection and protection while simplifying the implementation of advanced security practices for our customers. As a result, we released Amazon GuardDuty, which provides intelligent threat detection by using a combination of multiple information sources, transactional telemetry, and the application of machine learning models developed by AWS. One of the biggest benefits of Amazon GuardDuty that I appreciate is that enabling this service requires zero software, agents, sensors, or network choke points, all of which can impact the performance or reliability of the service you are trying to protect. Amazon GuardDuty works by monitoring your VPC Flow Logs, AWS CloudTrail events, and DNS logs, as well as combining other sources of threat intelligence that AWS aggregates from our own internal and external sources.

The use of machine learning in Amazon GuardDuty allows it to identify changes in behavior which could be suspicious and require additional investigation. Amazon GuardDuty works across all of your AWS accounts, allowing for aggregated analysis and centralized management of detected threats across accounts. This is important for our larger customers, who can be running many hundreds of AWS accounts across their organization, because a single, common view of threat detection across their organizational use of AWS is critical to ensuring they are protecting themselves.

Detection, though, is only the beginning of what Amazon GuardDuty enables. When a threat is identified in Amazon GuardDuty, you can configure remediation scripts or trigger Lambda functions with custom responses, enabling you to build automated responses to a variety of common threats. Speed of response is essential when a security incident may be taking place. For example, Amazon GuardDuty detects that an Amazon Elastic Compute Cloud (Amazon EC2) instance might be compromised due to traffic from a known set of malicious IP addresses. Upon detection of a compromised EC2 instance, we could apply an access control entry restricting outbound traffic for that instance, which stops loss of data until a security engineer can assess what has occurred.
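As a sketch of what such an automated response could look like, here is a small Lambda function, triggered by a CloudWatch Events rule for GuardDuty findings, that quarantines the instance named in the finding. The quarantine security group ID is a placeholder, and the event shape assumed here is the GuardDuty finding as delivered by CloudWatch Events:

import boto3

ec2 = boto3.client("ec2")
QUARANTINE_SG = "sg-0123456789abcdef0"  # placeholder: a security group with no outbound rules

def lambda_handler(event, context):
    # CloudWatch Events delivers the GuardDuty finding in the 'detail' field
    finding = event["detail"]
    instance_id = finding["resource"]["instanceDetails"]["instanceId"]

    # Swap the instance's security groups for the quarantine group, cutting off
    # traffic until a security engineer can assess what has occurred
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])
    print("Quarantined %s for finding type %s" % (instance_id, finding["type"]))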

Whether you are a customer running a single service in a single account, a global customer with hundreds of accounts and thousands of applications, or a startup with hundreds of microservices and an hourly release cycle in a DevOps world, I recommend enabling Amazon GuardDuty. We have a 30-day free trial available for all new customers of this service. As it is a monitor of events, there is no change required to your architecture within AWS.

Stay tuned for tomorrow’s post on AWS Media Services and Amazon Neptune.

 

Glenn during the Tour du Mont Blanc

AWS Fargate: A Product Overview

Post Syndicated from Deepak Dayama original https://aws.amazon.com/blogs/compute/aws-fargate-a-product-overview/

It was just about three years ago that AWS announced Amazon Elastic Container Service (Amazon ECS), to run and manage containers at scale on AWS. With Amazon ECS, you’ve been able to run your workloads at high scale and availability without having to worry about running your own cluster management and container orchestration software.

Today, AWS announced the availability of AWS Fargate – a technology that enables you to use containers as a fundamental compute primitive without having to manage the underlying instances. With Fargate, you don’t need to provision, configure, or scale virtual machines in your clusters to run containers. Fargate can be used with Amazon ECS today, with plans to support Amazon Elastic Container Service for Kubernetes (Amazon EKS) in the future.

Fargate offers flexible configuration options so you can closely match your application needs, along with granular, per-second billing.

Amazon ECS with Fargate

Amazon ECS enables you to run containers at scale. This service also provides native integration into the AWS platform with VPC networking, load balancing, IAM, Amazon CloudWatch Logs, and CloudWatch metrics. These deep integrations make the Amazon ECS task a first-class object within the AWS platform.

To run tasks, you first need to stand up a cluster of instances, which involves picking the right types of instances and sizes, setting up Auto Scaling, and right-sizing the cluster for performance. With Fargate, you can leave all that behind and focus on defining your application and policies around permissions and scaling.

The same container management capabilities remain available so you can continue to scale your container deployments. With Fargate, the only entity to manage is the task. You don’t need to manage the instances or supporting software like the Docker daemon or the Amazon ECS agent.

Fargate capabilities are available natively within Amazon ECS. This means that you don’t need to learn new API actions or primitives to run containers on Fargate.

Using Amazon ECS, Fargate is a launch type option. You continue to define the applications the same way by using task definitions. In contrast, the EC2 launch type gives you more control of your server clusters and provides a broader range of customization options.

For example, here is a RunTask command using the Fargate launch type:

aws ecs run-task --launch-type FARGATE --cluster fargate-test --task-definition nginx --network-configuration "awsvpcConfiguration={subnets=[subnet-b563fcd3]}"

Key features of Fargate

Resource-based pricing and per second billing
You pay by the task size and only for the time for which resources are consumed by the task. The price for CPU and memory is charged on a per-second basis. There is a one-minute minimum charge.
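Here is a small sketch of how the billing math works; the per-second rates below are placeholders rather than actual Fargate prices, which are listed on the Fargate pricing page:

# Placeholder rates -- see the Fargate pricing page for the real numbers
RATE_PER_VCPU_SECOND = 0.0000140
RATE_PER_GB_SECOND = 0.0000035

def task_cost(vcpu, memory_gb, duration_seconds):
    billable_seconds = max(duration_seconds, 60)  # one-minute minimum charge
    return billable_seconds * (vcpu * RATE_PER_VCPU_SECOND + memory_gb * RATE_PER_GB_SECOND)

print(task_cost(0.25, 0.5, 45))    # a 45-second task is billed as 60 seconds
print(task_cost(1.0, 2.0, 600))    # a 10-minute task is billed per second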

Flexible configuration options
Fargate is available with 50 different combinations of CPU and memory to closely match your application needs. You can use anywhere from 2 GB up to 8 GB of memory per vCPU across the various configurations. Match your workload requirements closely, whether they are general purpose, compute optimized, or memory optimized.

Networking
All Fargate tasks run within your own VPC. Fargate supports the recently launched awsvpc networking mode and the elastic network interface for a task is visible in the subnet where the task is running. This provides the separation of responsibility so you retain full control of networking policies for your applications via VPC features like security groups, routing rules, and NACLs. Fargate also supports public IP addresses.

Load Balancing
ECS Service Load Balancing for the Application Load Balancer and Network Load Balancer is supported. For the Fargate launch type, you specify the IP addresses of the Fargate tasks to register with the load balancers.

Permission tiers
Even though there are no instances to manage with Fargate, you continue to group tasks into logical clusters. This allows you to manage who can run or view services within the cluster. The task IAM role is still applicable. Additionally, there is a new Task Execution Role that grants Amazon ECS permissions to perform operations such as pushing logs to CloudWatch Logs or pulling images from Amazon Elastic Container Registry (Amazon ECR).
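To illustrate how these pieces fit together, here is a sketch of registering a Fargate-compatible task definition with the AWS SDK for Python (boto3); the execution role ARN, image, and sizes are placeholders:

import boto3

ecs = boto3.client("ecs")

response = ecs.register_task_definition(
    family="nginx-fargate",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",       # required for the Fargate launch type
    cpu="256",                  # task-level CPU units (0.25 vCPU)
    memory="512",               # task-level memory in MiB
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[
        {
            "name": "nginx",
            "image": "nginx:latest",
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
        }
    ],
)
print(response["taskDefinition"]["taskDefinitionArn"])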

Container Registry Support
Fargate provides seamless authentication to help pull images from Amazon ECR via the Task Execution Role. Similarly, if you are using a public repository like DockerHub, you can continue to do so.

Amazon ECS CLI
The Amazon ECS CLI provides high-level commands to simplify creating and running Amazon ECS clusters, tasks, and services. The latest version of the CLI now supports running tasks and services with Fargate.

EC2 and Fargate Launch Type Compatibility
All Amazon ECS clusters are heterogeneous – you can run both Fargate and Amazon ECS tasks in the same cluster. This enables teams working on different applications to choose their own cadence of moving to Fargate, or to select a launch type that meets their requirements without breaking the existing model. You can make an existing ECS task definition compatible with the Fargate launch type and run it as a Fargate service, and vice versa. Choosing a launch type is not a one-way door!

Logging and Visibility
With Fargate, you can send the application logs to CloudWatch Logs. Service metrics (CPU and memory utilization) are available as part of CloudWatch metrics. AWS partners for visibility, monitoring, and application performance management, including Datadog, Aquasec, Splunk, Twistlock, and New Relic, also support Fargate tasks.

Conclusion

Fargate enables you to run containers without having to manage the underlying infrastructure. Today, Fargate is available for Amazon ECS, with support for Amazon EKS planned for 2018. Visit the Fargate product page to learn more, or get started in the AWS Console.

–Deepak Dayama

Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production

Post Syndicated from Rafi Ton original https://aws.amazon.com/blogs/big-data/using-amazon-redshift-spectrum-amazon-athena-and-aws-glue-with-node-js-in-production/

This is a guest post by Rafi Ton, founder and CEO of NUVIAD. NUVIAD is, in their own words, “a mobile marketing platform providing professional marketers, agencies and local businesses state of the art tools to promote their products and services through hyper targeting, big data analytics and advanced machine learning tools.”

At NUVIAD, we’ve been using Amazon Redshift as our main data warehouse solution for more than 3 years.

We store massive amounts of ad transaction data that our users and partners analyze to determine ad campaign strategies. When running real-time bidding (RTB) campaigns at large scale, data freshness is critical so that our users can respond rapidly to changes in campaign performance. We chose Amazon Redshift because of its simplicity, scalability, performance, and ability to load new data in near real time.

Over the past three years, our customer base grew significantly and so did our data. We saw our Amazon Redshift cluster grow from three nodes to 65 nodes. To balance cost and analytics performance, we looked for a way to store large amounts of less-frequently analyzed data at a lower cost. Yet, we still wanted to have the data immediately available for user queries and to meet their expectations for fast performance. We turned to Amazon Redshift Spectrum.

In this post, I explain the reasons why we extended Amazon Redshift with Redshift Spectrum as our modern data warehouse. I cover how our data growth and the need to balance cost and performance led us to adopt Redshift Spectrum. I also share key performance metrics in our environment, and discuss the additional AWS services that provide a scalable and fast environment, with data available for immediate querying by our growing user base.

Amazon Redshift as our foundation

The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time.

The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.

However, this approach required Amazon Redshift to store a lot of data for long periods, and our data grew substantially. At our peak, we maintained a cluster running 65 DC1.large nodes. The impact on our Amazon Redshift cluster was evident, and we saw our CPU utilization grow to 90%.

Why we extended Amazon Redshift to Redshift Spectrum

Redshift Spectrum gives us the ability to run SQL queries using the powerful Amazon Redshift query engine against data stored in Amazon S3, without needing to load the data. With Redshift Spectrum, we store data where we want, at the cost that we want. We have the data available for analytics when our users need it with the performance they expect.

Seamless scalability, high performance, and unlimited concurrency

Scaling Redshift Spectrum is a simple process. First, it allows us to leverage Amazon S3 as the storage engine and get practically unlimited data capacity.

Second, if we need more compute power, we can leverage Redshift Spectrum’s distributed compute engine over thousands of nodes to provide superior performance – perfect for complex queries running against massive amounts of data.

Third, all Redshift Spectrum clusters access the same data catalog so that we don’t have to worry about data migration at all, making scaling effortless and seamless.

Lastly, since Redshift Spectrum distributes queries across potentially thousands of nodes, they are not affected by other queries, providing much more stable performance and unlimited concurrency.

Keeping it SQL

Redshift Spectrum uses the same query engine as Amazon Redshift. This means that we did not need to change our BI tools or query syntax, whether we used complex queries across a single table or joins across multiple tables.

An interesting capability introduced recently is the ability to create a view that spans both Amazon Redshift and Redshift Spectrum external tables. With this feature, you can query frequently accessed data in your Amazon Redshift cluster and less-frequently accessed data in Amazon S3, using a single view.
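Here is a minimal sketch of such a view, shown being created from Python with psycopg2; the connection details and table names are hypothetical. Note that a view referencing a Redshift Spectrum external table must be created as a late-binding view, using WITH NO SCHEMA BINDING:

import psycopg2

# Placeholder connection details for the Amazon Redshift cluster
conn = psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics", user="admin", password="...")

create_view = """
CREATE OR REPLACE VIEW events_all AS
    SELECT user_id, ts, billing FROM public.events_recent      -- hot data in Amazon Redshift
    UNION ALL
    SELECT user_id, ts, billing FROM spectrum.events_archive   -- cold data in Amazon S3
WITH NO SCHEMA BINDING;
"""

with conn, conn.cursor() as cur:
    cur.execute(create_view)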

Leveraging Parquet for higher performance

Parquet is a columnar data format that provides superior performance and allows Redshift Spectrum (or Amazon Athena) to scan significantly less data. With less I/O, queries run faster and we pay less per query. You can read all about Parquet at https://parquet.apache.org/ or https://en.wikipedia.org/wiki/Apache_Parquet.

Lower cost

From a cost perspective, we pay standard rates for our data in Amazon S3, and only small amounts per query to analyze data with Redshift Spectrum. Using the Parquet format, we can significantly reduce the amount of data scanned. Our costs are now lower, and our users get fast results even for large complex queries.

What we learned about Amazon Redshift vs. Redshift Spectrum performance

When we first started looking at Redshift Spectrum, we wanted to put it to the test. We wanted to know how it would compare to Amazon Redshift, so we looked at two key questions:

  1. What is the performance difference between Amazon Redshift and Redshift Spectrum on simple and complex queries?
  2. Does the data format impact performance?

During the migration phase, we had our dataset stored in Amazon Redshift and S3 as CSV/GZIP and as Parquet file formats. We tested three configurations:

  • Amazon Redshift cluster with 28 DC1.large nodes
  • Redshift Spectrum using CSV/GZIP
  • Redshift Spectrum using Parquet

We performed benchmarks for simple and complex queries on one month’s worth of data. We tested how much time it took to perform the query, and how consistent the results were when running the same query multiple times. The data we used for the tests was already partitioned by date and hour. Properly partitioning the data improves performance significantly and reduces query times.

Simple query

First, we tested a simple query aggregating billing data across a month:

SELECT 
  user_id, 
  count(*) AS impressions, 
  SUM(billing)::decimal /1000000 AS billing 
FROM <table_name> 
WHERE 
  date >= '2017-08-01' AND 
  date <= '2017-08-31'  
GROUP BY 
  user_id;

We ran the same query seven times and measured the response times:

Execution Time (seconds)

          Amazon Redshift    Redshift Spectrum CSV    Redshift Spectrum Parquet
Run #1    39.65              45.11                    11.92
Run #2    15.26              43.13                    12.05
Run #3    15.27              46.47                    13.38
Run #4    21.22              51.02                    12.74
Run #5    17.27              43.35                    11.76
Run #6    16.67              44.23                    13.67
Run #7    25.37              40.39                    12.75
Average   21.53              44.82                    12.61

For simple queries, Amazon Redshift performed better than Redshift Spectrum, as we thought, because the data is local to Amazon Redshift.

What was surprising was that using Parquet data format in Redshift Spectrum significantly beat ‘traditional’ Amazon Redshift performance. For our queries, using Parquet data format with Redshift Spectrum delivered an average 40% performance gain over traditional Amazon Redshift. Furthermore, Redshift Spectrum showed high consistency in execution time with a smaller difference between the slowest run and the fastest run.

Comparing the amount of data scanned when using CSV/GZIP and Parquet, the difference was also significant:

Data Scanned (GB)

CSV (GZIP)   135.49
Parquet        2.83

Because we pay only for the data scanned by Redshift Spectrum, the cost saving of using Parquet is evident and substantial.
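To put rough numbers on that, at Redshift Spectrum’s list price of $5 per TB scanned, the scan sizes above translate into the following per-query costs:

csv_tb = 135.49 / 1024       # data scanned per query with CSV/GZIP, in TB
parquet_tb = 2.83 / 1024     # data scanned per query with Parquet, in TB
price_per_tb = 5.0           # Redshift Spectrum charges $5 per TB of data scanned

print("CSV/GZIP: $%.3f per query" % (csv_tb * price_per_tb))     # roughly $0.66
print("Parquet:  $%.3f per query" % (parquet_tb * price_per_tb))  # roughly $0.014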

Complex query

Next, we compared the same three configurations with a complex query.

Execution Time (seconds)

          Amazon Redshift    Redshift Spectrum CSV    Redshift Spectrum Parquet
Run #1    329.80             84.20                    42.40
Run #2    167.60             65.30                    35.10
Run #3    165.20             62.20                    23.90
Run #4    273.90             74.90                    55.90
Run #5    167.70             69.00                    58.40
Average   220.84             71.12                    43.14

This time, Redshift Spectrum using Parquet cut the average query time by 80% compared to traditional Amazon Redshift!

Bottom line: For complex queries, Redshift Spectrum provided a 67% performance gain over Amazon Redshift. Using the Parquet data format, Redshift Spectrum delivered an 80% performance improvement over Amazon Redshift. For us, this was substantial.

Optimizing the data structure for different workloads

Because S3 storage is relatively inexpensive and we pay only for the data scanned by each query, we believe that it makes sense to keep our data in different formats for different workloads and different analytics engines. It is important to note that we can have any number of tables pointing to the same data on S3. It all depends on how we partition the data and update the table partitions.

Data permutations

For example, we have a process that runs every minute and generates statistics for the last minute of data collected. With Amazon Redshift, this would be done by running a query on the table with something like the following:

SELECT 
  user, 
  COUNT(*) 
FROM 
  events_table 
WHERE 
  ts BETWEEN '2017-08-01 14:00:00' AND '2017-08-01 14:00:59'
GROUP BY 
  user;

(Assuming ‘ts’ is your column storing the time stamp for each event.)

With Redshift Spectrum, we pay for the data scanned in each query. If the data is partitioned by the minute instead of the hour, a query looking at one minute would be 1/60th the cost. If we use a temporary table that points only to the data of the last minute, we save that unnecessary cost.

Creating Parquet data efficiently

On average, we have 800 instances that process our traffic. Each instance sends events that are eventually loaded into Amazon Redshift. When we started three years ago, we would offload data from each server to S3 and then perform a periodic copy command from S3 to Amazon Redshift.

Recently, Amazon Kinesis Firehose added the capability to offload data directly to Amazon Redshift. While this is now a viable option, we kept the same collection process that worked flawlessly and efficiently for three years.

This changed, however, when we incorporated Redshift Spectrum. With Redshift Spectrum, we needed to find a way to:

  • Collect the event data from the instances.
  • Save the data in Parquet format.
  • Partition the data effectively.

To accomplish this, we save the data as CSV and then transform it to Parquet. The most effective method to generate the Parquet files is to:

  1. Send the data in one-minute intervals from the instances to Kinesis Firehose with an S3 temporary bucket as the destination.
  2. Aggregate hourly data and convert it to Parquet using AWS Lambda and AWS Glue.
  3. Add the Parquet data to S3 by updating the table partitions.

With this new process, we had to give more attention to validating the data before we sent it to Kinesis Firehose, because a single corrupted record in a partition fails queries on that partition.

Data validation

To store our click data in a table, we considered the following SQL create table command:

create external TABLE spectrum.blog_clicks (
    user_id varchar(50),
    campaign_id varchar(50),
    os varchar(50),
    ua varchar(255),
    ts bigint,
    billing float
)
partitioned by (date date, hour smallint)  
stored as parquet
location 's3://nuviad-temp/blog/clicks/';

The above statement defines a new external table (all Redshift Spectrum tables are external tables) with a few attributes. We stored ‘ts’ as a Unix time stamp and not as Timestamp, and billing data is stored as float and not decimal (more on that later). We also said that the data is partitioned by date and hour, and then stored as Parquet on S3.

First, we need to get the table definitions. This can be achieved by running the following query:

SELECT 
  * 
FROM 
  svv_external_columns 
WHERE 
  tablename = 'blog_clicks';

This query lists all the columns in the table with their respective definitions:

schemaname  tablename    columnname   external_type  columnnum  part_key
spectrum    blog_clicks  user_id      varchar(50)    1          0
spectrum    blog_clicks  campaign_id  varchar(50)    2          0
spectrum    blog_clicks  os           varchar(50)    3          0
spectrum    blog_clicks  ua           varchar(255)   4          0
spectrum    blog_clicks  ts           bigint         5          0
spectrum    blog_clicks  billing      double         6          0
spectrum    blog_clicks  date         date           7          1
spectrum    blog_clicks  hour         smallint       8          2

Now we can use this data to create a validation schema for our data:

const rtb_request_schema = {
    "name": "clicks",
    "items": {
        "user_id": {
            "type": "string",
            "max_length": 100
        },
        "campaign_id": {
            "type": "string",
            "max_length": 50
        },
        "os": {
            "type": "string",
            "max_length": 50            
        },
        "ua": {
            "type": "string",
            "max_length": 255            
        },
        "ts": {
            "type": "integer",
            "min_value": 0,
            "max_value": 9999999999999
        },
        "billing": {
            "type": "float",
            "min_value": 0,
            "max_value": 9999999999999
        }
    }
};

Next, we create a function that uses this schema to validate data:

function valueIsValid(value, schema) {
    if (schema.type == 'string') {
        return (typeof value == 'string' && value.length <= schema.max_length);
    }
    else if (schema.type == 'integer') {
        return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
    }
    else if (schema.type == 'float' || schema.type == 'double') {
        return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
    }
    else if (schema.type == 'boolean') {
        return typeof value == 'boolean';
    }
    else if (schema.type == 'timestamp') {
        return (new Date(value)).getTime() > 0;
    }
    else {
        return true;
    }
}

Near real-time data loading with Kinesis Firehose

On Kinesis Firehose, we created a new delivery stream to handle the events as follows:

Delivery stream name: events
Source: Direct PUT
S3 bucket: nuviad-events
S3 prefix: rtb/
IAM role: firehose_delivery_role_1
Data transformation: Disabled
Source record backup: Disabled
S3 buffer size (MB): 100
S3 buffer interval (sec): 60
S3 Compression: GZIP
S3 Encryption: No Encryption
Status: ACTIVE
Error logging: Enabled

This delivery stream aggregates event data every minute, or up to 100 MB, and writes the data to an S3 bucket as a CSV/GZIP compressed file. Next, after we have the data validated, we can safely send it to our Kinesis Firehose API:

if (validated) {
    let itemString = item.join('|')+'\n'; //Sending csv delimited by pipe and adding new line

    let params = {
        DeliveryStreamName: 'events',
        Record: {
            Data: itemString
        }
    };

    firehose.putRecord(params, function(err, data) {
        if (err) {
            console.error(err, err.stack);        
        }
        else {
            // Continue to your next step 
        }
    });
}

Now, we have a single CSV file representing one minute of event data stored in S3. The files are named automatically by Kinesis Firehose by adding a UTC time prefix in the format YYYY/MM/DD/HH before writing objects to S3. Because we use the date and hour as partitions, we need to change the file naming and location to fit our Redshift Spectrum schema.

Automating data distribution using AWS Lambda

We created a simple Lambda function triggered by an S3 put event that copies the file to a different location (or locations), while renaming it to fit our data structure and processing flow. As mentioned before, the files generated by Kinesis Firehose are structured in a pre-defined hierarchy, such as:

s3://your-bucket/your-prefix/2017/08/01/20/events-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz

All we need to do is parse the object name and restructure it as we see fit. In our case, we did the following (the event is an object received in the Lambda function with all the data about the object written to S3):

/*
  object key structure in the event object:
  your-prefix/2017/08/01/20/event-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz
*/

let key_parts = event.Records[0].s3.object.key.split('/');

let event_type = key_parts[0];
let date = key_parts[1] + '-' + key_parts[2] + '-' + key_parts[3];
let hour = key_parts[4];
if (hour.indexOf('0') == 0) {
    hour = parseInt(hour, 10) + '';
}

let parts1 = key_parts[5].split('-');
let minute = parts1[7];
if (minute.indexOf('0') == 0) {
    minute = parseInt(minute, 10) + '';
}

Now, we can redistribute the file to the two destinations we need—one for the minute processing task and the other for hourly aggregation:

    copyObjectToHourlyFolder(event, date, hour, minute)
        .then(copyObjectToMinuteFolder.bind(null, event, date, hour, minute))
        .then(addPartitionToSpectrum.bind(null, event, date, hour, minute))
        .then(deleteOldMinuteObjects.bind(null, event))
        .then(deleteStreamObject.bind(null, event))        
        .then(result => {
            callback(null, { message: 'done' });            
        })
        .catch(err => {
            console.error(err);
            callback(null, { message: err });            
        }); 

Kinesis Firehose stores the data in a temporary folder. We copy the object to another folder that holds the data for the last processed minute. This folder is connected to a small Redshift Spectrum table where the data is being processed without needing to scan a much larger dataset. We also copy the data to a folder that holds the data for the entire hour, to be later aggregated and converted to Parquet.

Because we partition the data by date and hour, we create a new partition on the Redshift Spectrum table if the processed minute is the first minute in the hour (that is, minute 0). We run the following:

ALTER TABLE 
  spectrum.events 
ADD partition
  (date='2017-08-01', hour=0) 
  LOCATION 's3://nuviad-temp/events/2017-08-01/0/';

After the data is processed and added to the table, we delete the processed data from the temporary Kinesis Firehose storage and from the minute storage folder.

Migrating CSV to Parquet using AWS Glue and Amazon EMR

The simplest way we found to run an hourly job converting our CSV data to Parquet is using Lambda and AWS Glue (and thanks to the awesome AWS Big Data team for their help with this).

Creating AWS Glue jobs

What this simple AWS Glue script does:

  • Gets parameters for the job, date, and hour to be processed
  • Creates a Spark EMR context allowing us to run Spark code
  • Reads CSV data into a DataFrame
  • Writes the data as Parquet to the destination S3 bucket
  • Adds or modifies the Redshift Spectrum / Amazon Athena table partition for the table

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME','day_partition_key', 'hour_partition_key', 'day_partition_value', 'hour_partition_value' ])

#day_partition_key = "partition_0"
#hour_partition_key = "partition_1"
#day_partition_value = "2017-08-01"
#hour_partition_value = "0"

day_partition_key = args['day_partition_key']
hour_partition_key = args['hour_partition_key']
day_partition_value = args['day_partition_value']
hour_partition_value = args['hour_partition_value']

print("Running for " + day_partition_value + "/" + hour_partition_value)

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark.read.option("delimiter","|").csv("s3://nuviad-temp/events/"+day_partition_value+"/"+hour_partition_value)
df.registerTempTable("data")

df1 = spark.sql("select _c0 as user_id, _c1 as campaign_id, _c2 as os, _c3 as ua, cast(_c4 as bigint) as ts, cast(_c5 as double) as billing from data")

df1.repartition(1).write.mode("overwrite").parquet("s3://nuviad-temp/parquet/"+day_partition_value+"/hour="+hour_partition_value)

client = boto3.client('athena', region_name='us-east-1')

response = client.start_query_execution(
    QueryString='alter table parquet_events add if not exists partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ')  location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
    QueryExecutionContext={
        'Database': 'spectrumdb'
    },
    ResultConfiguration={
        'OutputLocation': 's3://nuviad-temp/convertresults'
    }
)

# Make sure the partition points at the Parquet location (safe to re-run)
response = client.start_query_execution(
    QueryString='alter table parquet_events partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ') set location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
    QueryExecutionContext={
        'Database': 'spectrumdb'
    },
    ResultConfiguration={
        'OutputLocation': 's3://nuviad-temp/convertresults'
    }
)

job.commit()

Note: Because Redshift Spectrum and Athena both use the AWS Glue Data Catalog, we could use the Athena client to add the partition to the table.

Here are a few words about float, decimal, and double. Using decimal proved to be more challenging than we expected, as it seems that Redshift Spectrum and Spark use them differently. Whenever we used decimal in Redshift Spectrum and in Spark, we kept getting errors, such as:

S3 Query Exception (Fetch). Task failed due to an internal error. File 'https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parquet has an incompatible Parquet schema for column 's3://nuviad-events/events.lat'. Column type: DECIMAL(18, 8), Parquet schema:\noptional float lat [i:4 d:1 r:0]\n (https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parq

We had to experiment with a few floating-point formats until we found that the only combination that worked was to define the column as double in the Spark code and float in Spectrum. This is the reason you see billing defined as float in Spectrum and double in the Spark code.

Creating a Lambda function to trigger conversion

Next, we created a simple Lambda function to trigger the AWS Glue script hourly using a simple Python code:

import boto3
import json
from datetime import datetime, timedelta
 
client = boto3.client('glue')
 
def lambda_handler(event, context):
    last_hour_date_time = datetime.now() - timedelta(hours = 1)
    day_partition_value = last_hour_date_time.strftime("%Y-%m-%d") 
    hour_partition_value = last_hour_date_time.strftime("%-H") 
    response = client.start_job_run(
        JobName='convertEventsParquetHourly',
        Arguments={
            '--day_partition_key': 'date',
            '--hour_partition_key': 'hour',
            '--day_partition_value': day_partition_value,
            '--hour_partition_value': hour_partition_value
        }
    )

Using Amazon CloudWatch Events, we trigger this function hourly. It starts the AWS Glue job named ‘convertEventsParquetHourly’ for the previous hour, passing AWS Glue the job name and the partition keys and values to process.
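
The schedule itself is an ordinary CloudWatch Events rule. For reference, the same hourly trigger can also be wired up from code; the sketch below uses boto3 with placeholder names and ARNs:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Placeholder ARN for the Lambda function shown above.
LAMBDA_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:triggerGlueHourly'

# Fire once an hour.
events.put_rule(Name='hourly-parquet-conversion', ScheduleExpression='rate(1 hour)')

# Point the rule at the Lambda function.
events.put_targets(Rule='hourly-parquet-conversion',
                   Targets=[{'Id': 'glue-trigger', 'Arn': LAMBDA_ARN}])

# Allow CloudWatch Events to invoke the function.
lambda_client.add_permission(FunctionName='triggerGlueHourly',
                             StatementId='allow-cloudwatch-events',
                             Action='lambda:InvokeFunction',
                             Principal='events.amazonaws.com')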

Redshift Spectrum and Node.js

Our development stack is based on Node.js, which is well-suited for high-speed, light servers that need to process a huge number of transactions. However, a few limitations of the Node.js environment required us to create workarounds and use other tools to complete the process.

Node.js and Parquet

The lack of Parquet modules for Node.js required us to implement an AWS Glue/Amazon EMR process to effectively migrate data from CSV to Parquet. We would rather save directly to Parquet, but we couldn’t find an effective way to do it.

One interesting project in the works is the development of a Parquet NPM by Marc Vertes called node-parquet (https://www.npmjs.com/package/node-parquet). It is not in a production state yet, but we think it would be well worth following the progress of this package.

Timestamp data type

According to the Parquet documentation, Timestamp data are stored in Parquet as 64-bit integers. However, JavaScript does not support 64-bit integers, because the native number type is a 64-bit double, giving only 53 bits of integer range.

The result is that you cannot store Timestamp correctly in Parquet using Node.js. The solution is to store Timestamp as string and cast the type to Timestamp in the query. Using this method, we did not witness any performance degradation whatsoever.
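
As a hypothetical illustration of the workaround (the table and column names below are placeholders, reusing the Athena client pattern from the AWS Glue script), the string is converted back to a timestamp inside the query rather than at write time:

import boto3

client = boto3.client('athena', region_name='us-east-1')

# 'event_ts' was written to Parquet as a string by the Node.js producer;
# the cast back to timestamp happens at query time.
response = client.start_query_execution(
    QueryString="select cast(event_ts as timestamp) as event_time, user_id "
                "from my_events limit 10",
    QueryExecutionContext={'Database': 'spectrumdb'},
    ResultConfiguration={'OutputLocation': 's3://nuviad-temp/convertresults'}
)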

Lessons learned

You can benefit from our trial-and-error experience.

Lesson #1: Data validation is critical

As mentioned earlier, a single corrupt entry in a partition can fail queries running against this partition, especially when using Parquet, which is harder to edit than a simple CSV file. Make sure that you validate your data before scanning it with Redshift Spectrum.

Lesson #2: Structure and partition data effectively

One of the biggest benefits of using Redshift Spectrum (or Athena for that matter) is that you don’t need to keep nodes up and running all the time. You pay only for the queries you perform and only for the data scanned per query.

Keeping different permutations of your data for different queries makes a lot of sense in this case. For example, you can partition your data by date and hour to run time-based queries, and also keep another set partitioned by user_id and date to run user-based queries. This makes your data warehouse both faster and more cost-efficient.
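
For example, the same Glue job could write a second, user-keyed copy of the hour alongside the time-keyed one. This is only a sketch (the output path is a placeholder, and with a large user base you would more likely bucket or hash user_id than create one folder per user):

# Time-keyed copy, as written by the job above.
df1.repartition(1).write.mode("overwrite") \
   .parquet("s3://nuviad-temp/parquet/" + day_partition_value + "/hour=" + hour_partition_value)

# Additional user-keyed copy for user-centric queries (hypothetical path).
df1.write.mode("overwrite") \
   .partitionBy("user_id") \
   .parquet("s3://nuviad-temp/parquet-by-user/" + day_partition_value)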

Storing data in the right format

Use Parquet whenever you can. The benefits are substantial: faster performance, less data to scan, and a much more efficient columnar format. However, Parquet is not supported out of the box by Kinesis Firehose, so you need to implement your own ETL. AWS Glue is a great option.

Creating small tables for frequent tasks

When we started using Redshift Spectrum, we saw our Amazon Redshift costs jump by hundreds of dollars per day. Then we realized that we were unnecessarily scanning a full day’s worth of data every minute. Take advantage of the ability to define multiple tables on the same S3 bucket or folder, and create temporary and small tables for frequent queries.
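
As a sketch of that idea, a small external table can be defined over just the ‘last minute’ folder, so frequent queries scan a few megabytes instead of a full day. The connection details, column list, and S3 path below are placeholders (Redshift requires CREATE EXTERNAL TABLE to run outside a transaction, hence autocommit):

import psycopg2

conn = psycopg2.connect(host='my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='nuviad', user='admin', password='...')
conn.autocommit = True  # CREATE EXTERNAL TABLE cannot run inside a transaction
cur = conn.cursor()

# A narrow table over only the most recent minute of data.
cur.execute("""
    create external table spectrum.events_last_minute (
        user_id varchar(64),
        campaign_id varchar(64),
        os varchar(32),
        ua varchar(512),
        ts bigint,
        billing float
    )
    row format delimited fields terminated by '|'
    stored as textfile
    location 's3://nuviad-temp/events-minute/'
""")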

Lesson #3: Combine Athena and Redshift Spectrum for optimal performance

Moving to Redshift Spectrum also allowed us to take advantage of Athena, as both use the AWS Glue Data Catalog. Run fast and simple queries using Athena while taking advantage of the advanced Amazon Redshift query engine for complex queries using Redshift Spectrum.

Redshift Spectrum excels when running complex queries. It can push many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer, so that queries use much less of your cluster’s processing capacity.

Lesson #4: Sort your Parquet data within the partition

We achieved another performance improvement by sorting data within the partition using sortWithinPartitions(sort_field). For example:

df.repartition(1).sortWithinPartitions("campaign_id")…
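
Filled out in the context of the Glue job above, the write step looks something like this (a sketch; the output path is a placeholder):

df1.repartition(1) \
   .sortWithinPartitions("campaign_id") \
   .write.mode("overwrite") \
   .parquet("s3://nuviad-temp/parquet/" + day_partition_value + "/hour=" + hour_partition_value)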

Conclusion

We were extremely pleased with using Amazon Redshift as our core data warehouse for over three years. But as our client base and volume of data grew substantially, we extended Amazon Redshift to take advantage of scalability, performance, and cost with Redshift Spectrum.

Redshift Spectrum lets us scale to virtually unlimited storage, scale compute transparently, and deliver super-fast results for our users. With Redshift Spectrum, we store data where we want at the cost we want, and have the data available for analytics when our users need it with the performance they expect.


About the Author

With 7 years of experience in the AdTech industry and 15 years in leading technology companies, Rafi Ton is the founder and CEO of NUVIAD. He enjoys exploring new technologies and putting them to use in cutting-edge products and services that generate real money in the real world. As an experienced entrepreneur, Rafi believes in practical programming and fast adaptation of new technologies to achieve a significant market advantage.

 

 

AWS Achieves FedRAMP JAB Moderate Provisional Authorization for 20 Services in the AWS US East/West Region

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-jab-moderate-authorization-for-20-services-in-us-eastwest/

The AWS US East/West Region has received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) at the Federal Risk and Authorization Management Program (FedRAMP) Moderate baseline.

Though AWS has maintained an AWS US East/West Region Agency-ATO since early 2013, this announcement represents AWS’s carefully deliberated move to the JAB for the centralized maintenance of our P-ATO for 10 services already authorized. This also includes the addition of 10 new services to our FedRAMP program (see the complete list of services below). This doubles the number of FedRAMP Moderate services available to our customers to enable increased use of the cloud and support modernized IT missions. Our public sector customers now can leverage this FedRAMP P-ATO as a baseline for their own authorizations and look to the JAB for centralized Continuous Monitoring reporting and updates. In a significant enhancement for our partners that build their solutions on the AWS US East/West Region, they can now achieve FedRAMP JAB P-ATOs of their own for their Platform as a Service (PaaS) and Software as a Service (SaaS) offerings.

In line with FedRAMP security requirements, our independent FedRAMP assessment was completed in partnership with a FedRAMP accredited Third Party Assessment Organization (3PAO) on our technical, management, and operational security controls to validate that they meet or exceed FedRAMP’s Moderate baseline requirements. Effective immediately, you can begin leveraging this P-ATO for the following 20 services in the AWS US East/West Region:

  • Amazon Aurora (MySQL)*
  • Amazon CloudWatch Logs*
  • Amazon DynamoDB
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon EMR*
  • Amazon Glacier*
  • Amazon Kinesis Streams*
  • Amazon RDS (MySQL, Oracle, Postgres*)
  • Amazon Redshift
  • Amazon Simple Notification Service*
  • Amazon Simple Queue Service*
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service*
  • Amazon Virtual Private Cloud
  • AWS CloudFormation*
  • AWS CloudTrail*
  • AWS Identity and Access Management
  • AWS Key Management Service
  • Elastic Load Balancing

* Services with first-time FedRAMP Moderate authorizations

We continue to work with the FedRAMP Project Management Office (PMO), other regulatory and compliance bodies, and our customers and partners to ensure that we are raising the bar on our customers’ security and compliance needs.

To learn more about how AWS helps customers meet their security and compliance requirements, see the AWS Compliance website. To learn about what other public sector customers are doing on AWS, see our Government, Education, and Nonprofits Case Studies and Customer Success Stories. To review the public posting of our FedRAMP authorizations, see the FedRAMP Marketplace.

– Chris Gile, Senior Manager, AWS Public Sector Risk and Compliance

Prepare to run a Code Club on FutureLearn

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/code-club-futurelearn/

Prepare to run a Code Club with our newest free online course, available now on FutureLearn!

FutureLearn: Prepare to Run a Code Club

Ready to launch! Our free FutureLearn course ‘Prepare to Run a Code Club’ starts next week and you can sign up now: https://www.futurelearn.com/courses/code-club

Code Club

As of today, more than 10,000 Code Clubs run in 130 countries, delivering free coding opportunities to approximately 150,000 children across the globe.

A child absorbed in a task at a Code Club

As an organisation, Code Club provides free learning resources and training materials to support the ever-growing and truly inspiring community of volunteers and educators who set up and run Code Clubs.

FutureLearn

Today we’re launching our latest free online course on FutureLearn, dedicated to training and supporting new Code Club volunteers. It will give you practical guidance on all things Code Club, as well as a taste of beginner programming!

Split over three weeks and running for 3–4 hours in total, the course provides hands-on advice and tips on everything you need to know to run a successful, fun, and educational club.

“Week 1 kicks off with advice on how to prepare to start a Code Club, for example which hardware and software are needed. Week 2 focusses on how to deliver Code Club sessions, with practical tips on helping young people learn and an easy taster coding project to try out. In the final week, the course looks at interesting ideas to enrich and extend club sessions.”
— Sarah Sherman-Chase, Code Club Participation Manager

The course is available wherever you live, and it is completely free — sign up now!

If you’re already a volunteer, the course will be a great refresher, and a chance to share your insights with newcomers. Moreover, it is also useful for parents and guardians who wish to learn more about Code Club.

Your next step

Interested in learning more? You can start the course today by visiting FutureLearn. And to find out more about Code Clubs in your country, visit Code Club UK or Code Club International.

Code Club partners from across the globe gathered together for a group photo at the International Meetup

We love hearing your Code Club stories! If you’re a volunteer, are in the process of setting up a club, or are inspired to learn more, share your story in the comments below or via social media, making sure to tag @CodeClub and @CodeClubWorld.

You might also be interested in our other free courses on the FutureLearn platform, including Teaching Physical Computing with Raspberry Pi and Python and Teaching Programming in Primary Schools.

 

The post Prepare to run a Code Club on FutureLearn appeared first on Raspberry Pi.

Kodi Addon Dev Says “Show of Force” Will Be Met With Defiance

Post Syndicated from Andy original https://torrentfreak.com/kodi-addon-dev-says-show-force-will-met-defiance-171119/

For many years, the members of the MPAA have flexed their muscles all around the globe, working to prevent people from engaging in online piracy. If the last 17 years’ ‘progress’ is anything to go by, it’s a war that will go on indefinitely.

With Columbia, Disney, Paramount, Twentieth Century Fox, Universal, and Warner on board, the MPAA has historically relied on sheer power to intimidate opponents. That has certainly worked in many large piracy cases but for many peripheral smaller-scale pirates, their presence is largely ignored.

This week, however, several players in the Kodi scene discovered that these giants – and more besides – have the ability to literally turn up at their front door. As reported Thursday, UK-based Kodi addon developer The_Alpha received a hand-delivered cease-and-desist letter from all of the above, accompanied by new faces Netflix, Amazon and Sky TV.

These companies are part of the Alliance for Creativity and Entertainment (ACE), a massive and recently-formed anti-piracy coalition comprised of 30 global entertainment brands. TorrentFreak reached out to The_Alpha for his thoughts on coming under such a dazzling spotlight but perhaps understandably he didn’t want to comment.

The leader of the Ares Project was willing to go on the record, however, after he too received a hand-delivered threat during the week. His decision was to immediately comply and shutdown but TF is informed that others might not be so willing to follow suit.

A Kodi addon developer living in the UK who spoke to us on condition of anonymity told us that most people operating in the scene expected some kind of trouble – just not on this scale.

“Did you see the [company logos] across the top of Alpha’s letter? That’s some serious shit right there. The film companies are no surprise but Amazon delivers my groceries so I don’t expect this shit from them,” he said.

When the ACE partnership was formed earlier this year, it seemed pretty clear that the main drive was towards the pooling of anti-piracy resources to be more effective and efficient. However, it can’t have escaped ACE that such a broad and powerful alliance could also have a profound psychological effect on its adversaries.

“There’s no doubt in my mind that they’re turning up mob-handed to put the shits up people like Alpha and the rest of us,” the developer said. “It’s hardly a fair dust-up is it? What have we got to fight back with, a giro [state benefits]? It’s a show of force, ‘look how important we are’!”

Interestingly, however, the dev told us that it isn’t necessarily the size of the coalition that has him most concerned. What caught his eye was the inclusion of two influential UK-based companies in the alliance.

“Having Sly [a local derogatory nickname for Sky TV] and the Premier League on the letter makes it much more serious to me than seeing Warner or whatever,” he commented.

“I don’t get involved in footie but Sly is everywhere round here and I think it’s something the Brit dev scene might take notice of, even if most say ‘fuck it’ and carry on anyway.”

When questioned whether that’s likely, our source said that while ACE might be able to tackle some of the bigger targets like Ares Project or Colossus, they fundamentally misunderstand how the Kodi scene works.

“If you want a good example of a scattered pirate scene, I give you Kodi. They can bomb the base or whatever but nobody lives there,” he explained.

“There’s some older blokes like me who can do without the stress but a lot of younger coders, builders and YouTubers who thrive on it. They’re used to running around council estates with real-life problems. A faffy letter from some toff in a suit means literally nothing. Like I said, all they have to lose is a giro.”

Whether this is just bravado will remain to be seen, but our earlier discussions with others in the scene indicate a particular weakness in the UK, with many players vulnerable to being found after failing to hide their identities in the past. To a point, our source agrees that this is a problem.

“People are saying that Alpha was found after trying to raise some charity money related to his disabled son but I don’t know for sure and nor does anybody else. What strikes me is that none of us really thought things would get this on top here because all you ever hear about is America this, Canada that, whatever. Does this means that more of us are getting done in England? You tell me,” he said.

Only time will tell but stamping out the pirate Kodi scene is going to be hard work.

Within hours of several projects disappearing Wednesday and Thursday, YouTube and myriad blogs were being flooded with guides detailing immediate replacements. This ad-hoc network of enthusiasts makes the exchange of information happen at an alarming rate and it’s hard to see how any company – no matter how powerful – will ever be able to keep up.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

New White House Announcement on the Vulnerability Equities Process

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/new_white_house_1.html

The White House has released a new version of the Vulnerabilities Equities Process (VEP). This is the inter-agency process by which the US government decides whether to inform the software vendor of a vulnerability it finds, or keep it secret and use it to eavesdrop on or attack other systems. You can read the new policy or the fact sheet, but the best place to start is Cybersecurity Coordinator Rob Joyce’s blog post.

In considering a way forward, there are some key tenets on which we can build a better process.

Improved transparency is critical. The American people should have confidence in the integrity of the process that underpins decision making about discovered vulnerabilities. Since I took my post as Cybersecurity Coordinator, improving the VEP and ensuring its transparency have been key priorities, and we have spent the last few months reviewing our existing policy in order to improve the process and make key details about the VEP available to the public. Through these efforts, we have validated much of the existing process and ensured a rigorous standard that considers many potential equities.

The interests of all stakeholders must be fairly represented. At a high level we consider four major groups of equities: defensive equities; intelligence / law enforcement / operational equities; commercial equities; and international partnership equities. Additionally, ordinary people want to know the systems they use are resilient, safe, and sound. These core considerations, which have been incorporated into the VEP Charter, help to standardize the process by which decision makers weigh the benefit to national security and the national interest when deciding whether to disclose or restrict knowledge of a vulnerability.

Accountability of the process and those who operate it is important to establish confidence in those served by it. Our public release of the unclassified portions of the Charter will shed light on aspects of the VEP that were previously shielded from public review, including who participates in the VEP’s governing body, known as the Equities Review Board. We make it clear that departments and agencies with protective missions participate in VEP discussions, as well as other departments and agencies that have broader equities, like the Department of State and the Department of Commerce. We also clarify what categories of vulnerabilities are submitted to the process and ensure that any decision not to disclose a vulnerability will be reevaluated regularly. There are still important reasons to keep many of the specific vulnerabilities evaluated in the process classified, but we will release an annual report that provides metrics about the process to further inform the public about the VEP and its outcomes.

Our system of government depends on informed and vigorous dialogue to discover and make available the best ideas that our diverse society can generate. This publication of the VEP Charter will likely spark discussion and debate. This discourse is important. I also predict that articles will make breathless claims of “massive stockpiles” of exploits while describing the issue. That simply isn’t true. The annual reports and transparency of this effort will reinforce that fact.

Mozilla is pleased with the new charter. I am less so; it looks to me like the same old policy with some new transparency measures — which I’m not sure I trust. The devil is in the details, and we don’t know the details — and it has giant loopholes that pretty much anything can fall through:

The United States Government’s decision to disclose or restrict vulnerability information could be subject to restrictions by partner agreements and sensitive operations. Vulnerabilities that fall within these categories will be cataloged by the originating Department/Agency internally and reported directly to the Chair of the ERB. The details of these categories are outlined in Annex C, which is classified. Quantities of excepted vulnerabilities from each department and agency will be provided in ERB meetings to all members.

This is me from last June:

There’s a lot we don’t know about the VEP. The Washington Post says that the NSA used EternalBlue “for more than five years,” which implies that it was discovered after the 2010 process was put in place. It’s not clear if all vulnerabilities are given such consideration, or if bugs are periodically reviewed to determine if they should be disclosed. That said, any VEP that allows something as dangerous as EternalBlue — or the Cisco vulnerabilities that the Shadow Brokers leaked last August — to remain unpatched for years isn’t serving national security very well. As a former NSA employee said, the quality of intelligence that could be gathered was “unreal.” But so was the potential damage. The NSA must avoid hoarding vulnerabilities.

I stand by that, and am not sure the new policy changes anything.

More commentary.

Here’s more about the Windows vulnerabilities hoarded by the NSA and released by the Shadow Brokers.

EDITED TO ADD (11/18): More news.

EDITED TO ADD (11/22): Adam Shostack points out that the process does not cover design flaws or trade-offs, and that those need to be covered:

…we need the VEP to expand to cover those issues. I’m not going to claim that will be easy, that the current approach will translate, or that they should have waited to handle those before publishing. One obvious place it gets harder is the sources and methods tradeoff. But we need the internet to be a resilient and trustworthy infrastructure.

Visualising Weather Station data with Initial State

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/initial-state/

Since we launched the Oracle Weather Station project, we’ve collected more than six million records from our network of stations at schools and colleges around the world. Each one of these records contains data from ten separate sensors — that’s over 60 million individual weather measurements!


Weather station measurements in Oracle database

Weather data collection

Having lots of data covering a long period of time is great for spotting trends, but to do so, you need some way of visualising your measurements. We’ve always had great resources like Graphing the weather to help anyone analyse their weather data.

And from now on it’s going to be even easier for our Oracle Weather Station owners to display and share their measurements. I’m pleased to announce a new partnership with our friends at Initial State: they are generously providing a white-label platform to which all Oracle Weather Station recipients can stream their data.

Using Initial State

Initial State makes it easy to create vibrant dashboards that show off local climate data. The service is perfect for having your Oracle Weather Station data on permanent display, for example in the school reception area or on the school’s website.

But that’s not all: the Initial State toolkit includes a whole range of easy-to-use analysis tools for extracting trends from your data. Distribution plots and statistics are just a few clicks away!
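
Under the hood, streaming a reading takes only a few lines of Python. Here is a minimal sketch using the standard Initial State streamer module; the bucket name, bucket key, and access key are placeholders that your own credentials will replace:

from ISStreamer.Streamer import Streamer

# Credentials will be supplied to each Oracle Weather Station school.
streamer = Streamer(bucket_name="Oracle Weather Station",
                    bucket_key="YOUR_BUCKET_KEY",
                    access_key="YOUR_ACCESS_KEY")

# Log a couple of sensor readings; signal names are up to you.
streamer.log("temperature_c", 21.4)
streamer.log("humidity_pct", 68.0)
streamer.flush()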


Looks like Auntie Beryl is right — it has been a damp old year! (Humidity value distribution May–Nov 2017)

The wind direction data from my Weather Station supports my excuse as to why I’ve not managed a high-altitude balloon launch this year: to use my launch site, I need winds coming from the east, and those have been in short supply.


Chart showing wind direction over time

Initial State credentials

Every Raspberry Pi Oracle Weather Station school will shortly be receiving the credentials needed to start streaming their data to Initial State. If you’re super keen though, please email [email protected] with a photo of your Oracle Weather Station, and I’ll let you jump the queue!

The Initial State folks are big fans of Raspberry Pi and have a ton of Pi-related projects on their website. They even included shout-outs to us in the music video they made to celebrate the publication of their 50th tutorial. Can you spot their weather station?

Your home-brew weather station

If you’ve built your own Raspberry Pi–powered weather station and would like to dabble with the Initial State dashboards, you’re in luck! The team at Initial State is offering 14-day trials for everyone. For more information on Initial State, and to sign up for the trial, check out their website.

The post Visualising Weather Station data with Initial State appeared first on Raspberry Pi.