Tag Archives: India

Cloudflare Global Network Expands to 193 Cities

Post Syndicated from Nitin Rao original https://blog.cloudflare.com/scaling-the-cloudflare-global/

Cloudflare Global Network Expands to 193 Cities

Cloudflare’s global network currently spans 193 cities across 90+ countries. With over 20 million Internet properties on our network, we increase the security, performance, and reliability of large portions of the Internet every time we add a location.

Cloudflare Global Network Expands to 193 Cities

Expanding Network to New Cities

So far in 2019, we’ve added a score of new locations: Amman, Antananarivo*, Arica*, Asunción, Bengaluru, Buffalo, Casablanca, Córdoba*, Cork, Curitiba, Dakar*, Dar es Salaam, Fortaleza, Göteborg, Guatemala City, Hyderabad, Kigali, Kolkata, Male*, Maputo, Nagpur, Neuquén*, Nicosia, Nouméa, Ottawa, Port-au-Prince, Porto Alegre, Querétaro, Ramallah, and Thessaloniki.

Our Humble Beginnings

When Cloudflare launched in 2010, we focused on putting servers at the Internet’s crossroads: large data centers with key connections, like the Amsterdam Internet Exchange and Equinix Ashburn. This not only provided the most value to the most people at once but was also easier to manage by keeping our servers in the same buildings as all the local ISPs, server providers, and other people they needed to talk to streamline our services.

This is a great approach for bootstrapping a global network, but we’re obsessed with speed in general. There are over five hundred cities in the world with over one million inhabitants, but only a handful of them have the kinds of major Internet exchanges that we targeted. Our goal as a company is to help make a better Internet for all, not just those lucky enough to live in areas with affordable and easily-accessible interconnection points. However, we ran up against two broad, nasty problems: a) running out of major Internet exchanges and b) latency still wasn’t as low as we wanted. Clearly, we had to start scaling in new ways.

One of our first big steps was entering into partnerships around the world with local ISPs, who have many of the same problems we do: ISPs want to save money and provide fast Internet to their customers, but they often don’t have a major Internet exchange nearby to connect to. Adding Cloudflare equipment to their infrastructure effectively brought more of the Internet closer to them. We help them speed up millions of Internet properties while reducing costs by serving traffic locally. Additionally, since all of our servers are designed to support all our products, a relatively small physical footprint can also provide security, performance, reliability, and more.

Upgrading Capacity in Existing Cities

Though it may be obvious and easy to overlook, continuing to build out existing locations is also a key facet of building a global network. This year, we have significantly increased the computational capacity at the edge of our network. Additionally, by making it easier to interconnect with Cloudflare, we have increased the number of unique networks directly connected with us to over 8,000. This makes for a faster, more reliable Internet experience for the >1 billion IPs that we see daily.

To make these capacity upgrades possible for our customers, efficient infrastructure deployment has been one of our keys to success. We want our infrastructure deployment to be targeted and flexible.

Targeted Deployment

The next Cloudflare customer through our door could be a small restaurant owner on a Pro plan with thousands of monthly pageviews or a fast-growing global tech company like Discord. As a result, we need to always stay one step ahead and synthesize a lot of data all at once for our customers.

To accommodate this expansion, our Capacity Planning team is learning new ways to optimize our servers. One key strategy is targeting exactly where to send our servers. However, staying on top of everything isn’t easy – we are a global anycast network, which introduces unpredictability as to where incoming traffic goes. To make things even more difficult, each city can contain as many as five distinct deployments. Planning isn’t just a question of what city to send servers to, it’s one of which address.

To make sense of it all, we tackle the problem with simulations. Some, but not all, of the variables we model include historical traffic growth rates, foreseeable anomalous spikes (e.g., Cyber Day in Chile), and consumption states from our live deal pipeline, as well as product costs, user growth, end-customer adoption. We also add in site reliability, potential for expansion, and expected regional expansion and partnerships, as well as strategic priorities and, of course, feedback from our fantastic Systems Reliability Engineers.

Flexible Supply Chain

Knowing where to send a server is only the first challenge of many when it comes to a global network. Just like our user base, our supply chain must span the entire world while also staying flexible enough to quickly react to time constraints, pricing changes including taxes and tariffs, import/export restrictions and required certifications – not to mention local partnerships many more dynamic location-specific variables. Even more reason we have to stay quick on our feet, there will always be unforeseen roadblocks and detours even in the most well-prepared plans. For example, a planned expansion in our Prague location might warrant an expanded presence in Vienna for failover.

Once servers arrive at our data centers, our Data Center Deployment and Technical Operations teams work with our vendors and on-site data center personnel (our “Remote Hands” and “Smart Hands”) to install the physical server, manage the cabling, and handle other early-stage provisioning processes.

Our architecture, which is designed so that every server can support every service, makes it easier to withstand hardware failures and efficiently load balance workloads between equipment and between locations.

Join Our Team

If working at a rapidly expanding, globally diverse company interests you, we’re hiring for scores of positions, including in the Infrastructure group. If you want to help increase hardware efficiency, deploy and maintain servers, work on our supply chain, or strengthen ISP partnerships, get in touch.

*Represents cities where we have data centers with active Internet ports and where we are configuring our servers to handle traffic for more customers (at the time of publishing)

Warner Bros. Obtains Several Blocking Orders Targeting Major Pirate Sites

Post Syndicated from Andy original https://torrentfreak.com/warner-bros-obtains-blocking-orders-against-major-pirate-sites-190813/

India is no stranger to blocking pirate sites. Just last week, a court ordered local Internet service providers to block more than 1,200 sites to prevent the spread of a single movie.

Now, however, it appears that there additional legal moves underway to ensure that sites are not only blocked temporarily but also on a more permanent basis.

Over the past several weeks the High Court in Delhi has been handling many separate applications for permanent injunction filed by US-based Warner Bros. Entertainment Inc.

In all cases, the company states that several of its copyrighted works – movies Aquaman, A Star is Born, Wonder Woman, plus TV show Arrow – were made available via a broad range of torrent, streaming, linking, and proxy-type sites.

The complaints also cite works by studios including Columbia, Paramount, Universal, and Netflix as further examples of content being infringed on the platforms.

In just one of the complaints the list of infringing domains runs to 124 and includes some very well known names including local giant Tamilrockers, TorrentDownload.ch, TorrentDownloads.me and EZTV, iStole.it, Zoink.it, Torrents.me, Torrents.io, Zooqle, MovieRulz, LimeTorrents, Bolly4u, KatMovie, Monova, and 9xMovies.

In many cases, multiple domains are listed for the above sites, including alternates, proxies and other variants that are accessible via various unblocking platforms. All are accused of infringing the rights of Warner Bros. by providing access to its movie and TV shows content without authorization.

“[D]efendant Websites are primarily and substantially engaged in communicating to the public, hosting, streaming and/or making available to the public Plaintiff’s original content without authorization, and/or facilitating the same,” one order reads.

The order covering the above sites notes that Warner investigated and then served legal notices on the platforms ordering them to cease-and-desist. However, it’s reported that none acted to prevent their infringing activities.

To boost its case, Warner also informed the Court that some of the sites have already been blocked in other jurisdictions (including the UK, Portugal, Malaysia, Australia, Belgium, Denmark, Russia, and Italy) for similar behavior.

After consideration, the Court found that there is a prima facie case and Warner should be awarded an interim injunction to prevent the sites from continuing their infringing activities. Furthermore, the sites should have their domains blocked by ISPs in India, to prevent further damage and losses.

The Court also addressed the issue of additional domains or platforms appearing to circumvent any blocking, by granting Warner permission to file additional updates with the Court that will allow for such mechanisms to be disabled by ISPs via an expedited process.

The example order detailed above is very specific, in that it orders ISPs to block the domain names of the sites plus a list of IP addresses. However, the vast majority appear to be using Cloudflare, so it remains to be seen whether the ISPs will use discretion or blindly block, which could cause considerable disruption to other sites using the same IP locations.

In some of the orders, it appears that domain registrars are also required to suspend domain names belonging or connected to various sites, including TamilRockers, Hindilinks4u, Otorrents, Filmlinks4u, Mp4Moviez, Series9.io, uWatchFree, OnlineWatchMovies, MovieRulzFree, and SkyMovies.

Several additional applications from Warner are on record at the Delhi High Court but are yet to be published as interim orders.

The main order detailed above can be found here (pdf), the rest here 2,3,4,5,6,7,8,9,10

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Indian High Court Orders ISPs to Block 1,129 Sites to Protect One Movie From Piracy

Post Syndicated from Andy original https://torrentfreak.com/indian-high-court-orders-isps-to-block-1129-sites-to-protect-one-movie-190808/

Blocking orders to prevent the distribution of copyright content are commonplace in several regions around the world.

In India, however, blocking injunctions are regularly handed down to protect specific movies, oftentimes before those movies are even released.

That is also the case with the movie ‘Nerkonda Paarvai’, a legal drama set to hit big screens worldwide today. In anticipation of this release, copyright holder Bayview LLP headed off to the Madras High Court, seeking a pre-emptive injunction to prevent the movie from being spread to the public for free via the Internet and other means.

The High Court hasn’t published the full details of the application, meaning that the list of sites set to be targeted hasn’t yet been revealed in public. It almost certainly contains one, if not many domains, operated by the notorious torrent site Tamilrockers, but the rest remain open to speculation.

What we do know is that a total of 39 Internet service providers were named as defendants in the order handed down this week by Justice Krishnan Ramasamy in the Madras High Court. The Judge acknowledges that the Bayview LLP production has been the subject of significant investment and is set to be released on more than 2,000 screens worldwide.

While it’s not uncommon to list ISPs as defendants in such cases, often noting that they play an unwitting role in the distribution of infringing content, the wording in the Judge’s order, which cites the plaintiff’s case, seems to go considerably further. Whether that’s entirely intentional is open to question.

“The learned counsel for applicant contended that the various cable and internet services provided by various persons (respondents 1–9) across the world are involved in activities of recording, cam-cording and reproducing the audio songs, audio-visual clips, audio-visual songs and full cinematographic films that are screened in theatres and then copying/reproducing them through various medium including but not limited to CDs, DVDs, VCDs, Blu-ray Discs, computer hard drives, pen drives etc.,and distribute the same for selling at a meager sum to the general public without any leave or authorization of the production houses/copyright holders/right holders such as the applicant herein,” the order reads.

Citing the above and referencing the application, the Judge said that in his opinion a prima facie case had been made for him to award a preliminary injunction which will continue until August 20, 2019.

The order, obtained by TorrentFreak, is available here (pdf). The full list of ISPs is detailed below.

1) BHARAT SANCHAR NIGAM LIMITED
2) Mahanagar Telephone Nigam Ltd.
3) Bharati Airtel Ltd.
4) Vodafone Idea Ltd. (Formerly Idea Cellular Ltd.)
5) Reliance Jio Infocomm Ltd.
6) Atria Convergence Technologies Pvt. Ltd.
7) Hathway Cable and Datacom Ltd.
8) Tata Docomo
9) Asianet Satellite Communications
10) Tikona Digital Networks Pvt. Ltd.
11) You Broadband And Cable India Ltd.
12) Reliance Communications Infrastructure Ltd.
13) Rail Tel Corporation of India Ltd.
14) Shyam Spectra Pvt. Ltd.
15) Sify Technologies Ltd.
16) AT And T Global Network Service India Pvt. Ltd.
17) Peak Air Pvt. Ltd.
18) Knet Solutions Pvt. Ltd. (Cherrinet)
19) Limras Eronet Broadband Services Pvt. Ltd.
20) SITI Networks Limited
21) Andhra Pradesh State Fibre Net Ltd.,
22) Raaj Internet (I) Pvt. Ltd.
23) Joister Infoserve Pvt. Ltd.
24) GTPL Hathway Ltd.
25) Ready Link Internet Service Ltd.
26) Nettlinx Limited
27) Excitel Broadband Pvt. Ltd.
28) Southern Online Bio Technologies Ltd.
29) Dawn Supports Pvt. Ltd.
30) Thalainagar Digital Cables (P) Ltd.
31) Cable Cast New Media Pvt. Ltd.
32) C32 Cable Net Pvt. Ltd.
33) Team 5 Network
34) SND Satellite Vision
35) Kerala Communicators Cable Ltd.
36) Asianet Digital Network Pvt. Ltd.
37) DEN Networks Ltd.
38) Starvision Cable TV Network
39) Telecom Regulatory Authority of India (TRAI)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Deeper Connection with the Local Tech Community in India

Post Syndicated from Tingting (Teresa) Huang original https://blog.cloudflare.com/deeper-connection-with-the-local-tech-community-in-india/

Deeper Connection with the Local Tech Community in India

On June 6th 2019, Cloudflare hosted the first ever customer event in a beautiful and green district of Bangalore, India. More than 60 people, including executives, developers, engineers, and even university students, have attended the half day forum.

Deeper Connection with the Local Tech Community in India

The forum kicked off with a series of presentations on the current DDoS landscape, the cyber security trends, the Serverless computing and Cloudflare’s Workers. Trey Quinn, Cloudflare Global Head of Solution Engineering, gave a brief introduction on the evolution of edge computing.

Deeper Connection with the Local Tech Community in India

We also invited business and thought leaders across various industries to share their insights and best practices on cyber security and performance strategy. Some of the keynote and penal sessions included live demos from our customers.

Deeper Connection with the Local Tech Community in India

At this event, the guests had gained first-hand knowledge on the latest technology. They also learned some insider tactics that will help them to protect their business, to accelerate the performance and to identify the quick-wins in a complex internet environment.

Deeper Connection with the Local Tech Community in India

To conclude the event, we arrange some dinner for the guests to network and to enjoy a cool summer night.

Deeper Connection with the Local Tech Community in India

Through this event, Cloudflare has strengthened the connection with the local tech community. The success of the event cannot be separated from the constant improvement from Cloudflare and the continuous support from our customers in India.

As the old saying goes, भारत महान है (India is great). India is such an important market in the region. Cloudflare will enhance the investment and engagement in providing better services and user experience for India customers.

Deeper Connection with the Local Tech Community in India

Post Syndicated from Tingting (Teresa) Huang original https://blog.cloudflare.com/deeper-connection-with-the-local-tech-community-in-india/

Deeper Connection with the Local Tech Community in India

On June 6th 2019, Cloudflare hosted the first ever customer event in a beautiful and green district of Bangalore, India. More than 60 people, including executives, developers, engineers, and even university students, have attended the half day forum.

Deeper Connection with the Local Tech Community in India

The forum kicked off with a series of presentations on the current DDoS landscape, the cyber security trends, the Serverless computing and Cloudflare’s Workers. Trey Quinn, Cloudflare Global Head of Solution Engineering, gave a brief introduction on the evolution of edge computing.

Deeper Connection with the Local Tech Community in India

We also invited business and thought leaders across various industries to share their insights and best practices on cyber security and performance strategy. Some of the keynote and penal sessions included live demos from our customers.

Deeper Connection with the Local Tech Community in India

At this event, the guests had gained first-hand knowledge on the latest technology. They also learned some insider tactics that will help them to protect their business, to accelerate the performance and to identify the quick-wins in a complex internet environment.

Deeper Connection with the Local Tech Community in India

To conclude the event, we arrange some dinner for the guests to network and to enjoy a cool summer night.

Deeper Connection with the Local Tech Community in India

Through this event, Cloudflare has strengthened the connection with the local tech community. The success of the event cannot be separated from the constant improvement from Cloudflare and the continuous support from our customers in India.

As the old saying goes, भारत महान है (India is great). India is such an important market in the region. Cloudflare will enhance the investment and engagement in providing better services and user experience for India customers.

India Court Hands Down Cricket World Cup Piracy Blocking Order

Post Syndicated from Andy original https://torrentfreak.com/india-court-hands-down-cricket-world-cup-piracy-blocking-order-190611/

Over the past decade there have been dozens of orders requiring Internet service providers around the world to block access to copyright-infringing content.

The majority of these orders have attempted to protect the movie and music industries but more recently live sports broadcasters have become involved.

In most if not all sports-blocking cases, the content has been both audio and visual, such as live soccer matches to which the English Premier League owns the rights. However, a new blocking order out of India is attempting to block ‘pirate’ radio streams, delivered via the Internet.

The application was made by Channel 2 Group Corporation. According to the company’s website, its founder is Ajay Sethi, a man with a passion for cricket, who launched “the first ever Radio over the internet – Cricket Radio.”

Channel 2’s application at the Delhi High Court states that the company previously acquired the audio rights to the ICC Men’s World Cup, 2019. The agreement allows it to transmit audio coverage of Cricket World Cup matches (live, delayed, or highlights) via the Internet and private FM radio stations throughout India.

Of course, while Channel 2’s agreement may be exclusive in theory, the company says that other entities are encroaching (or likely to encroach) on its rights as the tournament progresses. As such, it’s seeking protection from the Court to have such broadcasts blocked.

In an ex parte interim order handed down by the Delhi High Court, the broadcaster appears to have been granted permission to do just that.

In total there are 249 defendants in the case, two of which are government departments only present for administrative reasons and 247 for direct involvement. Just one is mentioned by name in the published order, lead defendant live.mycricketlive.net. The remainder are not detailed but are variously described as follows:

  • Defendants 1-64 (URLs/Websites)
  • Defendants 65-68 (private radio platform operators)
  • Defendants 69-105 (Internet service providers)
  • Defendants 105-107 (Government departments) (108 absent from list)
  • Defendants 109-249 (Unknown defendants, to be determined)

“The Plaintiff’s apprehension regarding the likely abuse of Plaintiff’s exclusive Audio Rights and intellectual property rights arises from previous instances of infringement of the Plaintiff’s exclusive broadcasting rights by various interested persons,” the order reads.

“The said instances of infringement caused considerable financial loss to the Plaintiff. The Plaintiff is given to believe from its agencies that the Defendants arrayed herein will infringe the exclusive Audio Rights of the Plaintiff.”

Channel 2 states that defendants 1-105 have no right to offer radio/audio broadcasts of any ICC Event, including those in the World Cup. As a result, they urgently need to be restrained from doing so, since the tournament has already begun. Defendants 69-105 must take measures to block the unlicensed streams provided by the infringing websites/services.

Defendants 109-249 are so-called Ashok Kumars, which are the Indian equivalent of John Does. They are predicted to infringe on Channel 2’s rights so need to be dealt with quickly should they do so, the Court agreed.

“The Plaintiff apprehends that if the Plaintiff were to wait and identify specific parties and collect evidence of infringement by such specific parties, significant time would be lost and the cricket matches may come to an end,” the order reads.

“Irreparable injury, loss and damage, would be caused to the Plaintiff in such a scenario, which would be impossible to quantify in monetary terms alone.”

As part of the interim order, search engines are also required to delete from their results any websites/URLs that provide access to infringing sites/streams. They must do so following a notification from Channel 2 itself.

Given the broad nature of many blocking injunctions in this and other jurisdictions, not much of the above comes as a surprise anymore. However, there is a somewhat unusual addition to the order, which targets services that provide ball-by-ball and/or minute-by-minute updates on the status of matches.

They may only do so “gratuitously only after a time lag of 15 minutes.”

The interim order of the Delhi High Court can be found here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Join Cloudflare India Forum in Bangalore on 6 June 2019!

Post Syndicated from Tingting (Teresa) Huang original https://blog.cloudflare.com/cloudflare-india-forum-in-bangaglore/

Join Cloudflare India Forum in Bangalore on 6 June 2019!

Join Cloudflare India Forum in Bangalore on 6 June 2019!

Please join us for an exclusive gathering to discover the latest in cloud solutions for Internet Security and Performance.

Cloudflare Bangalore Meetup

Thursday, 6 June, 2019:  15:30 – 20:00

Location: the Oberoi (37-39, MG Road, Yellappa Garden, Yellappa Chetty Layout, Sivanchetti Gardens, Bengalore)

We will discuss the newest security trends and introduce serverless solutions.

We have invited renowned leaders across industries, including big brands and some of the fastest-growing startups. You will  learn the insider strategies and tactics that will help you to protect your business, to accelerate the performance and to identify the quick-wins in a complex internet environment.

Speakers:

  • Vaidik Kapoor, Head of Engineering, Grofers
  • Nithyanand Mehta, VP of Technical Services & GM India, Catchpoint
  • Viraj Patel, VP of Technology, Bookmyshow
  • Kailash Nadh, CTO, Zerodha
  • Trey Guinn, Global Head of Solution Engineering, Cloudflare

Agenda:

15:30 – 16:00Registration and Refreshment

16:00 – 16:30 DDoS Landscapes and Security Trends

16:30 – 17:15Workers Overview and Demo

17:15 – 18:00 Panel Discussion – Best Practice on Successful Cyber Security and Performance Strategy

18:00 – 18:30 Keynote #1 – Future edge computing

18:30 – 19:00  Keynote # 2 – Cyber attacks are evolving, so should you: How to adopt a quick-win security policy

19:00 – 20:00 Happy Hour

View Event Details & Register Here »

We look forward to meeting you there!

New guide helps explain cloud security with AWS for public sector customers in India

Post Syndicated from Meng Chow Kang original https://aws.amazon.com/blogs/security/new-guide-helps-explain-cloud-security-with-aws-for-public-sector-customers-in-india/

Our teams are continuing to focus on compliance enablement around the world and now that includes a new guide for public sector customers in India. The User Guide for Government Departments and Agencies in India provides information that helps government users at various central, state, district, and municipal agencies understand security and controls available with AWS. It also explains how to implement appropriate information security, risk management, and governance programs using AWS Services, which are offered in India by Amazon Internet Services Private Limited (AISPL).

The guide focuses on the Ministry of Electronics and Information Technology (Meity) requirements that are detailed in Guidelines for Government Departments for Adoption/Procurement of Cloud Services, addressing common issues that public sector customers encounter.

Our newest guide is part of a series diving into customer compliance issues across industries and jurisdictions, such as financial services guides for Singapore, Australia, and Hong Kong. We’ll be publishing additional guides this year to help you understand other regulatory requirements around the world.

Want more AWS Security news? Follow us on Twitter.

Project Floofball and more: Pi pet stuff

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/project-floofball-pi-pet-stuff/

It’s a public holiday here today (yes, again). So, while we indulge in the traditional pastime of barbecuing stuff (ourselves, mainly), here’s a little trove of Pi projects that cater for our various furry friends.

Project Floofball

Nicole Horward created Project Floofball for her hamster, Harold. It’s an IoT hamster wheel that uses a Raspberry Pi and a magnetic door sensor to log how far Harold runs.

Project Floofball: an IoT hamster wheel

An IoT Hamsterwheel using a Raspberry Pi and a magnetic door sensor, to see how far my hamster runs.

You can follow Harold’s runs in real time on his ThingSpeak channel, and you’ll find photos of the build on imgur. Nicole’s Python code, as well as her template for the laser-cut enclosure that houses the wiring and LCD display, are available on the hamster wheel’s GitHub repo.

A live-streaming pet feeder

JaganK3 used to work long hours that meant he couldn’t be there to feed his dog on time. He found that he couldn’t buy an automated feeder in India without paying a lot to import one, so he made one himself. It uses a Raspberry Pi to control a motor that turns a dispensing valve in a hopper full of dry food, giving his dog a portion of food at set times.

A transparent cylindrical hopper of dry dog food, with a motor that can turn a dispensing valve at the lower end. The motor is connected to a Raspberry Pi in a plastic case. Hopper, motor, Pi, and wiring are all mounted on a board on the wall.

He also added a web cam for live video streaming, because he could. Find out more in JaganK3’s Instructable for his pet feeder.

Shark laser cat toy

Sam Storino, meanwhile, is using a Raspberry Pi to control a laser-pointer cat toy with a goshdarned SHARK (which is kind of what I’d expect from the guy who made the steampunk-looking cat feeder a few weeks ago). The idea is to keep his cats interested and active within the confines of a compact city apartment.

Raspberry Pi Automatic Cat Laser Pointer Toy

Post with 52 votes and 7004 views. Tagged with cat, shark, lasers, austin powers, raspberry pi; Shared by JeorgeLeatherly. Raspberry Pi Automatic Cat Laser Pointer Toy

If I were a cat, I would definitely be entirely happy with this. Find out more on Sam’s website.

And there’s more

Michel Parreno has written a series of articles to help you monitor and feed your pet with Raspberry Pi.

All of these makers are generous in acknowledging the tutorials and build logs that helped them with their projects. It’s lovely to see the Raspberry Pi and maker community working like this, and I bet their projects will inspire others too.

Now, if you’ll excuse me. I’m late for a barbecue.

The post Project Floofball and more: Pi pet stuff appeared first on Raspberry Pi.

Roku Displays FBI Anti-Piracy Warning to Legitimate YouTube & Netflix Users

Post Syndicated from Andy original https://torrentfreak.com/roku-displays-fbi-anti-piracy-warning-to-legitimate-youtube-netflix-users-180516/

In 2018, dealing with copyright infringement claims is a daily issue for many content platforms. The law in many regions demands swift attention and in order to appease copyright holders, most platforms are happy to oblige.

While it’s not unusual for ‘pirate’ content and services to suddenly disappear in response to a DMCA or similar notice, the same is rarely true for entire legitimate services.

But that’s what appeared to happen on the Roku platform during the night, when YouTube, Netflix and other channels disappeared only to be replaced with an ominous anti-piracy warning.

As the embedded tweet shows, the message caused confusion among Roku users who were only using their devices to access legal content. Messages replacing Netflix and YouTube seemed to have caused the greatest number of complaints but many other services were affected.

FoxSportsGo, FandangoNow, and India-focused YuppTV and Hotstar were also blacked out. As were the yoga and transformational videos specialists over at Gaia, the horror buffs at ChillerFlix, and UK TV service BritBox.

But while users scratched their heads, with some misguidedly blaming Roku for not being diligent enough against piracy, Roku took to Twitter to reveal that rather than anti-piracy complaints against the channels in question, a technical hitch was to blame.

However, a subsequent statement to CNET suggested that while blacking out Netflix and YouTube might have been accidental, Roku appears to have been taking anti-piracy action against another channel or channels at the time, with the measures inadvertently spilling over to innocent parties.

“We use that warning when we detect content that has violated copyright,” Roku said in a statement.

“Some channels in our Channel Store displayed that message and became inaccessible after Roku implemented a targeted anti-piracy measure on the platform.”

The precise nature of the action taken by Roku is unknown but it’s clear that copyright infringement is currently a hot topic for the platform.

Roku is currently fighting legal action in Mexico which ordered its products off the shelves following complaints that its platform is used by pirates. That led to an FBI warning being shown for what was believed to be the first time against the XTV and other channels last year.

This March, Roku took action against the popular USTVNow channel following what was described as a “third party” copyright infringement complaint. Just a couple of weeks later, Roku followed up by removing the controversial cCloud channel.

With Roku currently fighting to have sales reinstated in Mexico against a backdrop of claims that up to 40% of its users are pirates, it’s unlikely that Roku is suddenly going to go soft on piracy, so more channel outages can be expected in the future.

In the meantime, the scary FBI warnings of last evening are beginning to fade away (for legitimate channels at least) after the company issued advice on how to fix the problem.

“The recent outage which affected some channels has been resolved. Go to Settings > System > System update > Check now for a software update. Some channels may require you to log in again. Thank you for your patience,” the company wrote in an update.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Supply-Chain Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/supply-chain_se.html

Earlier this month, the Pentagon stopped selling phones made by the Chinese companies ZTE and Huawei on military bases because they might be used to spy on their users.

It’s a legitimate fear, and perhaps a prudent action. But it’s just one instance of the much larger issue of securing our supply chains.

All of our computerized systems are deeply international, and we have no choice but to trust the companies and governments that touch those systems. And while we can ban a few specific products, services or companies, no country can isolate itself from potential foreign interference.

In this specific case, the Pentagon is concerned that the Chinese government demanded that ZTE and Huawei add “backdoors” to their phones that could be surreptitiously turned on by government spies or cause them to fail during some future political conflict. This tampering is possible because the software in these phones is incredibly complex. It’s relatively easy for programmers to hide these capabilities, and correspondingly difficult to detect them.

This isn’t the first time the United States has taken action against foreign software suspected to contain hidden features that can be used against us. Last December, President Trump signed into law a bill banning software from the Russian company Kaspersky from being used within the US government. In 2012, the focus was on Chinese-made Internet routers. Then, the House Intelligence Committee concluded: “Based on available classified and unclassified information, Huawei and ZTE cannot be trusted to be free of foreign state influence and thus pose a security threat to the United States and to our systems.”

Nor is the United States the only country worried about these threats. In 2014, China reportedly banned antivirus products from both Kaspersky and the US company Symantec, based on similar fears. In 2017, the Indian government identified 42 smartphone apps that China subverted. Back in 1997, the Israeli company Check Point was dogged by rumors that its government added backdoors into its products; other of that country’s tech companies have been suspected of the same thing. Even al-Qaeda was concerned; ten years ago, a sympathizer released the encryption software Mujahedeen Secrets, claimed to be free of Western influence and backdoors. If a country doesn’t trust another country, then it can’t trust that country’s computer products.

But this trust isn’t limited to the country where the company is based. We have to trust the country where the software is written — and the countries where all the components are manufactured. In 2016, researchers discovered that many different models of cheap Android phones were sending information back to China. The phones might be American-made, but the software was from China. In 2016, researchers demonstrated an even more devious technique, where a backdoor could be added at the computer chip level in the factory that made the chips ­ without the knowledge of, and undetectable by, the engineers who designed the chips in the first place. Pretty much every US technology company manufactures its hardware in countries such as Malaysia, Indonesia, China and Taiwan.

We also have to trust the programmers. Today’s large software programs are written by teams of hundreds of programmers scattered around the globe. Backdoors, put there by we-have-no-idea-who, have been discovered in Juniper firewalls and D-Link routers, both of which are US companies. In 2003, someone almost slipped a very clever backdoor into Linux. Think of how many countries’ citizens are writing software for Apple or Microsoft or Google.

We can go even farther down the rabbit hole. We have to trust the distribution systems for our hardware and software. Documents disclosed by Edward Snowden showed the National Security Agency installing backdoors into Cisco routers being shipped to the Syrian telephone company. There are fake apps in the Google Play store that eavesdrop on you. Russian hackers subverted the update mechanism of a popular brand of Ukrainian accounting software to spread the NotPetya malware.

In 2017, researchers demonstrated that a smartphone can be subverted by installing a malicious replacement screen.

I could go on. Supply-chain security is an incredibly complex problem. US-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that. We can’t trust anyone, yet we have no choice but to trust everyone. Our phones, computers, software and cloud systems are touched by citizens of dozens of different countries, any one of whom could subvert them at the demand of their government. And just as Russia is penetrating the US power grid so they have that capability in the event of hostilities, many countries are almost certainly doing the same thing at the consumer level.

We don’t know whether the risk of Huawei and ZTE equipment is great enough to warrant the ban. We don’t know what classified intelligence the United States has, and what it implies. But we do know that this is just a minor fix for a much larger problem. It’s doubtful that this ban will have any real effect. Members of the military, and everyone else, can still buy the phones. They just can’t buy them on US military bases. And while the US might block the occasional merger or acquisition, or ban the occasional hardware or software product, we’re largely ignoring that larger issue. Solving it borders on somewhere between incredibly expensive and realistically impossible.

Perhaps someday, global norms and international treaties will render this sort of device-level tampering off-limits. But until then, all we can do is hope that this particular arms race doesn’t get too far out of control.

This essay previously appeared in the Washington Post.

Continued: the answers to your questions for Eben Upton

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/eben-q-a-2/

Last week, we shared the first half of our Q&A with Raspberry Pi Trading CEO and Raspberry Pi creator Eben Upton. Today we follow up with all your other questions, including your expectations for a Raspberry Pi 4, Eben’s dream add-ons, and whether we really could go smaller than the Zero.

Live Q&A with Eben Upton, creator of the Raspberry Pi

Get your questions to us now using #AskRaspberryPi on Twitter

With internet security becoming more necessary, will there be automated versions of VPN on an SD card?

There are already third-party tools which turn your Raspberry Pi into a VPN endpoint. Would we do it ourselves? Like the power button, it’s one of those cases where there are a million things we could do and so it’s more efficient to let the community get on with it.

Just to give a counterexample, while we don’t generally invest in optimising for particular use cases, we did invest a bunch of money into optimising Kodi to run well on Raspberry Pi, because we found that very large numbers of people were using it. So, if we find that we get half a million people a year using a Raspberry Pi as a VPN endpoint, then we’ll probably invest money into optimising it and feature it on the website as we’ve done with Kodi. But I don’t think we’re there today.

Have you ever seen any Pis running and doing important jobs in the wild, and if so, how does it feel?

It’s amazing how often you see them driving displays, for example in radio and TV studios. Of course, it feels great. There’s something wonderful about the geographic spread as well. The Raspberry Pi desktop is quite distinctive, both in its previous incarnation with the grey background and logo, and the current one where we have Greg Annandale’s road picture.

The PIXEL desktop on Raspberry Pi

And so it’s funny when you see it in places. Somebody sent me a video of them teaching in a classroom in rural Pakistan and in the background was Greg’s picture.

Raspberry Pi 4!?!

There will be a Raspberry Pi 4, obviously. We get asked about it a lot. I’m sticking to the guidance that I gave people that they shouldn’t expect to see a Raspberry Pi 4 this year. To some extent, the opportunity to do the 3B+ was a surprise: we were surprised that we’ve been able to get 200MHz more clock speed, triple the wireless and wired throughput, and better thermals, and still stick to the $35 price point.

We’re up against the wall from a silicon perspective; we’re at the end of what you can do with the 40nm process. It’s not that you couldn’t clock the processor faster, or put a larger processor which can execute more instructions per clock in there, it’s simply about the energy consumption and the fact that you can’t dissipate the heat. So we’ve got to go to a smaller process node and that’s an order of magnitude more challenging from an engineering perspective. There’s more effort, more risk, more cost, and all of those things are challenging.

With 3B+ out of the way, we’re going to start looking at this now. For the first six months or so we’re going to be figuring out exactly what people want from a Raspberry Pi 4. We’re listening to people’s comments about what they’d like to see in a new Raspberry Pi, and I’m hoping by early autumn we should have an idea of what we want to put in it and a strategy for how we might achieve that.

Could you go smaller than the Zero?

The challenge with Zero as that we’re periphery-limited. If you run your hand around the unit, there is no edge of that board that doesn’t have something there. So the question is: “If you want to go smaller than Zero, what feature are you willing to throw out?”

It’s a single-sided board, so you could certainly halve the PCB area if you fold the circuitry and use both sides, though you’d have to lose something. You could give up some GPIO and go back to 26 pins like the first Raspberry Pi. You could give up the camera connector, you could go to micro HDMI from mini HDMI. You could remove the SD card and just do USB boot. I’m inventing a product live on air! But really, you could get down to two thirds and lose a bunch of GPIO – it’s hard to imagine you could get to half the size.

What’s the one feature that you wish you could outfit on the Raspberry Pi that isn’t cost effective at this time? Your dream feature.

Well, more memory. There are obviously technical reasons why we don’t have more memory on there, but there are also market reasons. People ask “why doesn’t the Raspberry Pi have more memory?”, and my response is typically “go and Google ‘DRAM price’”. We’re used to the price of memory going down. And currently, we’re going through a phase where this has turned around and memory is getting more expensive again.

Machine learning would be interesting. There are machine learning accelerators which would be interesting to put on a piece of hardware. But again, they are not going to be used by everyone, so according to our method of pricing what we might add to a board, machine learning gets treated like a $50 chip. But that would be lovely to do.

Which citizen science projects using the Pi have most caught your attention?

I like the wildlife camera projects. We live out in the countryside in a little village, and we’re conscious of being surrounded by nature but we don’t see a lot of it on a day-to-day basis. So I like the nature cam projects, though, to my everlasting shame, I haven’t set one up yet. There’s a range of them, from very professional products to people taking a Raspberry Pi and a camera and putting them in a plastic box. So those are good fun.

Raspberry Shake seismometer

The Raspberry Shake seismometer

And there’s Meteor Pi from the Cambridge Science Centre, that’s a lot of fun. And the seismometer Raspberry Shake – that sort of thing is really nice. We missed the recent South Wales earthquake; perhaps we should set one up at our Californian office.

How does it feel to go to bed every day knowing you’ve changed the world for the better in such a massive way?

What feels really good is that when we started this in 2006 nobody else was talking about it, but now we’re part of a very broad movement.

We were in a really bad way: we’d seen a collapse in the number of applicants applying to study Computer Science at Cambridge and elsewhere. In our view, this reflected a move away from seeing technology as ‘a thing you do’ to seeing it as a ‘thing that you have done to you’. It is problematic from the point of view of the economy, industry, and academia, but most importantly it damages the life prospects of individual children, particularly those from disadvantaged backgrounds. The great thing about STEM subjects is that you can’t fake being good at them. There are a lot of industries where your Dad can get you a job based on who he knows and then you can kind of muddle along. But if your dad gets you a job building bridges and you suck at it, after the first or second bridge falls down, then you probably aren’t going to be building bridges anymore. So access to STEM education can be a great driver of social mobility.

By the time we were launching the Raspberry Pi in 2012, there was this wonderful movement going on. Code Club, for example, and CoderDojo came along. Lots of different ways of trying to solve the same problem. What feels really, really good is that we’ve been able to do this as part of an enormous community. And some parts of that community became part of the Raspberry Pi Foundation – we merged with Code Club, we merged with CoderDojo, and we continue to work alongside a lot of these other organisations. So in the two seconds it takes me to fall asleep after my face hits the pillow, that’s what I think about.

We’re currently advertising a Programme Manager role in New Delhi, India. Did you ever think that Raspberry Pi would be advertising a role like this when you were bringing together the Foundation?

No, I didn’t.

But if you told me we were going to be hiring somewhere, India probably would have been top of my list because there’s a massive IT industry in India. When we think about our interaction with emerging markets, India, in a lot of ways, is the poster child for how we would like it to work. There have already been some wonderful deployments of Raspberry Pi, for example in Kerala, without our direct involvement. And we think we’ve got something that’s useful for the Indian market. We have a product, we have clubs, we have teacher training. And we have a body of experience in how to teach people, so we have a physical commercial product as well as a charitable offering that we think are a good fit.

It’s going to be massive.

What is your favourite BBC type-in listing?

There was a game called Codename: Druid. There is a famous game called Codename: Droid which was the sequel to Stryker’s Run, which was an awesome, awesome game. And there was a type-in game called Codename: Druid, which was at the bottom end of what you would consider a commercial game.

codename druid

And I remember typing that in. And what was really cool about it was that the next month, the guy who wrote it did another article that talks about the memory map and which operating system functions used which bits of memory. So if you weren’t going to do disc access, which bits of memory could you trample on and know the operating system would survive.

babbage versus bugs Raspberry Pi annual

See the full listing for Babbage versus Bugs in the Raspberry Pi 2018 Annual

I still like type-in listings. The Raspberry Pi 2018 Annual has a type-in listing that I wrote for a Babbage versus Bugs game. I will say that’s not the last type-in listing you will see from me in the next twelve months. And if you download the PDF, you could probably copy and paste it into your favourite text editor to save yourself some time.

The post Continued: the answers to your questions for Eben Upton appeared first on Raspberry Pi.

Registrars Suspend 11 Pirate Site Domains, 89 More in the Crosshairs

Post Syndicated from Andy original https://torrentfreak.com/registrars-suspend-11-pirate-site-domains-89-more-in-the-crosshairs-180423/

In addition to website blocking which is running rampant across dozens of countries right now, targeting the domains of pirate sites is considered to be a somewhat effective anti-piracy tool.

The vast majority of websites are found using a recognizable name so when they become inaccessible, site operators have to work quickly to get the message out to fans. That can mean losing visitors, at least in the short term, and also contributes to the rise of copy-cat sites that may not have users’ best interests at heart.

Nevertheless, crime-fighting has always been about disrupting the ability of the enemy to do business so with this in mind, authorities in India began taking advice from the UK’s Police Intellectual Property Crime Unit (PIPCU) a couple of years ago.

After studying the model developed by PIPCU, India formed its Digital Crime Unit (DCU), which follows a multi-stage plan.

Initially, pirate sites and their partners are told to cease-and-desist. Next, complaints are filed with advertisers, who are asked to stop funding site activities. Service providers and domain registrars also receive a written complaint from the DCU, asking them to suspend services to the sites in question.

Last July, the DCU earmarked around 9,000 sites where pirated content was being made available. From there, 1,300 were placed on a shortlist for targeted action. Precisely how many have been contacted thus far is unclear but authorities are now reporting success.

According to local reports, the Maharashtra government’s Digital Crime Unit has managed to have 11 pirate site domains suspended following complaints from players in the entertainment industry.

As is often the case (and to avoid them receiving even more attention) the sites in question aren’t being named but according to Brijesh Singh, special Inspector General of Police in Maharashtra, the sites had a significant number of visitors.

Their domain registrars were sent a notice under Section 149 of the Code Of Criminal Procedure, which grants police the power to take preventative action when a crime is suspected. It’s yet to be confirmed officially but it seems likely that pirate sites utilizing local registrars were targeted by the authorities.

“Responding to our notice, the domain names of all these websites, that had a collective viewership of over 80 million, were suspended,” Singh said.

Laxman Kamble, a police inspector attached to the state government’s Cyber Cell, said the pilot project was launched after the government received complaints from Viacom and Star but back in January there were reports that the MPAA had also become involved.

Using the model pioneered by London’s PIPCU, 19 parameters were applied to list of pirate sites in order to place them on the shortlist. They are reported to include the type of content being uploaded, downloaded, and the number of downloads overall.

Kamble reports that a further 89 websites, that have domains registered abroad but are very popular in India, are now being targeted. Whether overseas registrars will prove as compliant will remain to be seen. After booking initial success, even PIPCU itself experienced problems keeping up the momentum with registrars.

In 2014, information obtained by TorrentFreak following a Freedom of Information request revealed that only five out of 70 domain registrars had complied with police requests to suspend domains.

A year later, PIPCU confirmed that suspending pirate domain names was no longer a priority for them after ICANN ruled that registrars don’t have to suspend domain names without a valid court order.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Welcome Nathan – Our Solutions Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-nathan-our-solutions-engineer/

Backblaze is growing, and with it our need to cater to a lot of different use cases that our customers bring to us. We needed a Solutions Engineer to help out, and after a long search we’ve hired our first one! Lets learn a bit more about Nathan shall we?

What is your Backblaze Title?
Solutions Engineer. Our customers bring a thousand different use cases to both B1 and B2, and I’m here to help them figure out how best to make those use cases a reality. Also, any odd jobs that Nilay wants me to do.

Where are you originally from?
I am native to the San Francisco Bay Area, studying mathematics at UC Santa Cruz, and then computer science at California University of Hayward (which has since renamed itself California University of the East Hills. I observe that it’s still in Hayward).

What attracted you to Backblaze?
As a stable, growing company with huge growth and even bigger potential, the business model is attractive, and the team is outstanding. Add to that the strong commitment to transparency, and it’s a hard company to resist. We can store – and restore – data while offering superior reliability at an economic advantage to do-it-yourself, and that’s a great place to be.

What do you expect to learn while being at Backblaze?
Everything I need to, but principally how our customers choose to interact with web storage. Storage isn’t a solution per se, but it’s an important component of any persistent solution. I’m looking forward to working with all the different concepts our customers have to make use of storage.

Where else have you worked?
All sorts of places, but I’ll admit publicly to EMC, Gemalto, and my own little (failed, alas) startup, IC2N. I worked with low-level document imaging.

Where did you go to school?
UC Santa Cruz, BA Mathematics CU Hayward, Master of Science in Computer Science.

What’s your dream job?
Sipping tea in the California redwood forest. However, solutions engineer at Backblaze is a good second choice!

Favorite place you’ve traveled?
Ashland, Oregon, for the Oregon Shakespeare Festival and the marble caves (most caves form from limestone).

Favorite hobby?
Theater. Pathfinder. Writing. Baking cookies and cakes.

Of what achievement are you most proud?
Marrying the most wonderful man in the world.

Star Trek or Star Wars?
Star Trek’s utopian science fiction vision of humanity and science resonates a lot more strongly with me than the dystopian science fantasy of Star Wars.

Coke or Pepsi?
Neither. I’d much rather have a cup of jasmine tea.

Favorite food?
It varies, but I love Indian and Thai cuisine. Truly excellent Italian food is marvelous – wood fired pizza, if I had to pick only one, but the world would be a boring place with a single favorite food.

Why do you like certain things?
If I knew that, I’d be in marketing.

Anything else you’d like you’d like to tell us?
If you haven’t already encountered the amazing authors Patricia McKillip and Lois McMasters Bujold – go encounter them. Be happy.

There’s nothing wrong with a nice cup of tea and a long game of Pathfinder. Sign us up! Welcome to the team Nathan!

The post Welcome Nathan – Our Solutions Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Big Birthday Weekend 2018: find a Jam near you!

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/big-birthday-weekend-2018-find-a-jam-near-you/

We’re just over three weeks away from the Raspberry Jam Big Birthday Weekend 2018, our community celebration of Raspberry Pi’s sixth birthday. Instead of an event in Cambridge, as we’ve held in the past, we’re coordinating Raspberry Jam events to take place around the world on 3–4 March, so that as many people as possible can join in. Well over 100 Jams have been confirmed so far.

Raspberry Pi Big Birthday Weekend Jam

Find a Jam near you

There are Jams planned in Argentina, Australia, Bolivia, Brazil, Bulgaria, Cameroon, Canada, Colombia, Dominican Republic, France, Germany, Greece, Hungary, India, Iran, Ireland, Italy, Japan, Kenya, Malaysia, Malta, Mexico, Netherlands, Norway, Papua New Guinea, Peru, Philippines, Poland, South Africa, Spain, Taiwan, Turkey, United Kingdom, United States, and Zimbabwe.

Take a look at the events map and the full list (including those who haven’t added their event to the map quite yet).

Raspberry Jam Big Birthday Weekend 2018 event map

We will have Raspberry Jams in 35 countries across six continents

Birthday kits

We had some special swag made especially for the birthday, including these T-shirts, which we’ve sent to Jam organisers:

Raspberry Jam Big Birthday Weekend 2018 T-shirt

There is also a poster with a list of participating Jams, which you can download:

Raspberry Jam Big Birthday Weekend 2018 list

Raspberry Jam photo booth

I created a Raspberry Jam photo booth that overlays photos with the Big Birthday Weekend logo and then tweets the picture from your Jam’s account — you’ll be seeing plenty of those if you follow the #PiParty hashtag on 3–4 March.

Check out the project on GitHub, and feel free to set up your own booth, or modify it to your own requirements. We’ve included text annotations in several languages, and more contributions are very welcome.

There’s still time…

If you can’t find a Jam near you, there’s still time to organise one for the Big Birthday Weekend. All you need to do is find a venue — a room in a school or library will do — and think about what you’d like to do at the event. Some Jams have Raspberry Pis set up for workshops and practical activities, some arrange tech talks, some put on show-and-tell — it’s up to you. To help you along, there’s the Raspberry Jam Guidebook full of advice and tips from Jam organisers.

Raspberry Pi on Twitter

The packed. And they packed. And they packed some more. Who’s expecting one of these #rjam kits for the Raspberry Jam Big Birthday Weekend?

Download the Raspberry Jam branding pack, and the special birthday branding pack, where you’ll find logos, graphical assets, flyer templates, worksheets, and more. When you’re ready to announce your event, create a webpage for it — you can use a site like Eventbrite or Meetup — and submit your Jam to us so it will appear on the Jam map!

We are six

We’re really looking forward to celebrating our birthday with thousands of people around the world. Over 48 hours, people of all ages will come together at more than 100 events to learn, share ideas, meet people, and make things during our Big Birthday Weekend.

Raspberry Jam Manchester
Raspberry Jam Manchester
Raspberry Jam Manchester

Since we released the first Raspberry Pi in 2012, we’ve sold 17 million of them. We’re also reaching almost 200000 children in 130 countries around the world through Code Club and CoderDojo, we’ve trained over 1500 Raspberry Pi Certified Educators, and we’ve sent code written by more than 6800 children into space. Our magazines are read by a quarter of a million people, and millions more use our free online learning resources. There’s plenty to celebrate and even more still to do: we really hope you’ll join us from a Jam near you on 3–4 March.

The post Big Birthday Weekend 2018: find a Jam near you! appeared first on Raspberry Pi.

Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

yahoodevelopers:

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then-new DAG based execution engine.

Oath grid infrastructure is through and through driven by Apache technologies be it storage through HDFS, resource management through YARN, job execution frameworks with Tez and user interface engines such as Hive, Hue, Pig, Sqoop, Spark, Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of Yarn ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up on a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community nearly 80-90% of the Yahoo jobs were now running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code are the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates and extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were tracked now through an umbrella JIRA TEZ-3334 and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fixing big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

Meet India’s women Open Source warriors (Factor Daily)

Post Syndicated from corbet original https://lwn.net/Articles/746546/rss

The Factor Daily site has a
look at work to increase the diversity
of open-source contributors in
India. “Over past two months, we interviewed at least two dozen
people from within and outside the open source community to identify a set
of women open source contributors from India. While the list is not
conclusive by any measure, it’s a good starting point in identifying the
women who are quietly shaping the future of open source from this part of
the world and how they dealt with gender biases.

2017 Weather Station round-up

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/2017-weather-station/

As we head into 2018 and start looking forward to longer days in the Northern hemisphere, I thought I’d take a look back at last year’s weather using data from Raspberry Pi Oracle Weather Stations. One of the great things about the kit is that as well as uploading all its readings to the shared online Oracle database, it stores them locally on the Pi in a MySQL or MariaDB database. This means you can use the power of SQL queries coupled with Python code to do automatic data analysis.

Soggy Surrey

My Weather Station has only been installed since May, so I didn’t have a full 52 weeks of my own data to investigate. Still, my station recorded more than 70000 measurements. Living in England, the first thing I wanted to know was: which was the wettest month? Unsurprisingly, both in terms of average daily rainfall and total rainfall, the start of the summer period — exactly when I went on a staycation — was the soggiest:

What about the global Weather Station community?

Even soggier Bavaria

Here things get slightly trickier. Although we have a shiny Oracle database full of all participating schools’ sensor readings, some of the data needs careful interpretation. Many kits are used as part of the school curriculum and do not always record genuine outdoor conditions. Nevertheless, it appears that Adalbert Stifter Gymnasium in Bavaria, Germany, had an even wetter 2017 than my home did:


View larger map

Where the wind blows

The records Robert-Dannemann Schule in Westerstede, Germany, is a good example of data which was most likely collected while testing and investigating the weather station sensors, rather than in genuine external conditions. Unless this school’s Weather Station was transported to a planet which suffers from extreme hurricanes, it wasn’t actually subjected to wind speeds above 1000km/h in November. Dismissing these and all similarly suspect records, I decided to award the ‘Windiest location of the year’ prize to CEIP Noalla-Telleiro, Spain.


View larger map

This school is right on the coast, and is subject to some strong and squally weather systems.

Weather Station at CEIP Noalla - Telleiro

Weather Station at CEIP Noalla-Telleiro

They’ve mounted their wind vane and anemometer nice and high, so I can see how they were able to record such high wind velocities.

A couple of Weather Stations have recently been commissioned in equally exposed places — it will be interesting to see whether they will record even higher speeds during 2018.

Highs and lows

After careful analysis and a few disqualifications (a couple of Weather Stations in contention for this category were housed indoors), the ‘Hottest location’ award went to High School of Chalastra in Thessaloniki, Greece. There were a couple of Weather Stations (the one at The Marwadi Education Foundation in India, for example) that reported higher average temperatures than Chalastra’s 24.54 ºC. However, they had uploaded far fewer readings and their data coverage of 2017 was only partial.


View larger map

At the other end of the thermometer, the location with the coldest average temperature is École de la Rose Sauvage in Calgary, Canada, with a very chilly 9.9 ºC.

Ecole de la Rose sauvage Weather Station

Weather Station at École de la Rose Sauvage

I suspect this school has a good chance of retaining the title: their lowest 2017 temperature of -24 ºC is likely to be beaten in 2018 due to extreme weather currently bringing a freezing start to the year in that part of the world.


View larger map

Analyse your own Weather Station data

If you have an Oracle Raspberry Pi Weather Station and would like to perform an annual review of your local data, you can use this Python script as a starting point. It will display a monthly summary of the temperature and rainfall for 2017, and you should be able to customise the code to focus on other sensor data or on a particular time of year. We’d love to see your results, so please share your findings with [email protected], and we’ll send you some limited-edition Weather Station stickers.

The post 2017 Weather Station round-up appeared first on Raspberry Pi.

Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazon-redshift/

An ETL (Extract, Transform, Load) process enables you to load data from source systems into your data warehouse. This is typically executed as a batch or near-real-time ingest process to keep the data warehouse current and provide up-to-date analytical data to end users.

Amazon Redshift is a fast, petabyte-scale data warehouse that enables you easily to make data-driven decisions. With Amazon Redshift, you can get insights into your big data in a cost-effective fashion using standard SQL. You can set up any type of data model, from star and snowflake schemas, to simple de-normalized tables for running any analytical queries.

To operate a robust ETL platform and deliver data to Amazon Redshift in a timely manner, design your ETL processes to take account of Amazon Redshift’s architecture. When migrating from a legacy data warehouse to Amazon Redshift, it is tempting to adopt a lift-and-shift approach, but this can result in performance and scale issues long term. This post guides you through the following best practices for ensuring optimal, consistent runtimes for your ETL processes:

  • COPY data from multiple, evenly sized files.
  • Use workload management to improve ETL runtimes.
  • Perform table maintenance regularly.
  • Perform multiple steps in a single transaction.
  • Loading data in bulk.
  • Use UNLOAD to extract large result sets.
  • Use Amazon Redshift Spectrum for ad hoc ETL processing.
  • Monitor daily ETL health using diagnostic queries.

1. COPY data from multiple, evenly sized files

Amazon Redshift is an MPP (massively parallel processing) database, where all the compute nodes divide and parallelize the work of ingesting data. Each node is further subdivided into slices, with each slice having one or more dedicated cores, equally dividing the processing capacity. The number of slices per node depends on the node type of the cluster. For example, each DS2.XLARGE compute node has two slices, whereas each DS2.8XLARGE compute node has 16 slices.

When you load data into Amazon Redshift, you should aim to have each slice do an equal amount of work. When you load the data from a single large file or from files split into uneven sizes, some slices do more work than others. As a result, the process runs only as fast as the slowest, or most heavily loaded, slice. In the example shown below, a single large file is loaded into a two-node cluster, resulting in only one of the nodes, “Compute-0”, performing all the data ingestion:

When splitting your data files, ensure that they are of approximately equal size – between 1 MB and 1 GB after compression. The number of files should be a multiple of the number of slices in your cluster. Also, I strongly recommend that you individually compress the load files using gzip, lzop, or bzip2 to efficiently load large datasets.

When loading multiple files into a single table, use a single COPY command for the table, rather than multiple COPY commands. Amazon Redshift automatically parallelizes the data ingestion. Using a single COPY command to bulk load data into a table ensures optimal use of cluster resources, and quickest possible throughput.

2. Use workload management to improve ETL runtimes

Use Amazon Redshift’s workload management (WLM) to define multiple queues dedicated to different workloads (for example, ETL versus reporting) and to manage the runtimes of queries. As you migrate more workloads into Amazon Redshift, your ETL runtimes can become inconsistent if WLM is not appropriately set up.

I recommend limiting the overall concurrency of WLM across all queues to around 15 or less. This WLM guide helps you organize and monitor the different queues for your Amazon Redshift cluster.

When managing different workloads on your Amazon Redshift cluster, consider the following for the queue setup:

  • Create a queue dedicated to your ETL processes. Configure this queue with a small number of slots (5 or fewer). Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to the commit queue. Because ETL is a commit-intensive process, having a separate queue with a small number of slots helps mitigate this issue.
  • Claim extra memory available in a queue. When executing an ETL query, you can take advantage of the wlm_query_slot_count to claim the extra memory available in a particular queue. For example, a typical ETL process might involve COPYing raw data into a staging table so that downstream ETL jobs can run transformations that calculate daily, weekly, and monthly aggregates. To speed up the COPY process (so that the downstream tasks can start in parallel sooner), the wlm_query_slot_count can be increased for this step.
  • Create a separate queue for reporting queries. Configure query monitoring rules on this queue to further manage long-running and expensive queries.
  • Take advantage of the dynamic memory parameters. They swap the memory from your ETL to your reporting queue after the ETL job has completed.

3. Perform table maintenance regularly

Amazon Redshift is a columnar database, which enables fast transformations for aggregating data. Performing regular table maintenance ensures that transformation ETLs are predictable and performant. To get the best performance from your Amazon Redshift database, you must ensure that database tables regularly are VACUUMed and ANALYZEd. The Analyze & Vacuum schema utility helps you automate the table maintenance task and have VACUUM & ANALYZE executed in a regular fashion.

  • Use VACUUM to sort tables and remove deleted blocks

During a typical ETL refresh process, tables receive new incoming records using COPY, and unneeded data (cold data) is removed using DELETE. New rows are added to the unsorted region in a table. Deleted rows are simply marked for deletion.

DELETE does not automatically reclaim the space occupied by the deleted rows. Adding and removing large numbers of rows can therefore cause the unsorted region and the number of deleted blocks to grow. This can degrade the performance of queries executed against these tables.

After an ETL process completes, perform VACUUM to ensure that user queries execute in a consistent manner. The complete list of tables that need VACUUMing can be found using the Amazon Redshift Util’s table_info script.

Use the following approaches to ensure that VACCUM is completed in a timely manner:

  • Use wlm_query_slot_count to claim all the memory allocated in the ETL WLM queue during the VACUUM process.
  • DROP or TRUNCATE intermediate or staging tables, thereby eliminating the need to VACUUM them.
  • If your table has a compound sort key with only one sort column, try to load your data in sort key order. This helps reduce or eliminate the need to VACUUM the table.
  • Consider using time series This helps reduce the amount of data you need to VACUUM.
  • Use ANALYZE to update database statistics

Amazon Redshift uses a cost-based query planner and optimizer using statistics about tables to make good decisions about the query plan for the SQL statements. Regular statistics collection after the ETL completion ensures that user queries run fast, and that daily ETL processes are performant. The Amazon Redshift utility table_info script provides insights into the freshness of the statistics. Keeping the statistics off (pct_stats_off) less than 20% ensures effective query plans for the SQL queries.

4. Perform multiple steps in a single transaction

ETL transformation logic often spans multiple steps. Because commits in Amazon Redshift are expensive, if each ETL step performs a commit, multiple concurrent ETL processes can take a long time to execute.

To minimize the number of commits in a process, the steps in an ETL script should be surrounded by a BEGIN…END statement so that a single commit is performed only after all the transformation logic has been executed. For example, here is an example multi-step ETL script that performs one commit at the end:

Begin
CREATE temporary staging_table;
INSERT INTO staging_table SELECT .. FROM source (transformation logic);
DELETE FROM daily_table WHERE dataset_date =?;
INSERT INTO daily_table SELECT .. FROM staging_table (daily aggregate);
DELETE FROM weekly_table WHERE weekending_date=?;
INSERT INTO weekly_table SELECT .. FROM staging_table(weekly aggregate);
Commit

5. Loading data in bulk

Amazon Redshift is designed to store and query petabyte-scale datasets. Using Amazon S3 you can stage and accumulate data from multiple source systems before executing a bulk COPY operation. The following methods allow efficient and fast transfer of these bulk datasets into Amazon Redshift:

  • Use a manifest file to ingest large datasets that span multiple files. The manifest file is a JSON file that lists all the files to be loaded into Amazon Redshift. Using a manifest file ensures that Amazon Redshift has a consistent view of the data to be loaded from S3, while also ensuring that duplicate files do not result in the same data being loaded more than one time.
  • Use temporary staging tables to hold the data for transformation. These tables are automatically dropped after the ETL session is complete. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. Explicitly specifying the CREATE TEMPORARY TABLE statement allows you to control the DISTRIBUTION KEY, SORT KEY, and compression settings to further improve performance.
  • User ALTER table APPEND to swap data from the staging tables to the target table. Data in the source table is moved to matching columns in the target table. Column order doesn’t matter. After data is successfully appended to the target table, the source table is empty. ALTER TABLE APPEND is much faster than a similar CREATE TABLE AS or INSERT INTO operation because it doesn’t involve copying or moving data.

6. Use UNLOAD to extract large result sets

Fetching a large number of rows using SELECT is expensive and takes a long time. When a large amount of data is fetched from the Amazon Redshift cluster, the leader node has to hold the data temporarily until the fetches are complete. Further, data is streamed out sequentially, which results in longer elapsed time. As a result, the leader node can become hot, which not only affects the SELECT that is being executed, but also throttles resources for creating execution plans and managing the overall cluster resources. Here is an example of a large SELECT statement. Notice that the leader node is doing most of the work to stream out the rows:

Use UNLOAD to extract large results sets directly to S3. After it’s in S3, the data can be shared with multiple downstream systems. By default, UNLOAD writes data in parallel to multiple files according to the number of slices in the cluster. All the compute nodes participate to quickly offload the data into S3.

If you are extracting data for use with Amazon Redshift Spectrum, you should make use of the MAXFILESIZE parameter to and keep files are 150 MB. Similar to item 1 above, having many evenly sized files ensures that Redshift Spectrum can do the maximum amount of work in parallel.

7. Use Redshift Spectrum for ad hoc ETL processing

Events such as data backfill, promotional activity, and special calendar days can trigger additional data volumes that affect the data refresh times in your Amazon Redshift cluster. To help address these spikes in data volumes and throughput, I recommend staging data in S3. After data is organized in S3, Redshift Spectrum enables you to query it directly using standard SQL. In this way, you gain the benefits of additional capacity without having to resize your cluster.

For tips on getting started with and optimizing the use of Redshift Spectrum, see the previous post, 10 Best Practices for Amazon Redshift Spectrum.

8. Monitor daily ETL health using diagnostic queries

Monitoring the health of your ETL processes on a regular basis helps identify the early onset of performance issues before they have a significant impact on your cluster. The following monitoring scripts can be used to provide insights into the health of your ETL processes:

ScriptUse when…Solution
commit_stats.sql – Commit queue statistics from past days, showing largest queue length and queue time firstDML statements such as INSERT/UPDATE/COPY/DELETE operations take several times longer to execute when multiple of these operations are in progressSet up separate WLM queues for the ETL process and limit the concurrency to < 5.
copy_performance.sql –  Copy command statistics for the past days Daily COPY operations take longer to execute• Follow the best practices for the COPY command.
• Analyze data growth with the incoming datasets and consider cluster resize to meet the expected SLA.
table_info.sql – Table skew and unsorted statistics along with storage and key informationTransformation steps take longer to execute• Set up regular VACCUM jobs to address unsorted rows and claim the deleted blocks so that transformation SQL execute optimally.
• Consider a table redesign to avoid data skewness.
v_check_transaction_locks.sql – Monitor transaction locksINSERT/UPDATE/COPY/DELETE operations on particular tables do not respond back in timely manner, compared to when run after the ETLMultiple DML statements are operating on the same target table at the same moment from different transactions. Set up ETL job dependency so that they execute serially for the same target table.
v_get_schema_priv_by_user.sql – Get the schema that the user has access toReporting users can view intermediate tablesSet up separate database groups for reporting and ETL users, and grants access to objects using GRANT.
v_generate_tbl_ddl.sql – Get the table DDLYou need to create an empty table with same structure as target table for data backfillGenerate DDL using this script for data backfill.
v_space_used_per_tbl.sql – monitor space used by individual tablesAmazon Redshift data warehouse space growth is trending upwards more than normal

Analyze the individual tables that are growing at higher rate than normal. Consider data archival using UNLOAD to S3 and Redshift Spectrum for later analysis.

Use unscanned_table_summary.sql to find unused table and archive or drop them.

top_queries.sql – Return the top 50 time consuming statements aggregated by its textETL transformations are taking longer to executeAnalyze the top transformation SQL and use EXPLAIN to find opportunities for tuning the query plan.

There are several other useful scripts available in the amazon-redshift-utils repository. The AWS Lambda Utility Runner runs a subset of these scripts on a scheduled basis, allowing you to automate much of monitoring of your ETL processes.

Example ETL process

The following ETL process reinforces some of the best practices discussed in this post. Consider the following four-step daily ETL workflow where data from an RDBMS source system is staged in S3 and then loaded into Amazon Redshift. Amazon Redshift is used to calculate daily, weekly, and monthly aggregations, which are then unloaded to S3, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

Step 1:  Extract from the RDBMS source to a S3 bucket

In this ETL process, the data extract job fetches change data every 1 hour and it is staged into multiple hourly files. For example, the staged S3 folder looks like the following:

 [[email protected] ~]$ aws s3 ls s3://<<S3 Bucket>>/batch/2017/07/02/
2017-07-02 01:59:58   81900220 20170702T01.export.gz
2017-07-02 02:59:56   84926844 20170702T02.export.gz
2017-07-02 03:59:54   78990356 20170702T03.export.gz
…
2017-07-02 22:00:03   75966745 20170702T21.export.gz
2017-07-02 23:00:02   89199874 20170702T22.export.gz
2017-07-02 00:59:59   71161715 20170702T23.export.gz

Organizing the data into multiple, evenly sized files enables the COPY command to ingest this data using all available resources in the Amazon Redshift cluster. Further, the files are compressed (gzipped) to further reduce COPY times.

Step 2: Stage data to the Amazon Redshift table for cleansing

Ingesting the data can be accomplished using a JSON-based manifest file. Using the manifest file ensures that S3 eventual consistency issues can be eliminated and also provides an opportunity to dedupe any files if needed. A sample manifest20170702.json file looks like the following:

{
  "entries": [
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T01.export.gz", "mandatory":true},
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T02.export.gz", "mandatory":true},
    …
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T23.export.gz", "mandatory":true}
  ]
}

The data can be ingested using the following command:

SET wlm_query_slot_count TO <<max available concurrency in the ETL queue>>;
COPY stage_tbl FROM 's3:// <<S3 Bucket>>/batch/manifest20170702.json' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' manifest;

Because the downstream ETL processes depend on this COPY command to complete, the wlm_query_slot_count is used to claim all the memory available to the queue. This helps the COPY command complete as quickly as possible.

Step 3: Transform data to create daily, weekly, and monthly datasets and load into target tables

Data is staged in the “stage_tbl” from where it can be transformed into the daily, weekly, and monthly aggregates and loaded into target tables. The following job illustrates a typical weekly process:

Begin
INSERT into ETL_LOG (..) values (..);
DELETE from weekly_tbl where dataset_week = <<current week>>;
INSERT into weekly_tbl (..)
  SELECT date_trunc('week', dataset_day) AS week_begin_dataset_date, SUM(C1) AS C1, SUM(C2) AS C2
	FROM   stage_tbl
GROUP BY date_trunc('week', dataset_day);
INSERT into AUDIT_LOG values (..);
COMMIT;
End;

As shown above, multiple steps are combined into one transaction to perform a single commit, reducing contention on the commit queue.

Step 4: Unload the daily dataset to populate the S3 data lake bucket

The transformed results are now unloaded into another S3 bucket, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

unload ('SELECT * FROM weekly_tbl WHERE dataset_week = <<current week>>’) TO 's3:// <<S3 Bucket>>/datalake/weekly/20170526/' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

Summary

Amazon Redshift lets you easily operate petabyte-scale data warehouses on the cloud. This post summarized the best practices for operating scalable ETL natively within Amazon Redshift. I demonstrated efficient ways to ingest and transform data, along with close monitoring. I also demonstrated the best practices being used in a typical sample ETL workload to transform the data into Amazon Redshift.

If you have questions or suggestions, please comment below.

 


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

A New Guide to Banking Regulations and Guidelines in India

Post Syndicated from Oliver Bell original https://aws.amazon.com/blogs/security/a-new-guide-to-banking-regulations-and-guidelines-in-india/

Indian flag

The AWS User Guide to Banking Regulations and Guidelines in India was published in December 2017 and includes information that can help banks regulated by the Reserve Bank of India (RBI) assess how to implement an appropriate information security, risk management, and governance program in the AWS Cloud.

The guide focuses on the following key considerations:

  • Outsourcing guidelines – Guidance for banks entering an outsourcing arrangement, including risk-management practices such as conducting due diligence and maintaining effective oversight. Learn how to conduct an assessment of AWS services and align your governance requirements with the AWS Shared Responsibility Model.
  • Information security – Detailed requirements to help banks identify and manage information security in the cloud.

This guide joins the existing Financial Services guides for other jurisdictions, such as Singapore, Australia, and Hong Kong. AWS will publish additional guides in 2018 to help you understand regulatory requirements in other markets around the world.

– Oliver