Tag Archives: Star

Rightscorp Bleeds Another Million, Borrows $200K From Customer BMG

Post Syndicated from Andy original https://torrentfreak.com/rightscorp-bleeds-another-million-borrows-200k-from-customer-bmg-170819/

Anti-piracy outfit Rightscorp is one of the many companies trying to turn Internet piracy into profit. The company has a somewhat novel approach but has difficulty balancing the books.

Essentially, Rightscorp operates like other so-called copyright-trolling operations, in that it monitors alleged offenders on BitTorrent networks, tracks them to their ISPs, then attempts to extract a cash settlement. Rightscorp does this by sending DMCA notices with settlement agreements attached, in the hope that at-this-point-anonymous Internet users break cover in panic. This can lead to a $20 or $30 ‘fine’ or in some cases dozens of multiples of that.

But despite settling hundreds of thousands of these cases, profit has thus far proven elusive, with the company hemorrhaging millions in losses. The company has just filed its results for the first half of 2017 and they contain more bad news.

In the six months ended June 2017, revenues obtained from copyright settlements reached just $138,514, which is 35% down on the $214,326 generated in the same period last year. However, the company did manage to book $148,332 in “consulting revenue” in the first half of this year, a business area that generated no revenue in 2016.

Overall then, total revenue for the six month period was $286,846 – up from $214,326 last year. While that’s a better picture in its own right, Rightscorp has a lot of costs attached to its business.
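For readers who want to check the arithmetic, here is a small sketch that recomputes the percentages from the figures quoted above. The variable names are ours, not the filing's.

```python
# Figures quoted in the article for the six months ended June (USD).
settlements_2017 = 138_514
settlements_2016 = 214_326
consulting_2017 = 148_332   # no consulting revenue was booked in 2016

total_2017 = settlements_2017 + consulting_2017
total_2016 = settlements_2016

# Relative change in settlement revenue and in total revenue.
settlement_drop = (settlements_2016 - settlements_2017) / settlements_2016
total_growth = (total_2017 - total_2016) / total_2016

print(f"Total revenue H1 2017: ${total_2017:,}")               # $286,846
print(f"Settlement revenue change: -{settlement_drop:.1%}")    # -35.4%
print(f"Total revenue change: +{total_growth:.1%}")            # +33.8%
```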

After paying out $69,257 to copyright holders and absorbing $1,190,696 in general and administrative costs, among other things, the company’s total operating expenses topped out at $1,296,127 for the first six months of the year.

To make a long story short, the company made a net loss of $1,068,422, which was more than the $995,265 loss it made last year, despite the improved revenues. The company ended June with just $1,725 in cash.

“These factors raise substantial doubt about the Company’s ability to continue as a going concern within one year after the date that the financial statements are issued,” the company’s latest statement reads.

This hanging-by-a-thread narrative has followed Rightscorp for the past few years but there’s information in the latest accounts which indicates how bad things were at the start of the year.

In January 2016, Rightscorp and several copyright holders, including Hollywood studio Warner Bros, agreed to settle a class-action lawsuit over intimidating robo-calls that were made to alleged infringers. The defendants agreed to set aside $450,000 to cover the costs, and it appears that Rightscorp was liable for at least $200,000 of that.

Rightscorp hasn’t exactly been flush with cash, so it was interesting to read that its main consumer piracy settlement client, music publisher BMG, actually stepped in to pay off the class-action settlement.

“At December 31, 2016, the Company had accrued $200,000 related to the settlement of a class action complaint. On January 7, 2017, BMG Rights Management (US) LLC (“BMG”) advanced the Company $200,000, which was used to pay off the settlement. The advance from BMG is to be applied to future billings from the Company to BMG for consulting services,” Rightscorp’s filing reads.

With Rightscorp’s future BMG revenue now being gobbled up by what appears to be loan repayments, it becomes difficult to see how the anti-piracy outfit can make enough money to pay off the $200,000 debt. However, its filing notes that on July 21, 2017, the company issued “an aggregate of 10,000,000 shares of common stock to an investor for a purchase price of $200,000.” While that amount matches the BMG debt, the filing doesn’t reveal who the investor is.

The filing also reveals that on July 31, Rightscorp entered into two agreements to provide services “to a holder of multiple copyrights.” The copyright holder isn’t named, but the deal reveals that it’s in Rightscorp’s best interests to get immediate payment from people to whom it sends cash settlement demands.

“[Rightscorp] will receive 50% of all gross proceeds of any settlement revenue received by the Client from pre-lawsuit ‘advisory notices,’ and 37.5% of all gross proceeds received by the Client from ‘final warning’ notices sent immediately prior to a lawsuit,” the filing notes.

Also of interest is that Rightscorp has offered not to work with any of the copyright holders’ direct competitors, providing certain thresholds are met – $10,000 revenue in the first month to $100,000 after 12 months. But there’s more to the deal.

Rightscorp will also provide a number of services to this client including detecting and verifying copyright works on P2P networks, providing information about infringers, plus reporting, litigation support, and copyright protection advisory services.

For this, Rightscorp will earn $10,000 for the first three months, rising to $85,000 per month after 16 months, valuable revenue for a company fighting for its life.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Friday Squid Blogging: Brittle Star Catches a Squid

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/08/friday_squid_bl_589.html

Watch a brittle star catch a squid, and then lose it to another brittle star.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Announcement: IPS code

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/08/announcement-ips-code.html

So after 20 years, IBM is killing off my BlackICE code created in April 1998. So it’s time that I rewrite it.

BlackICE was the first “inline” intrusion-detection system, aka. an “intrusion prevention system” or IPS. ISS purchased my company in 2001 and replaced their RealSecure engine with it, and later renamed it Proventia. Then IBM purchased ISS in 2006. Now, they are formally canceling the project and moving customers onto Cisco’s products, which are based on Snort.

So now is a good time to write a replacement. The reason is that BlackICE worked fundamentally differently than Snort, using protocol analysis rather than pattern-matching. In this way, it worked more like Bro than Snort. The biggest benefit of protocol-analysis is speed, making it many times faster than Snort. The second benefit is better detection ability, as I describe in this post on Heartbleed.
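To make the distinction concrete, here is a toy sketch contrasting per-packet pattern-matching with a stateful protocol parser. It is purely illustrative: none of it is taken from BlackICE, Snort, or Bro, and the `MethodParser` class and sample packets are invented for this example.

```python
def pattern_match(packets, needle):
    """Naive per-packet search: misses a needle split across two packets."""
    return any(needle in packet for packet in packets)


class MethodParser:
    """Byte-at-a-time state machine that extracts an HTTP request method.

    The state lives in the object, so input can arrive in arbitrary fragments.
    """

    def __init__(self):
        self.state = "METHOD"
        self.method = bytearray()

    def feed(self, data):
        for byte in data:
            if self.state == "METHOD":
                if byte == 0x20:        # a space ends the method token
                    self.state = "DONE"
                else:
                    self.method.append(byte)
            # In DONE the rest is ignored; a fuller parser would go on to
            # the URL, version, and headers with further states.
        return self.state == "DONE"


# The request is split mid-token, as TCP segmentation can do.
packets = [b"GE", b"T /index.html HTTP/1.1\r\n"]

print(pattern_match(packets, b"GET "))   # False: neither packet contains it

parser = MethodParser()
for packet in packets:
    parser.feed(packet)
print(bytes(parser.method))              # b'GET'
```

The parser never buffers or reassembles the stream; it only remembers which state it is in, which is the general idea behind state-machine parsing.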

So my plan is to create a new project. I’ll be checking in the starter bits into GitHub starting a couple weeks from now. I need to figure out a new name for the project, so I don’t have to rip off a name from William Gibson like I did last time :).

Some notes:

  • Yes, it’ll be GNU open source. I’m a capitalist, so I’ll earn money like snort/nmap dual-licensing it, charging companies who don’t want to open-source their addons. All capitalists GNU license their code.
  • C, not Rust. Sorry, I’m going for extreme scalability. We’ll re-visit this decision later when looking at building protocol parsers.
  • It’ll be 95% compatible with Snort signatures. Their language definition leaves so much ambiguous it’ll be hard to be 100% compatible.
  • It’ll support Snort output as well, though really, Snort’s events suck.
  • Protocol parsers in Lua, so you can use it as a replacement for Bro, writing parsers to extract data you are interested in.
  • Protocol state machine parsers in C, like you see in my Masscan project for X.509.
  • First version IDS only. These days, “inline” means also being able to MitM the SSL stack, so I’m going to have to think harder on that.
  • Multi-core worker threads off PF_RING/DPDK/netmap receive queues. Should handle 10 Gbps, tracking 10 million concurrent connections, with quad-core CPU.

So if you want to contribute to the project, here’s what I need:
  • Requirements from people who work daily with IDS/IPS today. I need you to write up what your products do well that you really like. I need you to write up what they suck at that needs to be fixed. These need to be in some detail.
  • Testing environment to play with. This means having a small server plugged into a real-world link running at a minimum of several gigabits-per-second available for the next year. I’ll sign NDAs related to the data I might see on the network.
  • Coders. I’ll be doing the basic architecture, but protocol parsers, output plugins, etc. will need work. Code will be in C and Lua for the near term. Unfortunately, since I’m going to dual-license, I’ll need waivers before accepting pull requests.
Anyway, follow me on Twitter @erratarob if you want to contribute.

Michael Reeves and the ridiculous Subscriber Robot

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/michael-reeves-subscriber-robot/

At the beginning of his new build’s video, YouTuber Michael Reeves discusses a revelation he had about why some people don’t subscribe to his channel:

The real reason some people don’t subscribe is that when you hit this button, that’s all, that’s it, it’s done. It’s not special, it’s not enjoyable. So how do we make subscribing a fun, enjoyable process? Well, we do it by slowly chipping away at the content creator’s psyche every time someone subscribes.

His fix? The ‘fun’ interactive Subscriber Robot that is the subject of the video.

Be aware that Michael uses a couple of mild swears in this video, so maybe don’t watch it with a child.


Who is Michael Reeves?

Software developer and student Michael Reeves started his YouTube account a mere four months ago, with the premiere of his robot that shines lasers into your eyes – now he has 110k+ subscribers. At only 19, Michael co-owns and manages a company together with friends, and is set on his career path in software and computing. So when he is not making videos, he works a nine-to-five job “to pay for college and, y’know, live”.

The Subscriber Robot

Michael shot to YouTube fame with the aforementioned laser robot built around an Arduino. But by now he has also released videos for a few Raspberry Pi-based contraptions.


Michael, talking us through the details of one of the worst ideas ever made

His Subscriber Robot uses a series of Python scripts running on a Raspberry Pi to check for new subscribers to Michael’s channel via the YouTube API. When it identifies one, the Pi uses a relay to make the ceiling lights in Michael’s office flash ten times a second while ear-splitting noise is emitted by a 102-decibel-rated buzzer. Needless to say, this buzzer is not recommended for home use, work use, or any use whatsoever! Moreover, the Raspberry Pi also connects to a speaker that announces the name of the new subscriber, so Michael knows who to thank.
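Michael's actual scripts aren't shown here, but the control flow described above can be sketched roughly as follows. Everything in this sketch is an assumption: `fetch_subscribers`, `flash`, and `announce` are hypothetical stand-ins for the YouTube API call, the relay, and the speaker, and the number of flashes is made up.

```python
import time


def new_subscribers(previous, current):
    """Return names that are in `current` but not in `previous`, in order."""
    seen = set(previous)
    return [name for name in current if name not in seen]


def alarm(name, flash, announce, flashes=10, rate_hz=10, sleep=time.sleep):
    """Toggle the light `flashes` times at `rate_hz`, then announce the name."""
    half_period = 1.0 / (2 * rate_hz)
    for _ in range(flashes):
        flash(True)
        sleep(half_period)
        flash(False)
        sleep(half_period)
    announce(f"{name} has subscribed.")


def poll_once(fetch_subscribers, known, flash, announce, sleep=time.sleep):
    """Check for new subscribers once and raise the alarm for each of them."""
    current = fetch_subscribers()
    for name in new_subscribers(known, current):
        alarm(name, flash, announce, sleep=sleep)
    return current


# Stand-ins so the sketch runs without hardware or network access.
light_log = []
spoken = []
known = poll_once(
    fetch_subscribers=lambda: ["alice", "MoistPretzels"],
    known=["alice"],
    flash=light_log.append,
    announce=spoken.append,
    sleep=lambda seconds: None,   # skip real delays in the demo
)
print(spoken)   # ['MoistPretzels has subscribed.']
```

On a real Pi, `flash` would drive the GPIO pin wired to the relay and `poll_once` would be called in a loop with a delay between checks.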


Subscriber Robot: EEH! EEH! EEH! MoistPretzels has subscribed.
Michael: Thank you, MoistPretzels…

Given that Michael has gained a whopping 30,000 followers in the ten days since the release of this video, it’s fair to assume he is currently curled up in a ball on the office floor, quietly crying to himself.

If you think Michael only makes videos about ridiculous builds, you’re mistaken. He also uses YouTube to provide educational content, because he believes that “it’s super important for people to teach themselves how to program”. For example, he has just released a new C# beginners tutorial, the third in the series.

Support Michael

If you’d like to help Michael in his mission to fill the world with both tutorials and ridiculous robot builds, make sure to subscribe to his channel. You can also follow him on Twitter and support him on Patreon.

You may also want to check out the Useless Duck Company and Simone Giertz if you’re in the mood for more impractical, yet highly amusing, robot builds.

Good luck with your channel, Michael! We are looking forward to, and slightly dreading, more videos from one of our favourite new YouTubers.

The post Michael Reeves and the ridiculous Subscriber Robot appeared first on Raspberry Pi.

Cloudflare Kicking ‘Daily Stormer’ is Bad News For Pirate Sites

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-kicking-daily-stormer-is-bad-news-for-pirate-sites-170817/

“I woke up this morning in a bad mood and decided to kick them off the Internet.”

Those are the words of Cloudflare CEO Matthew Prince, who decided to terminate the account of controversial Neo-Nazi site Daily Stormer.

Bam. Gone. At least for a while.

Although many people are happy to see the site go offline, the decision is not without consequence. It goes directly against what many saw as the core values of the company.

For years on end, Cloudflare has been asked to remove terrorist propaganda, pirate sites, and other possibly unacceptable content. Each time, Cloudflare replied that it doesn’t take action without a court order. No exceptions.

“Even if it were able to, Cloudflare does not monitor, evaluate, judge or store content appearing on a third party website,” the company wrote just a few weeks ago, in its whitepaper on intermediary liability.

“We’re the plumbers of the internet. We make the pipes work but it’s not right for us to inspect what is or isn’t going through the pipes,” Cloudflare CEO Matthew Prince himself said not too long ago.

“If companies like ours or ISPs start censoring there would be an uproar. It would lead us down a path of internet censors and controls akin to a country like China,” he added.

The same arguments were repeated in different contexts, over and over.

This strong position was also one of the reasons why Cloudflare was dragged into various copyright infringement court cases. In these cases, the company repeatedly stressed that removing a site from Cloudflare’s service would not make infringing content disappear.

Pirate sites would just require a simple DNS reconfiguration to continue their operation, after all.

“[T]here are no measures of any kind that CloudFlare could take to prevent this alleged infringement, because the termination of CloudFlare’s CDN services would have no impact on the existence and ability of these allegedly infringing websites to continue to operate,” it said.

That comment looks rather misplaced now that the CEO of the same company has decided to “kick” a website “off the Internet” after an emotional, but deliberate, decision.

Taking a page from Cloudflare’s (old) playbook we’re not going to make any judgments here. Just search Twitter or any social media site and you’ll see plenty of opinions, both for and against the company’s actions.

We do have a prediction though. During the months and years to come, Cloudflare is likely to be dragged into many more copyright lawsuits, and when it is, its counterparts are going to bring up Cloudflare’s voluntary decision to kick a website off the Internet.

Unless Cloudflare suddenly decides to pull all pirate sites from its service tomorrow, of course.


Raspbian Stretch has arrived for Raspberry Pi

Post Syndicated from Simon Long original https://www.raspberrypi.org/blog/raspbian-stretch/

It’s now just under two years since we released the Jessie version of Raspbian. Those of you who know that Debian run their releases on a two-year cycle will therefore have been wondering when we might be releasing the next version, codenamed Stretch. Well, wonder no longer – Raspbian Stretch is available for download today!


Debian releases are named after characters from Disney Pixar’s Toy Story trilogy. In case, like me, you were wondering: Stretch is a purple octopus from Toy Story 3. Hi, Stretch!

The differences between Jessie and Stretch are mostly under-the-hood optimisations, and you really shouldn’t notice any differences in day-to-day use of the desktop and applications. (If you’re really interested, the technical details are in the Debian release notes here.)

However, we’ve made a few small changes to our image that are worth mentioning.

New versions of applications

Version 3.0.1 of Sonic Pi is included – this includes a lot of new functionality in terms of input/output. See the Sonic Pi release notes for more details of exactly what has changed.


The Chromium web browser has been updated to version 60, the most recent stable release. This offers improved memory usage and more efficient code, so you may notice it running slightly faster than before. The visual appearance has also been changed very slightly.


Bluetooth audio

In Jessie, we used PulseAudio to provide support for audio over Bluetooth, but integrating this with the ALSA architecture used for other audio sources was clumsy. For Stretch, we are using the bluez-alsa package to make Bluetooth audio work with ALSA itself. PulseAudio is therefore no longer installed by default, and the volume plugin on the taskbar will no longer start and stop PulseAudio. From a user point of view, everything should still work exactly as before – the only change is that if you still wish to use PulseAudio for some other reason, you will need to install it yourself.

Better handling of other usernames

The default user account in Raspbian has always been called ‘pi’, and a lot of the desktop applications assume that this is the current user. This has been changed for Stretch, so now applications like Raspberry Pi Configuration no longer assume this to be the case. This means, for example, that the option to automatically log in as the ‘pi’ user will now automatically log in with the name of the current user instead.

One other change is how sudo is handled. By default, the ‘pi’ user is set up with passwordless sudo access. We are no longer assuming this to be the case, so now desktop applications which require sudo access will prompt for the password rather than simply failing to work if a user without passwordless sudo uses them.

Scratch 2 SenseHAT extension

In the last Jessie release, we added the offline version of Scratch 2. While Scratch 2 itself hasn’t changed for this release, we have added a new extension to allow the SenseHAT to be used with Scratch 2. Look under ‘More Blocks’ and choose ‘Add an Extension’ to load the extension.

This works with either a physical SenseHAT or with the SenseHAT emulator. If a SenseHAT is connected, the extension will control that in preference to the emulator.


Fix for Broadpwn exploit

A couple of months ago, a vulnerability was discovered in the firmware of the BCM43xx wireless chipset which is used on Pi 3 and Pi Zero W; this potentially allows an attacker to take over the chip and execute code on it. The Stretch release includes a patch that addresses this vulnerability.

There is also the usual set of minor bug fixes and UI improvements – I’ll leave you to spot those!

How to get Raspbian Stretch

As this is a major version upgrade, we recommend using a clean image; these are available from the Downloads page on our site as usual.

Upgrading an existing Jessie image is possible, but is not guaranteed to work in every circumstance. If you wish to try upgrading a Jessie image to Stretch, we strongly recommend taking a backup first – we can accept no responsibility for loss of data from a failed update.

To upgrade, first modify the files /etc/apt/sources.list and /etc/apt/sources.list.d/raspi.list. In both files, change every occurrence of the word ‘jessie’ to ‘stretch’. (Both files will require sudo to edit.)
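If you would rather script this edit than do it by hand, the substitution might look like the following sketch. It works on the text of a file only; the `retarget_sources` helper and the sample line are ours for illustration, and writing the result back to the real files would still need sudo.

```python
import re


def retarget_sources(text, old="jessie", new="stretch"):
    """Replace every whole-word occurrence of `old` with `new`."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


# Sample content for illustration; your sources.list may differ.
sample = (
    "deb http://mirrordirector.raspbian.org/raspbian/ jessie main contrib non-free rpi\n"
    "# deb-src http://archive.raspbian.org/raspbian/ jessie main contrib non-free rpi\n"
)

print(retarget_sources(sample))
```

Commented-out lines are changed too, which matches the instruction to change every occurrence.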

Then open a terminal window and execute

sudo apt-get update
sudo apt-get -y dist-upgrade

Answer ‘yes’ to any prompts. There may also be a point at which the install pauses while a page of information is shown on the screen – hold the ‘space’ key to scroll through all of this and then hit ‘q’ to continue.

Finally, if you are not using PulseAudio for anything other than Bluetooth audio, remove it from the image by entering

sudo apt-get -y purge "pulseaudio*"

The post Raspbian Stretch has arrived for Raspberry Pi appeared first on Raspberry Pi.

New – VPC Endpoints for DynamoDB

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-vpc-endpoints-for-dynamodb/

Starting today Amazon Virtual Private Cloud (VPC) Endpoints for Amazon DynamoDB are available in all public AWS regions. You can provision an endpoint right away using the AWS Management Console or the AWS Command Line Interface (CLI). There are no additional costs for a VPC Endpoint for DynamoDB.

Many AWS customers run their applications within an Amazon Virtual Private Cloud (VPC) for security or isolation reasons. Previously, if you wanted your EC2 instances in your VPC to be able to access DynamoDB, you had two options. You could use an Internet Gateway (with a NAT Gateway or assigning your instances public IPs) or you could route all of your traffic to your local infrastructure via VPN or AWS Direct Connect and then back to DynamoDB. Both of these solutions had security and throughput implications and it could be difficult to configure NACLs or security groups to restrict access to just DynamoDB. Here is a picture of the old infrastructure.

Creating an Endpoint

Let’s create a VPC Endpoint for DynamoDB. We can make sure our region supports the endpoint with the DescribeVpcEndpointServices API call.


aws ec2 describe-vpc-endpoint-services --region us-east-1
{
    "ServiceNames": [
        "com.amazonaws.us-east-1.dynamodb",
        "com.amazonaws.us-east-1.s3"
    ]
}
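
If you want to automate this check, one option is to parse the JSON that the CLI prints. The sketch below uses only the standard library and the output shown above; the `supports_service` helper is a name made up for this example.

```python
import json

# Output of `aws ec2 describe-vpc-endpoint-services`, copied from above.
cli_output = """
{
    "ServiceNames": [
        "com.amazonaws.us-east-1.dynamodb",
        "com.amazonaws.us-east-1.s3"
    ]
}
"""


def supports_service(output, region, service):
    """Return True if the region lists a VPC endpoint for the service."""
    names = json.loads(output)["ServiceNames"]
    return f"com.amazonaws.{region}.{service}" in names


print(supports_service(cli_output, "us-east-1", "dynamodb"))   # True
```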

Great, so I know my region supports these endpoints and I know what my regional endpoint is. I can grab one of my VPCs and provision an endpoint with a quick call to the CLI or through the console. Let me show you how to use the console.

First I’ll navigate to the VPC console and select “Endpoints” in the sidebar. From there I’ll click “Create Endpoint” which brings me to this handy console.

You’ll notice the AWS Identity and Access Management (IAM) policy section for the endpoint. This supports all of the fine grained access control that DynamoDB supports in regular IAM policies and you can restrict access based on IAM policy conditions.
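As a rough idea of what a more restrictive endpoint policy could look like, here is a sketch that builds one as a Python dictionary and prints it as JSON. The account ID, table name, and choice of actions are hypothetical placeholders, not values from this walkthrough.

```python
import json

# Hypothetical values for illustration; substitute your own.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
TABLE_NAME = "ExampleTable"

# Allow only read-style calls against a single table through the endpoint.
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": f"arn:aws:dynamodb:{REGION}:{ACCOUNT_ID}:table/{TABLE_NAME}",
        }
    ],
}

policy_json = json.dumps(endpoint_policy, indent=2)
print(policy_json)
```

The resulting JSON is what you would paste into the policy section of the console in place of the full-access default.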

For now I’ll give full access to my instances within this VPC and click “Next Step”.

This brings me to a list of route tables in my VPC and asks me which of these route tables I want to assign my endpoint to. I’ll select one of them and click “Create Endpoint”.

Keep in mind the note of warning in the console: if you have source restrictions to DynamoDB based on public IP addresses, the source IP of your instances accessing DynamoDB will now be their private IP addresses.

After adding the VPC Endpoint for DynamoDB to our VPC our infrastructure looks like this.

That’s it folks! It’s that easy. It’s provided at no cost. Go ahead and start using it today. If you need more details you can read the docs here.

[$] Reducing Python’s startup time

Post Syndicated from jake original https://lwn.net/Articles/730915/rss

The startup time for the Python interpreter has been discussed by the core developers and others numerous times over the years; optimization efforts are made periodically as well. Startup time can dominate the execution time of command-line programs written in Python, especially if they import a lot of other modules. Python startup time is worse than some other scripting languages and more recent versions of the language are taking more than twice as long to start up when compared to earlier versions (e.g. 3.7 versus 2.7). The most recent iteration of the startup time discussion has played out in the python-dev and python-ideas mailing lists since mid-July. This time, the focus has been on the collections.namedtuple() data structure that is used in multiple places throughout the standard library and in other Python modules, but the discussion has been more wide-ranging than simply that.
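
For a rough feel of interpreter startup cost on your own machine, a sketch like the following times how long an empty program takes to run. It is a crude wall-clock measurement, not the methodology used in the mailing-list discussion, and the `startup_seconds` helper is our own name.

```python
import subprocess
import sys
import time


def startup_seconds(args=(), runs=5):
    """Best-of-`runs` wall-clock time to start Python and run nothing."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


print(f"default startup:  {startup_seconds():.4f}s")
# -S skips the automatic import of the site module, one common startup cost.
print(f"with -S:          {startup_seconds(['-S']):.4f}s")
```

The numbers include process creation by the operating system, so they are an upper bound on the interpreter's own startup work.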

What’s the Diff: Programs, Processes, and Threads

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/


How often have you heard the term threading in relation to a computer program, but you weren’t exactly sure what it meant? How about processes? You likely understand that a thread is somehow closely related to a program and a process, but if you’re not a computer science major, maybe that’s as far as your understanding goes.

Knowing what these terms mean is absolutely essential if you are a programmer, but an understanding of them also can be useful to the average computer user. Being able to look at and understand the Activity Monitor on the Macintosh, the Task Manager on Windows, or Top on Linux can help you troubleshoot which programs are causing problems on your computer, or whether you might need to install more memory to make your system run better.

Let’s take a few minutes to delve into the world of computer programs and sort out what these terms mean. We’ll simplify and generalize some of the ideas, but the general concepts we cover should help clarify the difference between the terms.

Programs

First of all, you probably are aware that a program is the code that is stored on your computer that is intended to fulfill a certain task. There are many types of programs, including programs that help your computer function and are part of the operating system, and other programs that fulfill a particular job. These task-specific programs are also known as “applications,” and can include programs such as word processing, web browsing, or emailing a message to another computer.


Programs are typically stored on disk or in non-volatile memory in a form that can be executed by your computer. Prior to that, they are created using a programming language such as C, Lisp, Pascal, or many others using instructions that involve logic, data and device manipulation, recurrence, and user interaction. The end result is a text file of code that is compiled into binary form (1’s and 0’s) in order to run on the computer. Another type of program is called “interpreted,” and instead of being compiled in advance in order to run, is interpreted into executable code at the time it is run. Some common programming languages that are typically interpreted are Python, PHP, JavaScript, and Ruby.

The end result is the same, however, in that when a program is run, it is loaded into memory in binary form. The computer’s CPU (Central Processing Unit) understands only binary instructions, so that’s the form the program needs to be in when it runs.

Perhaps you’ve heard the programmer’s joke, “There are only 10 types of people in the world, those who understand binary, and those who don’t.”

Binary is the native language of computers because an electrical circuit at its basic level has two states, on or off, represented by a one or a zero. In the common numbering system we use every day, base 10, each digit position can be anything from 0 to 9. In base 2 (or binary), each position is either a 0 or a 1. (In a future blog post we might cover quantum computing, which goes beyond the concept of just 1’s and 0’s in computing.)

Decimal—Base 10 Binary—Base 2
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
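
If you’d like to reproduce the table yourself, this short sketch prints the same conversions using Python’s built-in formatting, and also decodes the punchline of the joke above.

```python
# Print each decimal digit next to its 4-bit binary form.
for n in range(10):
    print(n, format(n, "04b"))

# The "10" in the joke, read as a binary number, is two.
print(int("10", 2))   # 2
```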

How Processes Work

The program has been loaded into the computer’s memory in binary form. Now what?

An executing program needs more than just the binary code that tells the computer what to do. It also needs memory and various operating system resources in order to run. A “process” is what we call a program that has been loaded into memory along with all the resources it needs to operate. The “operating system” is the brains behind allocating all these resources, and comes in different flavors such as macOS, iOS, Microsoft Windows, Linux, and Android. The OS handles the task of managing the resources needed to turn your program into a running process.

Some essential resources every process needs are registers, a program counter, and a stack. The “registers” are data holding places that are part of the computer processor (CPU). A register may hold an instruction, a storage address, or other kind of data needed by the process. The “program counter,” also called the “instruction pointer,” keeps track of where a computer is in its program sequence. The “stack” is a data structure that stores information about the active subroutines of a computer program and is used as scratch space for the process. It is distinguished from dynamically allocated memory for the process that is known as “the heap.”

diagram of how processes work

There can be multiple instances of a single program, and each instance of that running program is a process. Each process has a separate memory address space, which means that a process runs independently and is isolated from other processes. It cannot directly access shared data in other processes. Switching from one process to another requires some time (relatively) for saving and loading registers, memory maps, and other resources.

This independence of processes is valuable because the operating system tries its best to isolate processes so that a problem with one process doesn’t corrupt or cause havoc with another process. You’ve undoubtedly run into the situation in which one application on your computer freezes or has a problem and you’ve been able to quit that program without affecting others.

How Threads Work

So, are you still with us? We finally made it to threads!

A thread is the unit of execution within a process. A process can have anywhere from just one thread to many threads.

Process vs. Thread

diagram of threads in a process over time

When a process starts, it is assigned memory and resources. Each thread in the process shares that memory and resources. In single-threaded processes, the process contains one thread. The process and the thread are one and the same, and there is only one thing happening.

In multithreaded processes, the process contains more than one thread, and the process is accomplishing a number of things at the same time (technically, it’s almost at the same time—read more on that in the “What about Concurrency and Parallelism?” section below).

diagram of single and multi-threaded process

We talked about the two types of memory available to a process or a thread, the stack and the heap. It is important to distinguish between these two types of process memory because each thread will have its own stack, but all the threads in a process will share the heap.

Threads are sometimes called lightweight processes because they have their own stack but can access shared data. Because threads share the same address space as the process and other threads within the process, the operational cost of communication between the threads is low, which is an advantage. The disadvantage is that a problem with one thread in a process can affect other threads and the viability of the process itself.
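Here is a small sketch of that idea in Python. Each thread keeps its running total in a local variable (on its own stack), while all threads append to one shared list (on the heap). The names and numbers are invented for the example, and the lock is there because shared data is exactly where threads can trip over each other.

```python
import threading

shared_results = []        # one list object, visible to every thread
lock = threading.Lock()    # guards access to the shared list


def worker(worker_id, count):
    local_total = 0        # local variable: private to this thread's stack
    for i in range(count):
        local_total += i
    with lock:             # only one thread touches the shared list at a time
        shared_results.append((worker_id, local_total))


threads = [threading.Thread(target=worker, args=(i, 1000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_results))
```

No data had to be copied or sent between the threads: each one simply wrote to the same list, which is the low communication cost described above.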

Threads vs. Processes

So to review:

  1. The program starts out as a text file of programming code.
  2. The program is compiled or interpreted into binary form.
  3. The program is loaded into memory.
  4. The program becomes one or more running processes.
  5. Processes are typically independent of each other.
  6. Threads exist as a subset of a process.
  7. Threads can communicate with each other more easily than processes can.
  8. Threads are more vulnerable to problems caused by other threads in the same process.

Processes vs. Threads — Advantages and Disadvantages

Process | Thread
Processes are heavyweight operations | Threads are lighter weight operations
Each process has its own memory space | Threads use the memory of the process they belong to
Inter-process communication is slow as processes have different memory addresses | Inter-thread communication can be faster than inter-process communication because threads of the same process share memory with the process they belong to
Context switching between processes is more expensive | Context switching between threads of the same process is less expensive
Processes don’t share memory with other processes | Threads share memory with other threads of the same process

What about Concurrency and Parallelism?

A question you might ask is whether processes or threads can run at the same time. The answer is: it depends. On a system with multiple processors or CPU cores (as is common with modern processors), multiple processes or threads can be executed in parallel. On a single processor, though, it is not possible to have processes or threads truly executing at the same time. In this case, the CPU is shared among running processes or threads using a process scheduling algorithm that divides the CPU’s time and yields the illusion of parallel execution. The time given to each task is called a “time slice.” The switching back and forth between tasks happens so fast it is usually not perceptible. The terms parallelism (true operation at the same time) and concurrency (simulated operation at the same time) distinguish between these two types of real or approximate simultaneous operation.
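The time-slice idea can be sketched as a toy round-robin scheduler. This is a simplified simulation under stated assumptions, not how any real operating system scheduler is written: the task names, the work units, and the round_robin function are all hypothetical.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Simulate one CPU sharing its time among several tasks.

    tasks maps a task name to the units of work it still needs.
    Returns the (task, units_run) slices in the order they executed.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)   # run for at most one time slice
        timeline.append((name, run))
        if remaining > run:                # unfinished: back of the line
            queue.append((name, remaining - run))
    return timeline


timeline = round_robin({"A": 3, "B": 5, "C": 2}, time_slice=2)
print(timeline)
```

Only one task runs at any instant, yet all three make progress in turn. With a short enough slice, that interleaving is what looks like simultaneous execution on a single processor.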

diagram of concurrency and parallelism

Why Choose Process over Thread, or Thread over Process?

So, how would a programmer choose between a process and a thread when creating a program in which she wants to execute multiple tasks at the same time? We’ve covered some of the differences above, but let’s look at a real world example with a program that many of us use, Google Chrome.

When Google was designing the Chrome browser, they needed to decide how to handle the many different tasks that needed computer, communications, and network resources at the same time. Each browser window or tab communicates with multiple servers on the internet to retrieve text, programs, graphics, audio, video, and other resources, and renders that data for display and interaction with the user. In addition, the browser can open many windows, each with many tasks.

Google had to decide how to handle that separation of tasks. They chose to run each browser window in Chrome as a separate process rather than a thread or many threads, as is common with other browsers. Doing that brought Google a number of benefits. Running each window as a process protects the overall application from bugs and glitches in the rendering engine and restricts access from each rendering engine process to others and to the rest of the system. Isolating JavaScript programs in a process prevents them from running away with too much CPU time and memory and making the entire browser unresponsive.
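The isolation benefit can be illustrated with a small Python sketch. This is not how Chrome is implemented; the "tabs" here are just hypothetical one-line Python programs, each launched in its own process, one of which crashes on purpose.

```python
import subprocess
import sys

# Hypothetical "tab" workloads: one finishes cleanly, one crashes.
tabs = {
    "good_tab": "print('rendered page')",
    "bad_tab": "raise RuntimeError('renderer bug')",
}

statuses = {}
for name, code in tabs.items():
    # Each tab runs in a separate Python process with its own memory space.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    statuses[name] = result.returncode  # 0 means the process exited cleanly

# The parent keeps running even though one child failed.
print(statuses)
```

The crashing child ends with a non-zero exit code, but the parent and the other child are unaffected. Had the same exception been raised in a thread that corrupted shared state, the damage would not have been contained so neatly.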

Google made a calculated trade-off with this multi-process design, because starting a new process for each browser window has a higher fixed cost in memory and resources than using threads. They were betting that their approach would end up with less memory bloat overall.

Using processes instead of threads provides better memory usage when memory gets low. An inactive window is treated as a lower priority by the operating system and becomes eligible to be swapped to disk when memory is needed for other processes, helping to keep the user-visible windows more responsive. If the windows were threaded, it would be more difficult to separate the used and unused memory as cleanly, wasting both memory and performance.

You can read more about Google’s design decisions on Google’s Chromium Blog or on the Chrome Introduction Comic.

The screen capture below shows the Google Chrome processes running on a MacBook Air with many tabs open. Some Chrome processes are using a fair amount of CPU time and resources, and some are using very little. You can see that each process also has many threads running.

activity monitor of Google Chrome

The Activity Monitor or Task Manager on your system can be a valuable ally in helping fine-tune your computer or troubleshooting problems. If your computer is running slowly, or a program or browser window isn’t responding for a while, you can check its status using the system monitor. Sometimes you’ll see a process marked as “Not Responding.” Try quitting that process and see if your system runs better. If an application is a memory hog, you might consider choosing a different application that will accomplish the same task.

Windows Task Manager view

Made it This Far?

We hope this Tron-like dive into the fascinating world of computer programs, processes, and threads has helped clear up some questions you might have had.

The next time your computer is running slowly or an application is acting up, you know your assignment. Fire up the system monitor and take a look under the hood to see what’s going on. You’re in charge now.

We love to hear from you

Are you still confused? Have questions? If so, please let us know in the comments. And feel free to suggest topics for future blog posts.

The post What’s the Diff: Programs, Processes, and Threads appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Game of Thrones Episode “S07E06” Leaks Online Early

Post Syndicated from Ernesto original https://torrentfreak.com/game-of-thrones-episode-s07e06-leaks-online-early-170816/

Trouble continues for HBO as another episode of the popular Game of Thrones series has just leaked online, days ahead of the official premiere.

Copies of the sixth episode of the current season, titled ‘Death is the Enemy,’ are currently circulating on various streaming portals, direct download, and torrent sites.

The first copy only just appeared on the Pirate Bay, but others were shared elsewhere earlier. One of the leaked videos is 64 minutes long and of high quality, and there are also versions that consist of two separate parts.

Early on, the two parts were circulating on the video streaming site Dailymotion, but these were swiftly removed.

At the moment it’s still unclear how the leak came about but some suggest that it was leaked by HBO itself in Spain. TorrentFreak has not been able to confirm this, but there are no visible watermarks that point elsewhere.

Game of Thrones “S07E06” leak screenshot

This isn’t the first time that a Game of Thrones episode has leaked online early. Two years ago the same happened with the first four episodes of season 5. Nonetheless, that season still broke previous viewership records.

Two weeks ago the fourth episode of the current season was also pirated before its official release. This leak, which carried a prominent “Star India Pvt Ltd” watermark, triggered a lot of interest from impatient Game of Thrones fans as well.

Earlier this week, news broke that four men had been arrested in connection with the breach, which is still being investigated. The arrested men all worked for the local media processing company Prime Focus Technologies, where the leak reportedly originated.

The current leak is not in any way related to the hack on HBO’s system, which occurred earlier and revealed several preliminary Game of Thrones scripts.

This hack has also resulted in leaks of various high profile shows, including the upcoming ninth season of ‘Curb Your Enthusiasm.’ Initially, these were hard to find online, but they are now widely available on the usual pirate sites.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Spinrilla Refuses to Share Its Source Code With the RIAA

Post Syndicated from Ernesto original https://torrentfreak.com/spinrilla-refuses-to-share-its-source-code-with-the-riaa-170815/

Earlier this year, a group of well-known labels targeted Spinrilla, a popular hip-hop mixtape site and accompanying app with millions of users.

The coalition of record labels, including Sony Music, Warner Bros. Records, and Universal Music Group, filed a lawsuit accusing the service of copyright infringement.

Both sides have started the discovery process and recently asked the court to rule on several unresolved matters. The parties begin with their statements of facts, clearly from opposite angles.

The RIAA remains confident that the mixtape site is ripping off music creators and wants its operators to be held accountable.

“Since Spinrilla launched, Defendants have facilitated millions of unauthorized downloads and streams of thousands of Plaintiffs’ sound recordings without Plaintiffs’ permission,” RIAA writes, complaining about “rampant” infringement on the site.

However, Spinrilla itself believes that the claims are overblown. The company points out that the RIAA’s complaint only lists a tiny fraction of all the songs uploaded by its users. These somehow slipped through its Audible Magic anti-piracy filter.

Where the RIAA paints a picture of rampant copyright infringement, the mixtape site stresses that the record labels are complaining about less than 0.001% of all the tracks they ever published.

“From 2013 to the present, Spinrilla users have uploaded about 1 million songs to Spinrilla’s servers and Spinrilla published about 850,000 of those. Plaintiffs are complaining that 210 of those songs are owned by them and published on Spinrilla without permission,” Spinrilla’s lawyers write.

“That means that Plaintiffs make no claim to 99.9998% of the songs on Spinrilla. Plaintiffs’ shouting of ‘rampant infringement on Spinrilla’, an accusation that Spinrilla was designed to allow easy and open access to infringing material, and assertion that ‘Defendants have facilitated millions of unauthorized downloads’ of those 210 songs is untrue – it is nothing more than a wish and a dream.”

The company reiterates that it’s a platform for independent musicians and that it doesn’t want to feature the Eminems and Biebers of this world, especially not without permission.

As for the discovery process, there are still several outstanding issues they need the Court’s advice on. Spinrilla has thus far produced 12,000 pages of documents and answered all RIAA interrogatories, but refuses to hand over certain information, including its source code.

According to Spinrilla, there is no reason for the RIAA to have access to its “crown jewel.”

“The source code is the crown jewel of any software based business, including Spinrilla. Even worse, Plaintiffs want an ‘executable’ version of Spinrilla’s source code, which would literally enable them to replicate Spinrilla’s entire website. Any Plaintiff could, in hours, delete all references to ‘Spinrilla,’ add its own brand and launch Spinrilla’s exact website.

“If we sued YouTube for hosting 210 infringing videos, would I be entitled to the source code for YouTube? There is simply no justification for Spinrilla sharing its source code with Plaintiffs,” Spinrilla adds.

The RIAA, on the other hand, argues that the source code will provide insight into several critical issues, including Spinrilla’s knowledge about infringing activity and its ability to terminate repeat copyright infringers.

In addition to the source code, the RIAA has also requested detailed information about the site’s users, including their download and streaming history. The mixtape site argues that this request is too broad and has offered to provide information on the uploaders of the 210 infringing tracks instead.

It’s clear that the RIAA and Spinrilla disagree on various fronts and it will be up to the court to decide what information must be handed over. So far, however, the language used clearly shows that both parties are far from reaching some kind of compromise.

The first joint discovery statement is available in full here (pdf).

Roku Gets Tough on Pirate Channels, Warns Users

Post Syndicated from Ernesto original https://torrentfreak.com/roku-gets-tough-on-pirate-channels-warns-users-170815/

In recent years it has become much easier to stream movies and TV-shows over the Internet.

Legal services such as Netflix and HBO are flourishing, but there’s also a darker side to this streaming epidemic. Millions of people are streaming from unauthorized sources, often paired with perfectly legal streaming platforms and devices.

Hollywood insiders have dubbed this trend “Piracy 3.0” and are actively working with stakeholders to address the threat. One of the companies rightsholders are working with is Roku, known for its easy-to-use media players.

Earlier this year Roku was harshly confronted with this new piracy crackdown when a Mexican court ordered local retailers to take its media player off the shelves. While this legal battle isn’t over yet, it was clear to Roku that misuse of its platform wasn’t without consequences.

While Roku never permitted any infringing content, it appears that the company has recently made some adjustments to better deal with the problem, or at least clarify its stance.

Pirate content generally doesn’t show up in the official Roku Channel Store but is directly loaded onto the device through third-party “private” channels. A few weeks ago, Roku renamed these “private” channels to “non-certified” channels, while making it very clear that copyright infringement is not allowed.

A “WARNING!” message that pops up during the installation of these third-party channels stresses that Roku has no control over the content. In addition, the company notes that these channels may be removed if they link to copyright infringing content.

Roku Warning

“By continuing, you acknowledge you are accessing a non-certified channel that may include content that is offensive or inappropriate for some audiences,” Roku’s warning reads.

“Moreover, if Roku determines that this channel violates copyright, contains illegal content, or otherwise violates Roku’s terms and conditions, then ROKU MAY REMOVE THIS CHANNEL WITHOUT PRIOR NOTICE.”

TorrentFreak reached out to Roku to find out how they plan to enforce this policy, but we have yet to hear back. According to Cord Cutters News, several piracy channels have already been removed recently, with other developers opting to leave the platform.

Roku’s General Counsel Steve Kay previously informed us that the company is taking the piracy problem seriously. Together with various stakeholders, they are working hard to address the problem.

“We actively work to prevent third-parties from using our platform to distribute copyright infringing content. Moreover, we have been actively working with other industry stakeholders on a wide range of anti-piracy initiatives,” Kay said.

Roku is not the only platform dealing with the piracy epidemic; the popular media player software Kodi is in the same boat. Kodi has also taken an active anti-piracy stance, but they’re not banning any add-ons. They believe it would be pointless due to the open source nature of their software.

Wirzenius: Retiring Obnam

Post Syndicated from corbet original https://lwn.net/Articles/730986/rss

Lars Wirzenius announces that he is ending development of the Obnam backup system. “After some careful thought, I fear that the maintainability problems of Obnam can realistically only be solved by a complete rewrite from scratch, and I’m not up to doing that. If you use Obnam, you should migrate to some other backup solution. Don’t worry, you have until the end of the year. I will be around and I intend to fix any serious bugs in Obnam; in particular, security flaws. But you should start looking for a replacement sooner rather than later.” LWN looked at Obnam in 2012.

BREIN is Taking Infamous ‘Piracy’ Hosting Provider Ecatel to Court

Post Syndicated from Andy original https://torrentfreak.com/brein-is-taking-infamous-piracy-hosting-provider-ecatel-to-court-170815/

A regular website can be easily hosted in most countries of the world but when the nature of the project begins to step on toes, opportunities begin to reduce. Openly hosting The Pirate Bay, for example, is something few providers want to get involved with.

There are, however, providers out there who specialize in hosting services that others won’t touch. They develop a reputation of turning a blind eye to their customers’ activities, only reacting when a crisis looms on the horizon. Despite the problems, there are a few that are surprisingly resilient.

One such host is Netherlands-based Ecatel, which has hit the headlines many times over the years for allegedly having customers involved in warez, torrents, and streaming, not to mention spam and malware. For hosting the former group, it’s now in the crosshairs of Dutch anti-piracy group BREIN.

According to an application for a witness hearing filed with The Court of the Hague by BREIN, Ecatel has repeatedly hosted websites dealing in infringing content over recent years. While this is nothing particularly out of the ordinary, BREIN claims that complaints filed against the sites were dealt with slowly by Ecatel or not at all.

Ecatel Ltd is a company incorporated in the UK with servers in the Netherlands, but since 2015 another hosting company called Novogara has appeared in tandem. Court documents suggest that Novogara is associated with Ecatel, something that was confirmed in early 2016 in an email sent out by Ecatel itself.

“We’d like to inform you that all services of Ecatel Ltd are taken over by a new brand called Novogara Ltd with immediate effect. The take-over includes Ecatel and all her subsidiaries,” the email read.

Muddying the waters a little more, in 2015 Ecatel’s IP addresses were apparently taken over by Quasi Networks Ltd, a Seychelles-based company whose business is described locally as being conducted entirely overseas.

“Stichting BREIN has found several websites in the network of Quasi Networks with obviously infringing content. Quasi Networks, however, does not respond structurally to requests for closing those websites. This involves unlawful acts against the parties associated with the BREIN Foundation,” a ruling from the Court reads.

As a result, BREIN wants a witness hearing with three defendants connected to the Ecatel/Novogara/Quasi group of companies in order to establish the relationship between the businesses, where their servers are, and who is behind Quasi Networks.

“Stichting BREIN is interested in this information in order to be able to judge who it can appeal to and whether it is useful to start a legal procedure,” the Court adds.

Two of the defendants failed to lodge a defense against BREIN’s application but one objected to the request for a hearing. He said that since Quasi Networks, Ecatel and Novogara are all incorporated outside the Netherlands, a trial must also be conducted abroad and therefore a Dutch judge would not have jurisdiction.

He also argued that BREIN would use the witness hearing as a “fishing expedition” in order to gather information it currently does not have, in order to formulate some kind of case against the defendants, in one way or another.

In a decision published this week, The Court of the Hague rejected that argument, noting that the basis for the claim is copyright infringement through Netherlands-hosted websites. Furthermore, the majority of the witnesses are resident in the district of The Hague. It also underlined the importance of a hearing.

“The request for holding a preliminary witness hearing opens an independent petition procedure, which does not address the eligibility of any claim that may be lodged. An investigation must be made by the judge who has to deal with and decide the main case – if it comes.

“The court points out that a preliminary witness hearing is now (partly) necessary to clarify whether and to what extent a claim has any chance of success,” the decision reads.

According to documents published by Companies House in the UK, Ecatel Ltd ceased to exist this morning, having been dissolved at the request of its directors.

The hearing of the witnesses is set to take place on Tuesday, September 26, 2017 at 9.30 in the Palace of Justice at Prince Claus 60 in The Hague.

AWS Summit New York – Summary of Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS Glue – Randall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed. As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with AWS Glue, as do Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks, and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.
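As a rough sketch of how the two rules might be enabled from code rather than the console, the snippet below builds the request structures and hands them to the boto3 `put_config_rule` call. The helper names are hypothetical, the upper-case source identifiers are an assumption to verify against the AWS Config documentation, and the `enable_rules` function needs boto3, AWS credentials, and a Config recorder, so it is defined here but not called.

```python
def managed_rule_request(rule_name, source_identifier):
    """Build the ConfigRule structure for an AWS managed rule."""
    return {
        "ConfigRuleName": rule_name,
        # Owner "AWS" marks this as a managed rule rather than a custom one.
        "Source": {"Owner": "AWS", "SourceIdentifier": source_identifier},
    }


# Assumed identifiers for the two managed rules named above.
S3_RULES = [
    managed_rule_request("s3-bucket-public-read-prohibited",
                         "S3_BUCKET_PUBLIC_READ_PROHIBITED"),
    managed_rule_request("s3-bucket-public-write-prohibited",
                         "S3_BUCKET_PUBLIC_WRITE_PROHIBITED"),
]


def enable_rules(rules):
    """Send each rule to AWS Config (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it

    client = boto3.client("config")
    for rule in rules:
        client.put_config_rule(ConfigRule=rule)


print([rule["ConfigRuleName"] for rule in S3_RULES])
```

Keeping the request-building step separate from the API call makes it easy to review exactly what would be sent before touching a real account.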

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Jeff;

 

Game of Thrones Pirates Arrested For Leaking Episode Early

Post Syndicated from Andy original https://torrentfreak.com/game-of-thrones-pirates-arrested-for-leaking-episode-early-170814/

Over the past several years, Game of Thrones has become synonymous with fantastic drama and story telling on the one hand, and Internet piracy on the other. It’s the most pirated TV show in history, hands down.

With the new season well underway, another GoT drama began to unfold in early August when the then-unaired episode “The Spoils of War” began to circulate on various file-sharing and streaming sites. The leak only trumped the official release by a few days, but that didn’t stop people downloading in droves.

As previously reported, the leaked episode stated that it was “For Internal Viewing Only” at the top of the screen and on the bottom right sported a “Star India Pvt Ltd” watermark. The company commented shortly after.

“We take this breach very seriously and have immediately initiated forensic investigations at our and the technology partner’s end to swiftly determine the cause. This is a grave issue and we are taking appropriate legal remedial action,” a spokesperson said.

Now, just ten days later, that investigation has already netted its first victims. Four people have reportedly been arrested in India for leaking the episode before it aired.

“We investigated the case and have arrested four individuals for unauthorized publication of the fourth episode from season seven,” Deputy Commissioner of Police Akbar Pathan told AFP.

The report indicates that a complaint was filed by a Mumbai-based company that was responsible for storing and processing the TV episodes for an app. It has been named locally as Prime Focus Technologies, which markets itself as a Netflix “Preferred Vendor”.

It’s claimed that at least some of the men had access to login credentials for Game of Thrones episodes which were then abused for the purposes of leaking.

Local media identified the men as Bhaskar Joshi, Alok Sharma and Abhishek Ghadiyal, who were employed by Prime Focus, and Mohamad Suhail, a former employee, who was responsible for leaking the episode onto the Internet.

All of the men were based in Bangalore and were interrogated “throughout the night” at their workplace on August 11. Star India welcomed the arrests and thanked the authorities for their swift action.

“We are deeply grateful to the police for their swift and prompt action. We believe that valuable intellectual property is a critical part of the development of the creative industry and strict enforcement of the law is essential to protecting it,” the company said in a statement.

“We at Star India and Novi Digital Entertainment Private Limited stand committed and ready to help the law enforcement agencies with any technical assistance and help they may require in taking the investigation to its logical conclusion.”

The men will be held in custody until August 21 while investigations continue.

Launch – AWS Glue Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/launch-aws-glue-now-generally-available/

Today we’re excited to announce the general availability of AWS Glue. Glue is a fully managed, serverless, and cloud-optimized extract, transform and load (ETL) service. Glue is different from other ETL services and platforms in a few very important ways.

First, Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running. Second, Glue provides crawlers that can automatically detect and infer schemas from many data sources, data types, and across various types of partitions. It stores these generated schemas in a centralized Data Catalog for editing, versioning, querying, and analysis. Third, Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats. Finally, Glue allows you to create development endpoints that allow your developers to use their favorite toolchains to construct their ETL scripts. Ok, let’s dive deep with an example.

In my job as a Developer Evangelist I spend a lot of time traveling and I thought it would be cool to play with some flight data. The Bureau of Transportation Statistics is kind enough to share all of this data for anyone to use here. We can easily download this data and put it in an Amazon Simple Storage Service (S3) bucket. This data will be the basis of our work today.

Crawlers

First, we need to create a Crawler for our flights data from S3. We’ll select Crawlers in the Glue console and follow the on screen prompts from there. I’ll specify s3://crawler-public-us-east-1/flight/2016/csv/ as my first datasource (we can add more later if needed). Next, we’ll create a database called flights and give our tables a prefix of flights as well.

The Crawler will go over our dataset, detect partitions through various folders – in this case months of the year, detect the schema, and build a table. We could add additional data sources and jobs into our crawler or create separate crawlers that push data into the same database but for now let’s look at the autogenerated schema.
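The same crawler setup could plausibly be scripted instead of clicked through. The sketch below collects the values from the walkthrough (the S3 path, the flights database, and the flights prefix) into a request for the boto3 Glue `create_crawler` call. The crawler name and the IAM role ARN are hypothetical placeholders, and `create_and_start` is defined but not called because it needs boto3 and AWS credentials.

```python
FLIGHTS_PATH = "s3://crawler-public-us-east-1/flight/2016/csv/"


def crawler_request(name, role_arn):
    """Collect the crawler settings used in the console walkthrough."""
    return {
        "Name": name,
        "Role": role_arn,                 # role the crawler assumes to read S3
        "DatabaseName": "flights",
        "TablePrefix": "flights",
        "Targets": {"S3Targets": [{"Path": FLIGHTS_PATH}]},
    }


def create_and_start(request):
    """Create the crawler and kick off a run (requires boto3 and credentials)."""
    import boto3  # imported here so the rest of the sketch runs without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical crawler name and role ARN, for illustration only.
request = crawler_request(
    "flights-crawler",
    "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
)
print(request["Targets"])
```

More S3 paths can be appended to the S3Targets list, which corresponds to adding additional data sources in the console.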

I’m going to make a quick schema change to year, moving it from BIGINT to INT. Then I can compare the two versions of the schema if needed.

Now that we know how to correctly parse this data let’s go ahead and do some transforms.

ETL Jobs

Now we’ll navigate to the Jobs subconsole and click Add Job. We’ll follow the prompts from there, giving our job a name, selecting a datasource, and choosing an S3 location for temporary files. Next we add our target by specifying “Create tables in your data target” and we’ll specify an S3 location in Parquet format as our target.

After clicking next, we’re at a screen showing our various mappings proposed by Glue. Now we can make manual column adjustments as needed – in this case we’re just going to use the X button to remove a few columns that we don’t need.

This brings us to my favorite part. This is what I absolutely love about Glue.

Glue generated a PySpark script to transform our data based on the information we’ve given it so far. On the left hand side we can see a diagram documenting the flow of the ETL job. On the top right we see a series of buttons that we can use to add annotated data sources and targets, transforms, spigots, and other features. This is the interface I get if I click on transform.

If we add any of these transforms or additional data sources, Glue will update the diagram on the left giving us a useful visualization of the flow of our data. We can also just write our own code into the console and have it run. We can add triggers to this job that fire on completion of another job, a schedule, or on demand. That way if we add more flight data we can reload this same data back into S3 in the format we need.
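To picture what a column-mapping transform does, here is a plain-Python sketch that renames fields, casts types, and drops anything without a mapping. It is not the script Glue generates and does not use the Glue or PySpark APIs; the apply_mapping helper and the column names are hypothetical stand-ins for the real flight columns.

```python
def apply_mapping(record, mappings):
    """Rename and cast fields; fields without a mapping are dropped.

    mappings is a list of (source_name, target_name, cast) tuples.
    """
    return {target: cast(record[source]) for source, target, cast in mappings}


# Hypothetical columns; the real flight dataset has many more.
MAPPINGS = [
    ("year", "year", int),                             # keep name, cast to int
    ("origin", "origin_airport", str),                 # rename only
    ("depdelay", "departure_delay_minutes", float),    # rename and cast
]

raw = {"year": "2016", "origin": "SEA", "depdelay": "12.0", "unused": "x"}
clean = apply_mapping(raw, MAPPINGS)
print(clean)
```

The "unused" field disappears because no mapping mentions it, which is the same effect as removing a column with the X button in the mapping screen.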

I could spend all day writing about the power and versatility of the jobs console but Glue still has more features I want to cover. So, while I might love the script editing console, I know many people prefer their own development environments, tools, and IDEs. Let’s figure out how we can use those with Glue.

Development Endpoints and Notebooks

A Development Endpoint is an environment used to develop and test our Glue scripts. If we navigate to “Dev endpoints” in the Glue console we can click “Add endpoint” in the top right to get started. Next we’ll select a VPC, a security group that references itself and then we wait for it to provision.


Once it’s provisioned we can create an Apache Zeppelin notebook server by going to actions and clicking create notebook server. We give our instance an IAM role and make sure it has permissions to talk to our data sources. Then we can either SSH into the server or connect to the notebook to interactively develop our script.

Pricing and Documentation

You can see detailed pricing information here. Glue crawlers, ETL jobs, and development endpoints are all billed in Data Processing Unit hours (DPU-Hours), metered by the minute. Each DPU-Hour costs $0.44 in us-east-1. A single DPU provides 4 vCPUs and 16 GB of memory.
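For a rough feel of what a job might cost, the arithmetic can be sketched as below. This assumes simple per-minute billing with no minimum charge and uses only the us-east-1 rate quoted above; the glue_cost helper and the example job size are hypothetical, so check the pricing page before relying on the numbers.

```python
DPU_HOUR_PRICE_USD = 0.44  # us-east-1 rate quoted in the post


def glue_cost(dpus, minutes, price_per_dpu_hour=DPU_HOUR_PRICE_USD):
    """Estimate cost assuming per-minute billing with no minimum charge."""
    dpu_hours = dpus * minutes / 60      # convert DPU-minutes to DPU-hours
    return round(dpu_hours * price_per_dpu_hour, 2)


# Hypothetical job: 10 DPUs running for 15 minutes.
estimate = glue_cost(dpus=10, minutes=15)
print(f"${estimate:.2f}")
```

Ten DPUs for fifteen minutes is 2.5 DPU-Hours, so the estimate under these assumptions is 2.5 times the hourly rate.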

We’ve only covered about half of the features that Glue has so I want to encourage everyone who made it this far into the post to go read the documentation and service FAQs. Glue also has a rich and powerful API that allows you to do anything the console can do and more.

We’re also releasing two new projects today. The aws-glue-libs repo provides a set of utilities for connecting to and talking with Glue. The aws-glue-samples repo contains a set of example jobs.

I hope you find that using Glue reduces the time it takes to start doing things with your data. Look for another post from me on AWS Glue soon because I can’t stop playing with this new service.
Randall