Tag Archives: history

In 1983, This Bell Labs Computer Was the First Machine to Become a Chess Master

Post Syndicated from Allison Marsh original https://spectrum.ieee.org/tech-history/silicon-revolution/in-1983-this-bell-labs-computer-was-the-first-machine-to-become-a-chess-master

Belle used a brute-force approach to best other computers and humans

Chess is a complicated game. It’s a game of strategy between two opponents, but with no hidden information and all of the potential moves known by both players at the outset. With each turn, players communicate their intent and try to anticipate the possible countermoves. The ability to envision several moves in advance is a recipe for victory, and one that mathematicians and logicians have long found intriguing.

Despite some early mechanical chess-playing machines—and at least one chess-playing hoax—mechanized chess play remained hypothetical until the advent of digital computing. While working on his Ph.D. in the early 1940s, the German computer pioneer Konrad Zuse used computer chess as an example for the high-level programming language he was developing, called Plankalkül. Due to World War II, however, his work wasn’t published until 1972. With Zuse’s work unknown to engineers in Britain and the United States, Norbert Wiener, Alan Turing, and notably Claude Shannon (with his 1950 paper “Programming a Computer for Playing Chess” [PDF]) paved the way for thinking about computer chess.

Beginning in the early 1970s, Bell Telephone Laboratories researchers Ken Thompson and Joe Condon developed Belle, a chess-playing computer. Thompson is cocreator of the Unix operating system, and he’s also a great lover of chess. He grew up in the era of Bobby Fischer, and as a youth he played in chess tournaments. He joined Bell Labs in 1966, after earning a master’s in electrical engineering and computer science from the University of California, Berkeley.

Joe Condon was a physicist by training who worked in the Metallurgy Division at Bell Labs. His research contributed to the understanding of the electronic band structure of metals, and his interests evolved with the rise of digital computing. Thompson got to know Condon when he and his Unix collaborator, Dennis Ritchie, began developing a game called Space Travel, using a PDP-7 minicomputer that was under Condon’s purview. Thompson and Condon went on to collaborate on numerous projects, including promoting the use of C as the language for AT&T’s switching system.

Belle began as a software approach—Thompson had written a sample chess program in an early Unix manual. But after Condon joined the team, the program morphed into a hybrid computer chess-playing machine, with Thompson handling the programming and Condon designing the hardware.

Belle consisted of three main parts: a move generator, a board evaluator, and a transposition table. The move generator identified the highest-value piece under attack and the lowest-value piece attacking, and it sorted potential moves based on that information. The board evaluator noted the king’s position and its relative safety during different stages of the game. The transposition table contained a memory cache of potential moves, and it made the evaluation more efficient.

Belle employed a brute-force approach. It looked at all of the possible moves a player could make with the current configuration of the board, and then considered all of the moves that the opponent could make. In chess, a turn taken by one player is called a ply. Initially, Belle could compute moves four plies deep. When Belle debuted at the Association for Computing Machinery’s North American Computer Chess Championship in 1978, where it claimed its first title, it had a search depth of eight plies. Belle went on to win the championship four more times. In 1983, it also became the first computer to earn the title of chess “master.”
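For readers who want a concrete picture of what a fixed-depth, brute-force search looks like, here is a minimal, purely illustrative sketch in Python. It is not Belle’s code (Belle generated moves and evaluated positions in dedicated hardware), and it assumes the third-party python-chess package plus a deliberately crude material-count evaluator. It simply searches every legal move to a fixed number of plies, caching positions in a small transposition table along the way.

import chess  # third-party python-chess package (assumed installed)

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def evaluate(board):
    # Crude "board evaluator": material balance from the side to move's point of view.
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

def search(board, plies, table):
    # Brute-force negamax to a fixed number of plies, caching results in a
    # transposition table keyed by position and remaining depth.
    key = (board.fen(), plies)
    if key in table:
        return table[key]
    if plies == 0 or board.is_game_over():
        return evaluate(board)
    best = -10_000
    for move in board.legal_moves:  # the "move generator": every legal reply
        board.push(move)
        best = max(best, -search(board, plies - 1, table))
        board.pop()
    table[key] = best
    return best

def best_move(board, plies=4):
    table, best, best_score = {}, None, -10_000
    for move in board.legal_moves:
        board.push(move)
        score = -search(board, plies - 1, table)
        board.pop()
        if score > best_score:
            best, best_score = move, score
    return best

print(best_move(chess.Board(), plies=2))  # shallow search so the example finishes quickly

Building the move generator and evaluator in custom circuitry is what let the real machine search far faster, and therefore deeper, than a software sketch like this ever could.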

Computer chess programmers were often treated with hostility when they pitted their systems against human competitors, some of whom were suspicious of potential cheating, while others were simply apprehensive. When Thompson wanted to test out Belle at his local chess club, he took pains to build up personal relationships. He offered his opponents a printout of the computer’s analysis of the match. If Belle won in mixed human/computer tournaments, he refused the prize money, offering it to the next person in line. Belle went on to play weekly at the Westfield Chess Club, in Westfield, N.J., for almost 10 years.

In contrast to human-centered chess competitions, where silence reigns so as not to disturb a player’s concentration, computer chess tournaments could be noisy affairs, with people discussing and debating different algorithms and game strategies. In a 2005 oral history, Thompson remembers them fondly. After a tournament, he would be invigorated and head back to the lab, ready to tackle a new problem.

For a computer, Belle led a colorful life, at one point becoming the object of a corporate practical joke. One day in 1978, Bell Labs computer scientist Mike Lesk, another member of the Unix team, stole some letterhead from AT&T chairman John D. deButts and wrote a fake memo, calling for the suspension of the “T. Belle Computer” project.

At the heart of the fake memo was a philosophical question: Is a game between a person and a computer a form of communication or of data processing? The memo claimed that it was the latter and that Belle therefore violated the 1956 antitrust decision barring the company from engaging in the computer business. In fact, though, AT&T’s top executives never pressured Belle’s creators to stop playing or inventing games at work, likely because the diversions led to economically productive research. The hoax became more broadly known after Dennis Ritchie featured it in a 2001 article, for a special issue of the International Computer Games Association Journal that was dedicated to Thompson’s contributions to computer chess.

In his oral history, Thompson describes how Belle also became the object of international intrigue. In the early 1980s, Soviet electrical engineer, computer scientist, and chess grandmaster Mikhail Botvinnik invited Thompson to bring Belle to Moscow for a series of demonstrations. He departed from New York’s John F. Kennedy International Airport, only to discover that Belle was not on the same plane.

Thompson learned of the machine’s fate after he’d been in Moscow for several days. A Bell Labs security guard who was moonlighting at JFK airport happened to see a Bell Labs box labeled “computer” that was roped off in the customs area. The guard alerted his friends at Bell Labs, and word eventually reached Condon, who lost no time in calling Thompson.

Condon warned Thompson to throw out the spare parts for Belle that he’d brought with him. “You’re probably going to be arrested when you get back,” he said. Why? Thompson asked. “For smuggling computers into Russia,” Condon replied.

In his oral history, Thompson speculates that Belle had fallen victim to the Reagan administration’s rhetoric concerning the “hemorrhage of technology” to the Soviet Union. Overzealous U.S. Customs agents had spotted Thompson’s box and confiscated it, but never alerted him or Bell Labs. His Moscow hosts seemed to agree that Reagan was to blame. When Thompson met with them to explain that Belle had been detained, the head of the Soviet chess club pointed out that the Ayatollah Khomeini had outlawed chess in Iran because it was against God. “Do you suppose Reagan did this to outlaw chess in the United States?” he asked Thompson.

Returning to the States, Thompson took Condon’s advice and dumped the spare parts in Germany. Arriving back home, he wasn’t arrested, for smuggling or anything else. But when he attempted to retrieve Belle at JFK, he was told that he was in violation of the Export Act—Belle’s old, outdated Hewlett-Packard monitor was on a list of banned items. Bell Labs paid a fine, and Belle was eventually returned.

After Belle had dominated the computer chess world for several years, its star began to fade, as more powerful computers with craftier algorithms came along. Chief among them was IBM’s Deep Blue, which captured international attention in 1996 when it won a game against world champion Garry Kasparov. Kasparov went on to win the match, but the ground was laid for a rematch. The following year, after extensive upgrades, Deep Blue defeated Kasparov, becoming the first computer to beat a human world champion in a tournament under regulation time controls.

Photographer Peter Adams brought Belle to my attention, and his story shows the value of being friendly to archivists. Adams had photographed Thompson and many of his Bell Labs colleagues for his portrait series “Faces of Open Source.” During Adams’s research for the series, Bell Labs corporate archivist Ed Eckert granted him permission to photograph some of the artifacts associated with the Unix research lab. Adams put Belle on his wish list, but he assumed that it was now in some museum collection. To his astonishment, he learned that the machine was still at Nokia Bell Labs in Murray Hill, N.J. As Adams wrote to me in an email, “It still had all the wear on it from the epic chess games it had played… :).”

An abridged version of this article appears in the May 2019 print issue as “Cold War Chess.”

Part of a continuing series looking at photographs of historical artifacts that embrace the boundless potential of technology.

About the Author

Allison Marsh is an associate professor of history at the University of South Carolina and codirector of the university’s Ann Johnson Institute for Science, Technology & Society.

Untold History of AI: How Amazon’s Mechanical Turkers Got Squeezed Inside the Machine

Post Syndicated from Oscar Schwartz original https://spectrum.ieee.org/tech-talk/tech-history/dawn-of-electronics/untold-history-of-ai-mechanical-turk-revisited-tktkt

Today’s unseen digital laborers resemble the human who powered the 18th-century Mechanical Turk

The history of AI is often told as the story of machines getting smarter over time. What’s lost is the human element in the narrative, how intelligent machines are designed, trained, and powered by human minds and bodies.

In this six-part series, we explore that human history of AI—how innovators, thinkers, workers, and sometimes hucksters have created algorithms that can replicate human thought and behavior (or at least appear to). While it can be exciting to be swept up by the idea of superintelligent computers that have no need for human input, the true history of smart machines shows that our AI is only as good as we are.

Part 6: Mechanical Turk Revisited

At the turn of the millennium, Amazon began expanding its services beyond bookselling. As the variety of products on the site grew, the company had to figure out new ways to categorize and organize them. Part of this task was removing tens of thousands of duplicate products that were popping up on the website.

Engineers at the company tried to write software that could automatically eliminate all duplicates across the site. Identifying and deleting duplicates seemed to be a simple task, one well within the capacities of a machine. Yet the engineers soon gave up, describing the data-processing challenge as “insurmountable.” This task, which presupposed the ability to notice subtle differences and similarities between pictures and text, actually required human intelligence.

Amazon was left with a conundrum. Deleting duplicate products from the site was a trivial task for humans, but the sheer number of duplicates would require a huge workforce. Coordinating that many workers on one task was not a trivial problem.

An Amazon manager named Venky Harinarayan came up with a solution. His patent described a “hybrid machine/human computing arrangement,” which would break down tasks into small units, or “subtasks,” and distribute them to a network of human workers.

In the case of deleting duplicates, a central computer could divide Amazon’s site into small sections—say, 100 product pages for can openers—and send the sections to human workers over the Internet. The workers could then identify duplicates in these small units and send their pieces of the puzzle back. 

This distributed system offered a crucial advantage: The workers didn’t have to be centralized in one place but could instead complete the subtasks on their own personal computers wherever they happened to be, whenever they chose. Essentially, what Harinarayan developed was an effective way to distribute low-skill yet difficult-to-automate work to a broad network of humans who could work in parallel.
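As a rough illustration of that arrangement (with hypothetical data and function names, not Amazon’s actual system), the Python sketch below splits a product catalog into fixed-size subtasks and hands each batch to a “worker,” here simulated by a function that flags pages with duplicate titles:

from itertools import islice

def make_subtasks(product_pages, batch_size=100):
    # Split a long list of product pages into batches a single person can review.
    it = iter(product_pages)
    while batch := list(islice(it, batch_size)):
        yield batch

def submit_to_workers(subtasks):
    # Stand-in for posting each batch to a pool of human workers over the Internet.
    # Here each "worker" is simulated as a function that flags exact-duplicate titles.
    results = []
    for batch in subtasks:
        seen, duplicates = set(), []
        for page in batch:
            if page["title"] in seen:
                duplicates.append(page["id"])
            seen.add(page["title"])
        results.append(duplicates)
    return results

pages = [{"id": i, "title": f"Can opener model {i % 90}"} for i in range(300)]
flagged = submit_to_workers(make_subtasks(pages, batch_size=100))
print(sum(len(batch) for batch in flagged), "duplicates flagged")

The essential point is the decomposition: each batch is small enough for one person to handle, and the batches are independent, so thousands of workers can process them in parallel.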

The method proved so effective in Amazon’s internal operations that Jeff Bezos decided this system could be sold as a service to other companies. Bezos turned Harinarayan’s technology into a marketplace for laborers. There, businesses that had tasks that were easy for humans (but hard to automate) could be matched with a network of freelance workers, who would do the tasks for small amounts of money.

Thus was born Amazon Mechanical Turk, or mTurk for short. The service launched in 2005, and the user base quickly grew. Businesses and researchers around the globe began uploading thousands of so-called “human intelligence tasks” onto the platform, such as transcribing audio or captioning images. These tasks were dutifully carried out by an internationally dispersed and anonymous group of workers for a small fee (one aggrieved worker reported an average fee of 20 cents per task). 

The name of this new service was a wink at the chess-playing machine of the 18th century, the Mechanical Turk invented by the huckster Wolfgang von Kempelen. And just like that faux automaton, inside which hid a human chess player, the mTurk platform was designed to make human labor invisible. Workers on the platform are not represented with names, but with numbers, and communication between the requester and the worker is entirely depersonalized. Bezos himself has called these dehumanized workers “artificial artificial intelligence.”

Today, mTurk is a thriving marketplace with hundreds of thousands of workers around the world. While the online platform provides a source of income for people who otherwise might not have access to jobs, the labor conditions are highly questionable. Some critics have argued that by keeping the workers invisible and atomized, Amazon has made it easier for them to be exploited. A research paper [PDF] published in December 2017 found that workers earned a median wage of approximately US $2 per hour, and only 4 percent earned more than $7.25 per hour.

Interestingly, mTurk has also become crucial for the development of machine-learning applications. In machine learning, an AI program is given a large data set, then learns on its own how to find patterns and draw conclusions. MTurk workers are frequently used to build and label these training data sets, yet their role in machine learning is often overlooked.

The dynamic now playing out between the AI community and mTurk is one that has been ever-present throughout the history of machine intelligence. We eagerly admire the visage of the autonomous “intelligent machine,” while ignoring, or even actively concealing, the human labor that makes it possible.

Perhaps we can take a lesson from the author Edgar Allan Poe. When he viewed von Kempelen’s Mechanical Turk, he was not fooled by the illusion. Instead, he wondered what it would be like for the chess player trapped inside, the concealed laborer “tightly compressed” among cogs and levers in “exceedingly painful and awkward positions.”

In our current moment, when headlines about AI breakthroughs populate our news feeds, it’s important to remember Poe’s forensic attitude. It can be entertaining—if sometimes alarming—to be swept up in the hype over AI, and to be carried away by the vision of machines that have no need for mere mortals. But if you look closer, you’ll likely see the traces of human labor.

This is the final installment of a six-part series on the untold history of AI. Part 5 told a story of algorithmic bias—from the 1980s. 

Eight years, 2000 blog posts

Post Syndicated from Liz Upton original https://www.raspberrypi.org/blog/eight-years-2000-blog-posts/

Today’s a bit of a milestone for us: this is the 2000th post on this blog.

Why does a computer company have a blog? When did it start, who writes it, and where does the content come from? And don’t you have sore fingers? All of these are good questions: I’m here to answer them for you.

The first ever Raspberry Pi blog post

Marital circumstances being what they are, I had a front-row view of everything that was going on at Raspberry Pi, right from the original conversations that kicked the project off in 2009. In 2011, when development was still being done on Eben’s and my kitchen table, we met with sudden and slightly alarming fame when Rory Cellan-Jones from the BBC shot a short video of a prototype Raspberry Pi and blogged about it – his post went viral. I was working as a freelance journalist and editor at the time, but realised that we weren’t going to get a better chance to kickstart a community, so I dropped my freelance work and came to work full-time for Raspberry Pi.

Setting up an instantiation of WordPress so we could talk to all Rory’s readers, each of whom decided we’d promised we’d make them a $25 computer, was one of the first orders of business. We could use the WordPress site to announce news, and to run a sort of devlog, which is what became this blog; back then, many of our blog posts were about the development of the original Raspberry Pi.

It was a lovely time to be writing about what we do, because we could be very open about the development process and how we were moving towards launch in a way that, sadly, is closed to us today. (If we’d blogged about the development of Raspberry Pi 3 in the detail we’d blogged about Raspberry Pi 1, we’d not only have been handing sensitive and helpful commercial information to the large number of competitor organisations that have sprung up like mushrooms since that original launch; but you’d also all have stopped buying Pi 2 in the run-up, starving us of the revenue we need to do the development work.)

Once Raspberry Pis started making their way into people’s hands in early 2012, I realised there was something else that it was important to share: news about what new users were doing with their Pis. And I will never, ever stop being shocked at the applications of Raspberry Pi that you come up with. Favourites from over the years? The paludarium’s still right up there (no, I didn’t know what a paludarium was either when I found out about it); the cucumber sorter’s brilliant; and the home-brew artificial pancreas blows my mind. I’ve a particular soft spot for musical projects (which I wish you guys would comment on a bit more so I had an excuse to write about more of them).



As we’ve grown, my job has grown too, so I don’t write all the posts here like I used to. I oversee press, communications, marketing and PR for Raspberry Pi Trading now, working with a team of writers, editors, designers, illustrators, photographers, videographers and managers – it’s very different from the days when the office was that kitchen table. Alex Bate, our magisterial Head of Social Media, now writes a lot of what you see on this blog, but it’s always a good day for me when I have time to pitch in and write a post.

I’d forgotten some of the early stuff before looking at 2011’s blog posts to jog my memory as I wrote today’s. What were we thinking when we decided to ship without GPIO pins soldered on? (Happily for the project and for the 25,000,000 Pi owners all over the world in 2019, we changed our minds before we finally launched.) Just how many days in aggregate did I spend stuffing envelopes with stickers at £1 a throw to raise some early funds to get the first PCBs made? (I still have nightmares about the paper cuts.) And every time I think I’m having a bad day, I need to remember that this thing happened, and yet everything was OK again in the end. (The backs of my hands have gone all prickly just thinking about it.) Now I think about it, the Xenon Death Flash happened too. We also survived that.

At the bottom of it all, this blog has always been about community. It’s about sharing what we do, what you do, and making links between people all over the world who have this little machine in common. The work you do telling people about Raspberry Pi, putting it into your own projects, and supporting us by buying the product doesn’t just help us make hardware: every penny we make funds the Raspberry Pi Foundation’s charitable work, helps kids on every continent to learn the skills they need to make their own futures better, and, we think, makes the world a better place. So thank you. As long as you keep reading, we’ll keep writing.

The post Eight years, 2000 blog posts appeared first on Raspberry Pi.

Storing Encrypted Credentials In Git

Post Syndicated from Bozho original https://techblog.bozho.net/storing-encrypted-credentials-in-git/

We all know that we should not commit any passwords or keys to the repo with our code (whether public or private). Yet, thousands of production passwords can be found on GitHub (and probably thousands more in internal company repositories). Some have tried to fix that by removing the passwords (once they learned it’s not a good idea to store them publicly), but the passwords have remained in the git history.

Knowing what not to do is the first and very important step. But how do we store production credentials? Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn’t seem to be an agreed-upon solution.

I’ve previously argued against the 12-factor app recommendation to use environment variables – if you have a few, that might be okay, but when the number of variables grows (as in any real application), it becomes impractical. You can set environment variables via a bash script, but you’d have to store that script somewhere. In fact, even separate environment variables have to be stored somewhere.

This somewhere could be a local directory (risky); shared storage, e.g. an FTP server or an S3 bucket with limited access; or a separate git repository. I think I prefer the git repository, as it allows versioning (note: S3 does too, but it’s provider-specific). So you can store all your environment-specific properties files, with all their credentials and environment-specific configurations, in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.

Such a repo would look like this:

project
└─── production
|   |   application.properties
|   |   keystore.jks
└─── staging
|   |   application.properties
|   |   keystore.jks
└─── on-premise-client1
|   |   application.properties
|   |   keystore.jks
└─── on-premise-client2
|   |   application.properties
|   |   keystore.jks

Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do that is with git-crypt. It provides “transparent” encryption: it supports diff, and encryption and decryption happen on the fly. Once you set it up, you continue working with the repo as if it weren’t encrypted. There’s even a fork that works on Windows.

You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:

secretfile filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
*.properties filter=git-crypt diff=git-crypt
*.jks filter=git-crypt diff=git-crypt

And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.

You’re almost done. We should somehow share and backup the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to backup your secret key (that’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.

git-crypt’s author claims it shines when it comes to encrypting just a few files in an otherwise public repo, and recommends looking at git-remote-gcrypt otherwise. But as there are often non-sensitive parts of environment-specific configurations, you may not want to encrypt everything. And I think it’s perfectly fine to use git-crypt even in the separate-repo scenario. And even though encryption is an okay approach to protecting credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo, especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.

The outstanding question in this case is: how do you sync the properties with code changes? Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.

The whole process of having versioned environment-specific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.

The post Storing Encrypted Credentials In Git appeared first on Bozho's tech blog.

Hiring a Director of Sales

Post Syndicated from Yev original https://www.backblaze.com/blog/hiring-a-director-of-sales/

Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience in scaling a sales team and who has an excellent track record for exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated, as well as able to create and develop a highly motivated, success-oriented sales team that has fun and enjoys what they do.

The History of Backblaze from our CEO
In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.

Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.

Fast forward 10 years later and our world looks quite different. You’ll have some fantastic assets to work with:

  • A brand millions recognize for openness, ease-of-use, and affordability.
  • A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as being the people that find and recommend technology products to their friends.
  • Our B2 service that provides the lowest cost cloud storage on the planet at 1/4th the price Amazon, Google or Microsoft charges. While being a newer product on the market, it already has over 100,000 IT and developers signed up as well as an ecosystem building up around it.
  • A growing, profitable and cash-flow positive company.
  • And last, but most definitely not least: a great sales team.

You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:

  • We have a great team, but we are in the process of expanding and we need to develop a structure that will easily scale and provide the most success to drive revenue.
  • We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
  • We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
  • We need someone who will work closely in developing the skills of our current sales team and build a path for career growth and advancement.
  • We want someone to manage our Customer Success program.

So that’s a bit about us. What are we looking for in you?

Experience: As a sales leader, you will strategically build and drive the territory’s sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition as well as building out our customer success program.

Leadership: We have an experienced team who’s brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader and compassionate about developing and supporting your team.

Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.

Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, working to build a good service but also a good place to work.

Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.

Additional Responsibilities needed for this role:

  • Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
  • Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
  • Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
  • Acquire and develop the appropriate sales tools to make the team efficient in their daily work flow.
  • Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
  • Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
  • This position will report directly to the VP of Sales and will be staffed in our headquarters in San Mateo, CA.

Requirements:

  • 7 – 10+ years of successful sales leadership experience as measured by sales performance against goals.
  • Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
  • Background in selling SaaS technologies with a strong track record of success.
  • Strong presentation and communication skills.
  • Must be able to travel occasionally nationwide.
  • BA/BS degree required.

Think you want to join us on this adventure?
Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:

  1. How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
  2. What are the goals you would set for yourself in the 3 month and 1-year timeframes?

Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.

Backblaze is an Equal Opportunity Employer.

The post Hiring a Director of Sales appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Japan’s Directorate for Signals Intelligence

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/japans_director.html

The Intercept has a long article on Japan’s equivalent of the NSA: the Directorate for Signals Intelligence. Interesting, but nothing really surprising.

The directorate has a history that dates back to the 1950s; its role is to eavesdrop on communications. But its operations remain so highly classified that the Japanese government has disclosed little about its work – even the location of its headquarters. Most Japanese officials, except for a select few of the prime minister’s inner circle, are kept in the dark about the directorate’s activities, which are regulated by a limited legal framework and not subject to any independent oversight.

Now, a new investigation by the Japanese broadcaster NHK — produced in collaboration with The Intercept — reveals for the first time details about the inner workings of Japan’s opaque spy community. Based on classified documents and interviews with current and former officials familiar with the agency’s intelligence work, the investigation shines light on a previously undisclosed internet surveillance program and a spy hub in the south of Japan that is used to monitor phone calls and emails passing across communications satellites.

The article includes some new documents from the Snowden archive.

The Software Freedom Conservancy on Tesla’s GPL compliance

Post Syndicated from corbet original https://lwn.net/Articles/754919/rss

The Software Freedom Conservancy has put out a blog posting on the history and current status of Tesla’s GPL compliance issues. “We’re thus glad that, this week, Tesla has acted publicly regarding its current GPL violations and has announced that they’ve taken their first steps toward compliance. While Tesla acknowledges that they still have more work to do, their recent actions show progress toward compliance and a commitment to getting all the way there.”

EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/

As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!

Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and/or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitate large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.

New C5d Instances with Local Storage
In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:

Instance Name    vCPUs    RAM        Local Storage            EBS Bandwidth      Network Bandwidth
c5d.large        2        4 GiB      1 x 50 GB NVMe SSD       Up to 2.25 Gbps    Up to 10 Gbps
c5d.xlarge       4        8 GiB      1 x 100 GB NVMe SSD      Up to 2.25 Gbps    Up to 10 Gbps
c5d.2xlarge      8        16 GiB     1 x 225 GB NVMe SSD      Up to 2.25 Gbps    Up to 10 Gbps
c5d.4xlarge      16       32 GiB     1 x 450 GB NVMe SSD      2.25 Gbps          Up to 10 Gbps
c5d.9xlarge      36       72 GiB     1 x 900 GB NVMe SSD      4.5 Gbps           10 Gbps
c5d.18xlarge     72       144 GiB    2 x 900 GB NVMe SSD      9 Gbps             25 Gbps

Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
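As a small illustration of the naming point above, the following sketch (an assumption-laden example, not AWS documentation) lists whatever NVMe block devices a Linux guest exposes, which on a C5d will include the local instance-store volume(s) alongside any EBS volumes:

import glob

# List NVMe block devices visible to the guest OS (assumes Linux, where each
# volume appears as /dev/nvme<N>n1 after boot).
for device in sorted(glob.glob("/dev/nvme*n1")):
    print(device)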

Available Now
C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

Jeff;

PS – We will be adding local NVMe storage to other EC2 instance types in the months to come, so stay tuned!

Pirate IPTV Service Gave Customer Details to Premier League, But What’s the Risk?

Post Syndicated from Andy original https://torrentfreak.com/pirate-iptv-service-gave-customer-details-to-premier-league-but-whats-the-risk-180515/

In a report last weekend, we documented what appear to be the final days of pirate IPTV provider Ace Hosting.

From information provided by several sources including official liquidation documents, it became clear that a previously successful and profitable Ace had succumbed to pressure from the Premier League, which accused the service of copyright infringement.

The company had considerable funds in the bank – £255,472.00 to be exact – but it also had debts of £717,278.84, including £260,000 owed to HMRC and £100,000 to the Premier League as part of a settlement agreement.

Information received by TF late Sunday suggested that £100K was the tip of the iceberg as far as the Premier League was concerned and in a statement yesterday, the football outfit confirmed that was the case.

“A renowned pirate of Premier League content to consumers has been forced to liquidate after agreeing to pay £600,000 for breaching the League’s copyright,” the Premier League announced.

“Ace IPTV, run by Craig Driscoll and Ian Isaac, was selling subscriptions to illegal Premier League streams directly to consumers which allowed viewing on a range of devices, including notorious Kodi-type boxes, as well as to smaller resellers in the UK and abroad.”

Sources familiar with the case suggest that while Ace Hosting Limited didn’t have the funds to pay the Premier League the full £600K, Ace’s operators agreed to pay (and have already paid, to some extent at least) what were essentially their own funds to cover amounts above the final £100K, which is due to be paid next year.

But that’s not the only thing that’s been handed over to the Premier League.

“Ace voluntarily disclosed the personal details of their customers, which the League will now review in compliance with data protection legislation. Further investigations will be conducted, and action taken where appropriate,” the Premier League added.

So, the big question now is how exposed Ace’s former subscribers are.

The truth is that only the Premier League knows for sure but TF has been able to obtain information from several sources which indicate that former subscribers probably aren’t the Premier League’s key interest and even if they were, information obtained on them would be of limited use.

According to a source with knowledge of how a system like Ace’s works, there is a separation of data which appears to help (at least to some degree) with the subscriber’s privacy.

“The system used to manage accounts and take payment is actually completely separate from the software used to manage streams and the lines themselves. They are never usually even on the same server so are two very different databases,” he told TF.

“So at best the only information that has voluntarily been provided to the [Premier League], is just your email, name and address (assuming you even used real details) and what hosting package or credits you bought.”

While this information is bad enough, the action against Ace is targeted, in that it focuses on the Premier League’s content and how Ace (and therefore its users) infringed on the football outfit’s copyrights. So, proving that subscribers actually watched any Premier League content would be an ideal position but it’s not straightforward, despite the potential for detailed logging.

“The management system contains no history of what you watched, when you watched it, when you signed in and so on. That is all contained in a different database on a different server.

“Because every connection is recorded [on the second server], it can create some two million entries a day and as such most providers either turn off this feature or delete the logs daily, as having so many entries slows down the system used for actual streams,” he explains.

Our source says that this data would likely have been the first to be deleted and is probably “long gone” by now. However, even if the Premier League had obtained it, it’s unlikely they would be able to do much with it due to data protection laws.

“The information was passed to the [Premier League] voluntarily by ACE which means this information has been given from one entity to another without the end users’ consent, not part of the [creditors’ voluntary liquidation] and without a court order to support it. Data Protection right now is taken very seriously in the EU,” he notes.

At this point, it’s probably worth noting that while the word “voluntarily” has been used several times to explain the manner in which Ace handed over its subscribers’ details to the Premier League, the same word can be used to describe the manner in which the £600K settlement amount will be paid.

No one forces someone to pay or hand something over; that’s what the courts are for, and the aim here was to avoid that eventuality.

Other pieces of information culled from various sources suggest that PayPal payment information, limited to amounts only, was also handed over to the Premier League. And, perhaps most importantly (and perhaps predictably) as far as former subscribers are concerned, the football group was more interested in Ace’s upwards supplier chain (the ‘wholesale’ stream suppliers used, for example) than those buying the service.

Finally, while the Premier League is now seeking to send a message to customers that these services are risky to use, it’s difficult to argue with the assertion that it’s unsafe to hand over personal details to an illegal service.

“Ace IPTV’s collapse also highlighted the risk consumers take with their personal data when they sign up to illegal streaming services,” Premier League notes.

TF spoke with three IPTV providers who all confirmed that they don’t care what names and addresses people use to sign up with and that no checks are carried out to make sure they’re correct. However, one concedes that in order to run as a business, this information has to be requested and once a customer types it in, it’s possible that it could be handed over as part of a settlement.

“I’m not going to tell people to put in dummy details, how can I? It’s up to people to use their common sense. If they’re still worried they should give Sky their money because if our backs are against the wall, what do you think is going to happen?” he concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Pirate IPTV Service Goes Bust After Premier League Deal, Exposing Users

Post Syndicated from Andy original https://torrentfreak.com/pirate-iptv-service-goes-bust-after-premier-league-deal-exposing-users-180913/

For those out of the loop, unauthorized IPTV services offering many thousands of unlicensed channels have been gaining in popularity in recent years. They’re relatively cheap, fairly reliable, and offer acceptable levels of service.

They are, however, a huge thorn in the side of rightsholders who are desperate to bring them to their knees. One such organization is the UK’s Premier League, which has been disrupting IPTV services over the past year, hoping they’ll shut down.

Most have simply ridden the wave of blocks but one provider, Ace Hosting in the UK, showed signs of stress last year, revealing that it would no longer sell new subscriptions. There was little doubt in most people’s minds that the Premier League had gotten uncomfortably close to the IPTV provider.

Now, many months later, the amazing story can be told. It’s both incredible and shocking and will leave many shaking their heads in disbelief. First up, some background.

Doing things ‘properly’ – incorporation of a pirate service…

Considering how most operators of questionable services like to stay in the shade, it may come as a surprise to learn that Ace Hosting Limited is a proper company. Incorporated and registered at Companies House on January 3, 2017, Ace has two registered directors – family team Ian and Judith Isaac.

In common with several other IPTV operators in the UK who are also officially registered with the authorities, Ace Hosting has never filed any meaningful accounts. There’s a theory that the corporate structure is basically one of convenience, one that allows for the handling of large volumes of cash while limiting liability. The downside, of course, is that people are often more easily identified, in part due to the comprehensive paper trail.

Thanks to what can only be described as a slow-motion train wreck, the Ace Hosting debacle is revealing a bewildering set of circumstances. Last December, when Ace said it would stop signing up new members due to legal pressure, a serious copyright threat had already been filed against it.

Premier League v Ace Hosting

Documents seen by TorrentFreak reveal that the Premier League sent legal threats to Ace Hosting on December 15, 2017, just days before the subscription closure announcement. Somewhat surprisingly, Ace apparently felt it could pay the Premier League a damages amount and keep on trading.

But in early March 2018, with the Premier League threatening Ace with all kinds of bad things, the company made a strange announcement.

“The ISPs in the UK and across Europe have recently become much more aggressive in blocking our service while football games are in progress,” Ace said in a statement.

“In order to get ourselves off of the ISP blacklist we are going to black out the EPL games for all users (including VPN users) starting on Monday. We believe that this will enable us to rebuild the bypass process and successfully provide you with all EPL games.”

It seems doubtful that Ace really intended to thumb its nose at the Premier League but it had continued to sell subscriptions since receiving threats in December, so all things seemed possible. But on March 24 that all changed, when Ace effectively announced its closure.

Premier League 1, Ace Hosting 0

“It is with sorrow that we announce that we are no longer accepting renewals, upgrades to existing subscriptions or the purchase of new credits. We plan to support existing subscriptions until they expire,” the team wrote.

“EPL games including highlights continue to be blocked and are not expected to be reinstated before the end of the season.”

Indeed, just days later the Premier League demanded a six-figure settlement sum from Ace Hosting, presumably to make a lawsuit disappear. It was the straw that broke the camel’s back.

“When the proposed damages amount was received it was clear that the Company would not be able to cover the cost and that there was a very high probability that even with a negotiated settlement that the Company was insolvent,” documents relating to Ace’s liquidation read.

At this point, Ace says it immediately ceased trading but while torrent sites usually shut down and disappear into the night, Ace’s demise is now a matter of record.

Creditors – the good, the bad, and the ugly

On April 11, 2018, Ace’s directors contacted business recovery and insolvency specialists Begbies Traynor (Central) LLP to obtain advice on the company’s financial position. Begbies Traynor was instructed by Ace on April 23 and on May 8, Ace Hosting director Ian Isaac determined that his company could not pay its debts.

First the good news. According to an official report, Ace Hosting has considerable cash in the bank – £255,472.00 to be exact. Now the bad news – Ace has debts of £717,278.84, the details of which are intriguing to say the least.

First up, Ace has ‘trade creditors’ to whom it owes £104,356. The vast majority of this sum is a settlement Ace agreed to pay to the Premier League.

“The directors entered into a settlement agreement with the Football Association Premier League Limited prior to placing the Company into liquidation as a result of a purported copyright infringement. However, there is a residual claim from the Football Association Premier League Limited which is included within trade creditors totaling £100,000,” Ace’s statement of affairs reads.

Bizarrely (given the nature of the business, at least) Ace also owes £260,000 to Her Majesty’s Revenue and Customs (HMRC) in unpaid VAT and corporation tax, which is effectively the government’s cut of the pirate IPTV business’s labors.

Former Ace Hosting subscriber? Your cash is as good as gone

Finally – and this is where things get a bit sweaty for Joe Public – there are 15,768 “consumer creditors”, split between ‘retail’ and ‘business’ customers of the service. Together they are owed a staggering £353,000.

Although the documentation isn’t explicit, retail customers appear to be people who have purchased an Ace IPTV subscription that still had time to run when the service closed down. Business customers seem likely to be resellers of the service, who purchased ‘credits’ and didn’t get time to sell them before Ace disappeared.

The poison chalice here is that those who are owed money by Ace can actually apply to get some of it back, but that could be extremely risky.

“Creditor claims have not yet been adjudicated but we estimate that the majority of customers who paid for subscription services will receive less than £3 if there is a distribution to unsecured creditors. Furthermore, customer details will be passed to the relevant authorities if there is any suggestion of unlawful conduct,” documentation reads.

We spoke with a former Ace customer who had this to say about the situation.

“It was generally a good service notwithstanding their half-arsed attempts to evade the EPL block. At its heart there were people who seemed to know how to operate a decent service, although the customer-facing side of things was not the greatest,” he said.

“And no, I won’t be claiming a refund. I went into it with my eyes fully open so I don’t hold anyone responsible, except myself. In any case, anyone who wants a refund has to complete a claim form and provide proof of ID (LOL).”

The bad news for former subscribers continues…potentially

While it’s likely that most people will forgo their £3, the bad news isn’t over for subscribers. Begbies Traynor is warning that the liquidators will decide whether to hand over subscribers’ personal details to the Premier League and/or the authorities.

In any event, sometime in the next couple of weeks the names and addresses of all subscribers will be made “available for inspection” at an address in Wiltshire for two days, meaning that any interested parties could potentially gain access to sensitive information.

The bottom line is that Ace Hosting is in the red to the tune of £461,907 and will eventually disappear into the bowels of history. Whether its operators will have to answer for their conduct remains to be seen, but it seems unimaginable at this stage that things will end well.

Subscribers probably won’t get sucked in but in a story as bizarre as this one, anything could yet happen.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

This is a really lovely Raspberry Pi tricorder

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/raspberry-pi-tricorder-prop/

At the moment I’m spending my evenings watching all of Star Trek in order. Yes, I have watched it before (but with some really big gaps). Yes, including the animated series (I’m up to The Terratin Incident). So I’m gratified to find this beautiful Original Series–style tricorder build.

Star Trek Tricorder with Working Display!

At this year’s Replica Prop Forum showcase, we meet up once again with Brian Mix, who brought his new Star Trek TOS Tricorder. This beautiful replica captures the weight and finish of the filming hand prop, and Brian has taken it one step further with some modern-day electronics!

A what now?

If you don’t know what a tricorder is, which I guess is faintly possible, the easiest way I can explain is to steal words that Liz wrote when Recantha made one back in 2013. It’s “a made-up thing used by the crew of the Enterprise to measure stuff, store data, and scout ahead remotely when exploring strange new worlds, seeking out new life and new civilisations, and all that jazz.”

A brief history of Picorders

We’ve seen other Raspberry Pi–based realisations of this iconic device. Recantha’s LEGO-cased tricorder delivered some authentic functionality, including temperature sensors, an ultrasonic distance sensor, a photosensor, and a magnetometer. Michael Hahn’s tricorder for element14’s Sci-Fi Your Pi competition in 2015 packed some similar functions, along with Original Series audio effects, into a neat (albeit non-canon) enclosure.

Brian Mix’s Original Series tricorder

Brian Mix’s tricorder, seen in the video above from Tested at this year’s Replica Prop Forum showcase, is based on a high-quality kit into which, he discovered, a Raspberry Pi just fits. He explains that the kit is the work of the late Steve Horch, a special effects professional who provided props for later Star Trek series, including the classic Deep Space Nine episode Trials and Tribble-ations.

A still from an episode of Star Trek: Deep Space Nine: Jadzia Dax, holding an Original Series–style tricorder, speaks with Benjamin Sisko

Dax, equipped for time travel

This episode’s plot required sets and props — including tricorders — replicating the USS Enterprise of The Original Series, and Steve Horch provided many of these. Thus, a tricorder kit from him is about as close to authentic as you can possibly find unless you can get your hands on a screen-used prop. The Pi allows Brian to drive a real display and a speaker: “Being the geek that I am,” he explains, “I set it up to run every single Original Series Star Trek episode.”

Even more wonderful hypothetical tricorders that I would like someone to make

This tricorder is beautiful, and it makes me think how amazing it would be to squeeze in some of the sensor functionality of the devices depicted in the show. Space in the case is tight, but it looks like there might be a little bit of depth to spare — enough for an IMU, maybe, or a temperature sensor. I’m certain the future will bring more Pi tricorder builds, and I, for one, can’t wait. Please tell us in the comments if you’re planning something along these lines, and, well, I suppose some other sci-fi franchises have decent Pi project potential too, so we could probably stand to hear about those.
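
If someone does squeeze a sensor in, the software side needs surprisingly little code. Here’s a rough sketch, not a tested build: it assumes a DS18B20 temperature sensor wired to the Pi’s 1-Wire bus (GPIO 4 by default) with the w1-gpio and w1-therm overlays enabled.

import glob

def read_celsius() -> float:
    # Each 1-Wire device shows up as a directory under /sys/bus/w1/devices/;
    # DS18B20 sensors use the 28- family prefix.
    device_file = glob.glob("/sys/bus/w1/devices/28-*/w1_slave")[0]
    with open(device_file) as f:
        lines = f.readlines()
    # The second line of w1_slave ends with "t=<temperature in millidegrees C>"
    millidegrees = int(lines[1].strip().split("t=")[-1])
    return millidegrees / 1000.0

print(f"Ambient temperature: {read_celsius():.1f} °C")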

If you’re commenting, no spoilers please past The Animated Series S1 E11. Thanks.

The post This is a really lovely Raspberry Pi tricorder appeared first on Raspberry Pi.

Cryptocurrency Security Challenges

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/cryptocurrency-security-challenges/

Physical coins representing cyrptocurrencies

Most likely you’ve read the tantalizing stories of big gains from investing in cryptocurrencies. Someone who invested $1,000 into bitcoins five years ago would have over $85,000 in value now. Alternatively, someone who invested in bitcoins three months ago would have seen their investment lose 20% in value. Beyond the big price fluctuations, currency holders are possibly exposed to fraud, bad business practices, and even risk losing their holdings altogether if they are careless in keeping track of the all-important currency keys.

It’s certain that beyond the rewards and risks, cryptocurrencies are here to stay. We can’t ignore how they are changing the game for how money is handled between people and businesses.

Some Advantages of Cryptocurrency

  • Cryptocurrency is accessible to anyone.
  • Decentralization means the network operates on a user-to-user (or peer-to-peer) basis.
  • Transactions can be completed for a fraction of the expense and time required to complete traditional asset transfers.
  • Transactions are digital and cannot be counterfeited or reversed arbitrarily by the sender, as with credit card charge-backs.
  • There aren’t usually transaction fees for cryptocurrency exchanges.
  • Cryptocurrency allows the cryptocurrency holder to send exactly what information is needed and no more to the merchant or recipient, even permitting anonymous transactions (for good or bad).
  • Cryptocurrency operates at the universal level and hence makes transactions easier internationally.
  • There is no other electronic cash system in which your account isn’t owned by someone else.

On top of all that, blockchain, the underlying technology behind cryptocurrencies, is already being applied to a variety of business needs and is itself becoming a hot sector of the tech economy. Blockchain is bringing traceability and cost-effectiveness to supply-chain management (which also improves quality assurance in areas such as food), reducing errors and improving accounting accuracy, enabling smart contracts that can be automatically validated, signed, and enforced through a blockchain construct, and opening up the possibility of secure online voting, among many other applications.

Like any new, booming market, there are risks involved in these new currencies. Anyone venturing into this domain needs to have their eyes wide open. While the opportunities for making money are real, there are even more ways to lose money.

We’re going to cover two primary approaches to staying safe and avoiding fraud and loss when dealing with cryptocurrencies. The first is to thoroughly vet any person or company you’re dealing with to judge whether they are ethical and likely to succeed in their business segment. The second is keeping your critical cryptocurrency keys safe, which we’ll deal with in this and a subsequent post.

Caveat Emptor — Buyer Beware

The short history of cryptocurrency has already seen the demise of a number of companies that claimed to manage, mine, trade, or otherwise help their customers profit from cryptocurrency. Mt. Gox, GAW Miners, and OneCoin are just three of the many companies that disappeared with their users’ money. This is the traditional equivalent of your bank going out of business and zeroing out your checking account in the process.

That doesn’t happen with banks because of regulatory oversight. But with cryptocurrency, you need to take the time to investigate any company you use to manage or trade your currencies. How long have they been around? Who are their investors? Are they affiliated with any reputable financial institutions? What is the record of their founders and executive management? These are all important questions to consider when evaluating a company in this new space.

Would you give the keys to your house to a service or person you didn’t thoroughly know and trust? Some companies that enable you to buy and sell currencies online will routinely hold your currency keys, which gives them the ability to do anything they want with your holdings, including selling them and pocketing the proceeds if they wish.

That doesn’t mean you shouldn’t ever allow a company to keep your currency keys in escrow. It simply means that you better know with whom you’re doing business and if they’re trustworthy enough to be given that responsibility.

Keys To the Cryptocurrency Kingdom — Public and Private

If you’re an owner of cryptocurrency, you know how this all works. If you’re not, bear with me for a minute while I bring everyone up to speed.

Cryptocurrency has no physical manifestation, such as bills or coins. It exists purely as a computer record. And unlike currencies maintained by governments, such as the U.S. dollar, there is no central authority regulating its distribution and value. Cryptocurrencies use a technology called blockchain, which is a decentralized way of keeping track of transactions. There are many copies of a given blockchain, so no single central authority is needed to validate its authenticity or accuracy.

The validity of each cryptocurrency is determined by a blockchain. A blockchain is a continuously growing list of records, called “blocks”, which are linked and secured using cryptography. Blockchains by design are inherently resistant to modification of the data. They perform as an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable, permanent way. A blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority. On a network of any meaningful scale, that level of collusion is practically impossible, making blockchain networks effectively immutable and trustworthy.

Blockchain process
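
If the hash-linking idea sounds abstract, a toy example makes it concrete. This is purely illustrative Python, not any real cryptocurrency: each block records the hash of the previous block, so tampering with history breaks every link that follows.

import hashlib
import json
import time

def block_hash(block):
    # Hash a canonical JSON rendering of the block's contents
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def new_block(previous, transactions):
    return {
        "index": previous["index"] + 1,
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": block_hash(previous),
    }

genesis = {"index": 0, "timestamp": time.time(), "transactions": [], "prev_hash": "0" * 64}
chain = [genesis]
chain.append(new_block(chain[-1], [{"from": "alice", "to": "bob", "amount": 1.5}]))

# Tampering with an old block invalidates every block that points back to it.
chain[0]["transactions"].append({"from": "mallory", "to": "mallory", "amount": 100})
print(chain[1]["prev_hash"] == block_hash(chain[0]))  # False -- the chain no longer verifies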

The other element common to all cryptocurrencies is their use of public and private keys, which are stored in the currency’s wallet. A cryptocurrency wallet stores the public and private “keys” or “addresses” that can be used to receive or spend the cryptocurrency. With the private key, it is possible to write in the public ledger (blockchain), effectively spending the associated cryptocurrency. With the public key, it is possible for others to send currency to the wallet.
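
Here is a rough sketch of those key mechanics, using the third-party Python ecdsa package and the secp256k1 curve that Bitcoin uses. Real wallets layer address encoding, key derivation, and transaction formats on top of this, so treat it only as an illustration of the signing and verification relationship.

import ecdsa

# The wallet's key pair
private_key = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
public_key = private_key.get_verifying_key()

transaction = b"send 0.5 coins from my address to the merchant's address"

# Only the holder of the private key can produce this signature (i.e., spend the funds)...
signature = private_key.sign(transaction)

# ...but anyone with the public key can confirm that the spend is authorised.
print(public_key.verify(signature, transaction))  # True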

What is a cryptocurrency address?

Cryptocurrency “coins” can be lost if the owner loses the private keys needed to spend the currency they own. It’s as if the owner had lost a bank account number and had no way to verify their identity to the bank, or if they lost the U.S. dollars they had in their wallet. The assets are gone and unusable.

The Cryptocurrency Wallet

Given the importance of these keys, and the lack of recourse if they are lost, it’s obviously very important to keep track of them.

If you’re being careful in choosing reputable exchanges, app developers, and other services with whom to trust your cryptocurrency, you’ve made a good start in keeping your investment secure. But if you’re careless in managing the keys to your bitcoins, ether, Litecoin, or other cryptocurrency, you might as well leave your money on a cafe tabletop and walk away.

What Are the Differences Between Hot and Cold Wallets?

Just like other numbers you might wish to keep track of — credit cards, account numbers, phone numbers, passphrases — cryptocurrency keys can be stored in a variety of ways. Those who use their currencies for day-to-day purchases most likely will want them handy in a smartphone app, hardware key, or debit card that can be used for purchases. These are called “hot” wallets. Some experts advise keeping the balances in these devices and apps to a minimal amount to avoid hacking or data loss. We typically don’t walk around with thousands of dollars in U.S. currency in our old-style wallets, so this is really a continuation of the same approach to managing spending money.

Bread mobile app screenshot

A “hot” wallet, the Bread mobile app

Some investors with large balances keep their keys in “cold” wallets, or “cold storage,” i.e. a device or location that is not connected online. If funds are needed for purchases, they can be transferred to a more easily used payment medium. Cold wallets can be hardware devices, USB drives, or even paper copies of your keys.

Trezor hardware wallet

A “cold” wallet, the Trezor hardware wallet

Ledger Nano S hardware wallet

A “cold” wallet, the Ledger Nano S

Bitcoin paper wallet

A “cold” Bitcoin paper wallet

Wallets are suited to holding one or more specific cryptocurrencies, and some people have multiple wallets for different currencies and different purposes.

A paper wallet is nothing other than a printed record of your public and private keys. Some prefer their records to be completely disconnected from the internet, and a piece of paper serves that need. Just like writing down an account password on paper, however, it’s essential to keep the paper secure to avoid giving someone the ability to freely access your funds.

How to Keep Your Keys and Cryptocurrency Secure

In a post this coming Thursday, Securing Your Cryptocurrency, we’ll discuss the best strategies for backing up your cryptocurrency so that your currencies don’t become part of the millions that have been lost. We’ll cover the common (and uncommon) approaches to backing up hot wallets and cold wallets, and to using paper and metal solutions to keep your keys safe.

In the meantime, please tell us of your experiences with cryptocurrencies — good and bad — and how you’ve dealt with the issue of cryptocurrency security.

The post Cryptocurrency Security Challenges appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Own your own working Pokémon Pokédex!

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/deep-learning-pokedex/

Squeal with delight as your inner Pokémon trainer witnesses the wonder of Adrian Rosebrock’s deep learning Pokédex.

Creating a real-life Pokedex with a Raspberry Pi, Python, and Deep Learning

This video demos a real-like Pokedex, complete with visual recognition, that I created using a Raspberry Pi, Python, and Deep Learning. You can find the entire blog post, including code, using this link: https://www.pyimagesearch.com/2018/04/30/a-fun-hands-on-deep-learning-project-for-beginners-students-and-hobbyists/ Music credit to YouTube user “No Copyright” for providing royalty free music: https://www.youtube.com/watch?v=PXpjqURczn8

The history of Pokémon in 30 seconds

The Pokémon franchise was created by video game designer Satoshi Tajiri in 1995. In the fictional world of Pokémon, Pokémon Trainers explore the vast landscape, catching and training small creatures called Pokémon. To date, there are 802 different types of Pokémon. They range from the ever recognisable Pikachu, a bright yellow electric Pokémon, to the highly sought-after Shiny Charizard, a metallic, playing-card-shaped Pokémon that your mate Alex claims she has in mint condition, but refuses to show you.

Pokemon GIF

In the world of Pokémon, children as young as ten-year-old protagonist and all-round annoyance Ash Ketchum are allowed to leave home and wander the wilderness. There, they hunt vicious, deadly creatures in the hope of becoming a Pokémon Master.

Adrian’s deep learning Pokédex

Adrian is a bit of a deep learning pro, as demonstrated by his Santa/Not Santa detector, which we wrote about last year. For that project, he also provided a great explanation of what deep learning actually is. In a nutshell:

…a subfield of machine learning, which is, in turn, a subfield of artificial intelligence (AI). While AI embodies a large, diverse set of techniques and algorithms related to automatic reasoning (inference, planning, heuristics, etc), the machine learning subfields are specifically interested in pattern recognition and learning from data.

As with his earlier Raspberry Pi project, Adrian uses the Keras deep learning model and the TensorFlow backend, plus a few other packages such as Adrian’s own imutils functions and OpenCV.

Adrian trained a Convolutional Neural Network using Keras on a dataset of 1191 Pokémon images, obtaining 96.84% accuracy. As Adrian explains, this model is able to identify Pokémon via still image and video. It’s perfect for creating a Pokédex – an interactive Pokémon catalogue that should, according to the franchise, be able to identify and read out information on any known Pokémon when captured by camera. More information on model training can be found on Adrian’s blog.
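
For a sense of what the model-building side looks like, here is a minimal Keras sketch. It is not Adrian’s actual network or training pipeline; it just assumes 96×96 RGB images and five Pokémon classes to show the general shape of a Keras CNN.

from tensorflow.keras import layers, models

NUM_CLASSES = 5  # assumed number of Pokémon classes

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(96, 96, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=25, validation_data=...) with your own labelled dataset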

Adrian Rosebeck deep learning pokemon pokedex

For the physical build, a Raspberry Pi 3 with camera module is paired with the Raspberry Pi 7″ touch display to create a portable Pokédex. And while Adrian comments that the same result can be achieved using your home computer and a webcam, that’s not how Adrian rolls as a Raspberry Pi fan.

Adrian Rosebeck deep learning pokemon pokedex

Plus, the smaller size of the Pi is perfect for one of you to incorporate this deep learning model into a 3D-printed Pokédex for ultimate Pokémon glory, pretty please, thank you.

Adrian Rosebeck deep learning pokemon pokedex

Adrian has gone into impressive detail about how the project works and how you can create your own on his blog, pyimagesearch. So if you’re interested in learning more about deep learning, and making your own Pokédex, be sure to visit.

The post Own your own working Pokémon Pokédex! appeared first on Raspberry Pi.

Ransomware Update: Viruses Targeting Business IT Servers

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/ransomware-update-viruses-targeting-business-it-servers/

Ransomware warning message on computer

As ransomware attacks have grown in number in recent months, the tactics and attack vectors also have evolved. While the primary method of attack used to be to target individual computer users within organizations with phishing emails and infected attachments, we’re increasingly seeing attacks that target weaknesses in businesses’ IT infrastructure.

How Ransomware Attacks Typically Work

In our previous posts on ransomware, we described the common vehicles used by hackers to infect organizations with ransomware viruses. Most often, hackers distribute trojan horse downloaders through malicious downloads and spam emails. The emails contain a variety of file attachments, which, if opened, will download and run one of the many ransomware variants. Once a user’s computer is infected with a malicious downloader, it will retrieve additional malware, which frequently includes crypto-ransomware. After the files have been encrypted, a ransom payment is demanded of the victim in order to decrypt the files.

What’s Changed With the Latest Ransomware Attacks?

In 2016, a customized ransomware strain called SamSam began attacking servers, primarily in health care institutions. SamSam, unlike more conventional ransomware, is not delivered through downloads or phishing emails. Instead, the attackers behind SamSam use tools to identify unpatched servers running Red Hat’s JBoss enterprise products. Once the attackers have successfully gained entry into one of these servers by exploiting vulnerabilities in JBoss, they use other freely available tools and scripts to collect credentials and gather information on networked computers. Then they deploy their ransomware to encrypt files on these systems before demanding a ransom. Gaining entry to an organization through its IT center rather than its endpoints makes this approach scalable and especially unsettling.

SamSam’s methodology is to scour the Internet searching for accessible and vulnerable JBoss application servers, especially ones used by hospitals. It’s not unlike a burglar rattling doorknobs in a neighborhood to find unlocked homes. When SamSam finds an unlocked home (unpatched server), the software infiltrates the system. It is then free to spread across the company’s network by stealing passwords. As it traverses the network and systems, it encrypts files, preventing access until the victims pay the hackers a ransom, typically between $10,000 and $15,000. The low ransom amount has encouraged some victimized organizations to pay the ransom rather than incur the downtime required to wipe and reinitialize their IT systems.

The success of SamSam is due to its effectiveness rather than its sophistication. SamSam can enter and traverse a network without human intervention. Some organizations are learning too late that securing internet-facing services in their data center from attack is just as important as securing endpoints.

The typical steps in a SamSam ransomware attack are:

  1. Attackers gain access to a vulnerable server. They exploit vulnerable software or weak/stolen credentials.
  2. The attack spreads via remote access tools. Attackers harvest credentials, create SOCKS proxies to tunnel traffic, and abuse RDP to install SamSam on more computers in the network.
  3. The ransomware payload is deployed. Attackers run batch scripts to execute ransomware on compromised machines.
  4. A ransomware demand is delivered, requiring payment to decrypt files. Demand amounts vary from victim to victim. Relatively low ransom amounts appear to be designed to encourage quick payment decisions.

What all the organizations successfully exploited by SamSam have in common is that they were running unpatched servers that made them vulnerable to SamSam. Some organizations had their endpoints and servers backed up, while others did not. Some of those without backups they could use to recover their systems chose to pay the ransom money.

Timeline of SamSam History and Exploits

Since its appearance in 2016, SamSam has been in the news with many successful incursions into healthcare, business, and government institutions.

March 2016
SamSam appears

SamSam campaign targets vulnerable JBoss servers
Attackers home in on healthcare organizations specifically, as they’re more likely to have unpatched JBoss machines.

April 2016
SamSam finds new targets

SamSam begins targeting schools and government.
After initial success targeting healthcare, attackers branch out to other sectors.

April 2017
New tactics include RDP

Attackers shift to targeting organizations with exposed RDP connections, and maintain focus on healthcare.
An attack on Erie County Medical Center costs the hospital $10 million over three months of recovery.
Erie County Medical Center attacked by SamSam ransomware virus

January 2018
Municipalities attacked

• Attack on Municipality of Farmington, NM.
• Attack on Hancock Health.
Hancock Regional Hospital notice following SamSam attack
• Attack on Adams Memorial Hospital
• Attack on Allscripts (Electronic Health Records), which includes 180,000 physicians, 2,500 hospitals, and 7.2 million patients’ health records.

February 2018
Attack volume increases

• Attack on Davidson County, NC.
• Attack on Colorado Department of Transportation.
SamSam virus notification

March 2018
SamSam shuts down Atlanta

• Second attack on Colorado Department of Transportation.
• City of Atlanta suffers a devastating attack by SamSam.
The attack has far-reaching impacts — crippling the court system, keeping residents from paying their water bills, limiting vital communications like sewer infrastructure requests, and pushing the Atlanta Police Department to file paper reports.
Atlanta Ransomware outage alert
• SamSam campaign nets $325,000 in 4 weeks.
Infections spike as attackers launch new campaigns. Healthcare and government organizations are once again the primary targets.

How to Defend Against SamSam and Other Ransomware Attacks

The best way to deal with a ransomware attack is to avoid having one in the first place. Beyond that, making sure your valuable data is backed up and unreachable by ransomware infection will ensure that your downtime and data loss are minimal or nonexistent if you ever suffer an attack.

In our previous post, How to Recover From Ransomware, we listed the ten ways to protect your organization from ransomware.

  1. Use anti-virus and anti-malware software or other security policies to block known payloads from launching.
  2. Make frequent, comprehensive backups of all important files and isolate them from local and open networks. In a recent survey, cybersecurity professionals rated data backup and recovery (74%) by far the most effective solution for responding to a successful ransomware attack.
  3. Keep offline backups of data stored in locations inaccessible from any potentially infected computer, such as disconnected external storage drives or the cloud, which prevents them from being accessed by the ransomware.
  4. Install the latest security updates issued by software vendors of your OS and applications. Remember to patch early and patch often to close known vulnerabilities in operating systems, server software, browsers, and web plugins.
  5. Consider deploying security software to protect endpoints, email servers, and network systems from infection.
  6. Exercise cyber hygiene, such as using caution when opening email attachments and links.
  7. Segment your networks to keep critical computers isolated and to prevent the spread of malware in case of attack. Turn off unneeded network shares.
  8. Turn off admin rights for users who don’t require them. Give users the lowest system permissions they need to do their work.
  9. Restrict write permissions on file servers as much as possible.
  10. Educate yourself, your employees, and your family in best practices to keep malware out of your systems. Update everyone on the latest email phishing scams and human engineering aimed at turning victims into abettors.

Please Tell Us About Your Experiences with Ransomware

Have you endured a ransomware attack or have a strategy to avoid becoming a victim? Please tell us of your experiences in the comments.

The post Ransomware Update: Viruses Targeting Business IT Servers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Congratulations to Oracle on MySQL 8.0

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2018/04/congratulations-to-oracle-on-mysql-80.html

Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.

I decided to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0.
Note that I did the following without reading the release notes, as I have done with every MySQL / MariaDB release to date; in this case it was not the right thing to do.

I pulled MySQL 8.0 from git@github.com:mysql/mysql-server.git
I was pleasantly surprised that ‘cmake . ; make’ worked without any compiler warnings! I even checked the used compiler options and noticed that MySQL was compiled with -Wall + several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary as Oracle had moved it to ‘runtime_output_directory’; Unexpected, but no big thing.

Now it was time to install MySQL 8.0.

I did know that MySQL 8.0 has removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases:
(I have specified datadir=/my/data3 in the /tmp/my.cnf file)

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld --defaults-file=/tmp/my.cnf --install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look at the mysqld --help --verbose output showed that the right command option is --initialize. My bad, let’s try again:

> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I used the right option, but it still didn’t work.
I took a quick look around:

> ls /my/data3/
binlog.index

So even though mysqld noticed that the data3 directory was wrong, it still wrote things into it. This happened even though I didn’t have --log-bin enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable ‘max-tmp-tables=100’
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you’re executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed the variables that MySQL 8.0 doesn’t support anymore from my configuration files. I couldn’t find a list of the removed variables anywhere, so this was done by trial and error.

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can’t open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table ‘mysql.gtid_executed’ cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-001146 – Table ‘mysql.component’ doesn’t exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-003543 – The mysql.component table is missing or has an incorrect definition.
….
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld --initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn’t try this in a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; it could easily have been something that one could remember without decreasing security, as it’s temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many systems and thus doesn’t need temporary installation passwords.)

Now lets start the MySQL server for real to do some testing:

> ./mysqld --defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

And then let’s start the client:

> ./client/mysql --socket=/tmp/mysql.sock --user=root --password="px)NaaSp?6um"
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn’t work with old MySQL / MariaDB clients by default 🙁

I was testing this in a system with MariaDB installed, like all modern Linux systems today, and didn’t want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld --skip-grant-tables

> ./client/mysql --socket=/tmp/mysql.sock --user=root
ERROR 1045 (28000): Access denied for user ‘root’@’localhost’ (using password: NO)

Apparently --skip-grant-tables is not good enough anymore. Let’s try again with:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password

> ./client/mysql --socket=/tmp/mysql.sock --user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere. Now let’s fix "root" to work with the old authentication:

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string=password("test") where user="root";
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '("test") where user="root"' at line 1

A quick look in the MySQL 8.0 release notes told me that the PASSWORD() function is removed in 8.0. Why???? I don’t know how one is supposed to generate password hashes in MySQL 8.0 that are compatible with old installations of MySQL. One could of course start an old MySQL or MariaDB version, execute the PASSWORD() function, and copy the result.
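
For completeness: the mysql_native_password hash format itself is just a double SHA-1, so one can compute a compatible authentication_string outside the server with a few lines of code. A sketch, not an officially supported method:

import hashlib

def mysql_native_password_hash(password: str) -> str:
    # mysql_native_password stores '*' followed by uppercase hex of SHA1(SHA1(password))
    inner = hashlib.sha1(password.encode("utf-8")).digest()
    return "*" + hashlib.sha1(inner).hexdigest().upper()

print(mysql_native_password_hash("test"))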

I decided to fix this the easy way and use an empty password:

(Update: I later discovered that the right way would have been to use FLUSH PRIVILEGES; ALTER USER 'root'@'localhost' IDENTIFIED BY 'test';. I however dislike this syntax, as it has the password in clear text, which is easy to grab, and the command can’t easily be used to update the mysql.user table. One must also disable the --skip-grant-tables mode to use this.)

MySQL [mysql]> update mysql.user set plugin="mysql_native_password",authentication_string="" where user="root";
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 
I restarted mysqld:
> mysqld --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, forgot that. Let’s try again:

> mysqld --skip-grant-tables --default_authentication_plugin=mysql_native_password

> ./client/mysql --user=root --password="" mysql
MySQL [mysql]> update mysql.user set password_expired="N" where user="root";

Now restart and test worked:

> ./mysqld --default_authentication_plugin=mysql_native_password

>./client/mysql --user=root --password="" mysql

Finally I had a working account that I can use to create other users!

When looking at the mysqld --help --verbose output again, I noticed the option:

--initialize-insecure
Create the default database and exit. Create a super user
with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure

2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] root@localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.

Hm. I don’t understand the warning, as --initialize-insecure is not an option that one would use more than once and thus not something one would ‘switch off’.

> ./mysqld --defaults-file=/tmp/my.cnf

> ./client/mysql --user=root --password="" mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning 🙁

To get things to work with old clients, one has to initialize the database with:
> ./mysqld --defaults-file=/tmp/my.cnf --initialize-insecure --default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it for a spin by running the "standard" MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single-threaded benchmark that measures the "raw" speed of some common operations. It gives you the ‘maximum’ performance for a single query. It’s different from other benchmarks that measure the maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.
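
To give an idea of what sql-bench-style measurements look like, here is a heavily reduced sketch of the approach: one connection, one operation repeated many times, wall-clock seconds reported. It assumes the mysql-connector-python package and a throwaway test database on a local server; the real sql-bench suite is far more extensive.

import time
import mysql.connector  # assumes the mysql-connector-python package is installed

conn = mysql.connector.connect(host="127.0.0.1", user="root", password="", database="test")
cur = conn.cursor()

def timed(label, sql, repeat=1000):
    start = time.perf_counter()
    for _ in range(repeat):
        cur.execute(sql)
        if cur.with_rows:
            cur.fetchall()  # drain any result set before the next execute()
    conn.commit()
    print(f"{label:<10} {time.perf_counter() - start:7.2f} s")

cur.execute("CREATE TABLE IF NOT EXISTS bench (id INT PRIMARY KEY AUTO_INCREMENT, v INT)")
timed("insert", "INSERT INTO bench (v) VALUES (1)")
timed("select", "SELECT COUNT(*) FROM bench")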

I first tried to be clever and create the "test" database that I needed for sql-bench with
> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn’t exist.

MySQL 8.0 has gone away from the original concept of MySQL, where the user could easily create directories and copy databases into the database directory. This may have serious implications for anyone doing backups of databases and/or trying to restore a backup with normal OS commands.

I created the ‘test’ database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests --user=root

The first run failed in test-ATIS:

Can't execute command 'create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))'
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_' at line 1

This happened because ‘rank‘ is now a reserved word in MySQL 8.0. This is also reserved in ANSI SQL, but I don’t know of any other database that has failed to run test-ATIS before. I have in the past run it against Oracle, PostgreSQL, Mimer, MSSQL etc without any problems.

MariaDB also has ‘rank’ as a keyword in 10.2 and 10.3 but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0.

I ran the test with both MySQL 8.0 and MariaDB 10.3 using the InnoDB storage engine, with identical values for all InnoDB variables, table-definition-cache, and table-open-cache. I turned off the performance schema for both databases. All tests are run with a user with an empty password (to keep things comparable, and because it was too complex to generate a password in MySQL 8.0).

The results are as follows.
Results per test in seconds:

Operation         |MariaDB|MySQL-8|
-----------------------------------
ATIS              | 153.00| 228.00|
alter-table       |  92.00| 792.00|
big-tables        | 990.00|2079.00|
connect           | 186.00| 227.00|
create            | 575.00|4465.00|
insert            |4552.00|8458.00|
select            | 333.00| 412.00|
table-elimination |1900.00|3916.00|
wisconsin         | 272.00| 590.00|
-----------------------------------

This is of course just a first view of the performance of MySQL 8.0 in a single user environment. Some reflections about the results:

  • The alter-table test is slower (as expected) in 8.0, as some of the alter tests benefit from the instant add column in MariaDB 10.3.
  • The connect test is also better for MariaDB, as we put a lot of effort into speeding this up in MariaDB 10.2.
  • The table-elimination test shows an optimization in MariaDB for the Anchor table model, which MySQL doesn’t have.
  • CREATE and DROP TABLE is almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of ‘atomic DDL’. This may also cause performance problems for any thread using the data dictionary when another thread is creating/dropping tables.
  • When looking at the individual test results, MySQL 8.0 was slower in almost every test, in many cases significantly slower.
  • The only test where MySQL was faster was "update_with_key_prefix". I checked this and noticed that there was a bug in the test: the column was updated to its original value (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed, and that we have not been aware of in the test or in MariaDB.
  • While writing this, I noticed that MySQL 8.0 is now using utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much, as most tests work with numbers, and Oracle claims that utf8mb4 is only 20% slower than latin1), but this needs to be verified.
  • Oracle claims that MySQL 8.0 is much faster on multi user benchmarks. The above test indicates that they may have done this by sacrificing single user performance.
  •  We need to do more and many different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:

  • Using the new caching_sha2_password authentication as the default for new installations is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL’s client libraries. While working on this blog I saw MySQL users complain on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL’s history that such an incompatible change has been made!
  • Atomic DDL is a good thing (We plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary as if this gets corrupted you will lose all your data. (Single point of failure)
  • MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0 one has to check all one’s databases and applications to ensure that there are no conflicts.
  • As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in an inconsistent state. I did my tests, of course, by installing into an empty data directory, but one can assume that some of the problems may also happen when upgrading an old installation.

Conclusions:
In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly Atomic DDL and JSON TABLE functions), but at the same time MySQL has strayed away from some of the fundamental cornerstone principles of MySQL:

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:

  • Ease of use
  • Performance
  • Stability

With MySQL 8.0, Oracle has sacrificed two of these three.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following should hold:

  • Upgrades should be trivial
  • Things should be kept compatible, if possible (don’t remove features/options/functions that are used)
  • Minimize reserved words, don’t remove server variables
  • One should be able to use normal OS commands to create and drop databases, and to copy and move tables around within the same system or between different systems. With 8.0 and its data dictionary, taking backups of specific tables will be hard, even if the server is not running.
  • mysqldump should always be usable for backups and for moving to new releases.
  • Old clients and applications should be able to use ‘any’ MySQL server version unchanged. (Some Oracle client libraries, like the C++ one, by default only support the new X protocol and thus cannot be used with older MySQL or any MariaDB version.)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way to not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It’s about design philosophy, or strategic vision, if you will.

This shows in two main ways: our respective view of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle’s stored procedures) and AS OF (built-in system versioned tables with point-in-time querying). For both of those features, MariaDB is the first Open Source database doing so. I don’t expect Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem in extending the functionality. In 2017, MariaDB accepted more code contributions in one year than MySQL has done during its entire lifetime, and the rate is increasing!

I am sure that the experience I had with testing MySQL 8.0 would have been significantly better if MySQL would have an open development model where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.

Before upgrading to MySQL 8.0 please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don’t expect that old installations or applications will work out of the box without testing, as a lot of features and options have been removed (the query cache, partitioning of MyISAM tables, etc.)! You probably also have to revise your backup methods, especially if you want to ever restore just a few tables. (With 8.0, I don’t know how this can be easily done.)

According to the MySQL 8.0 release notes, one can’t use mysqldump to copy a database to MySQL 8.0. One first has to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with an in-place upgrade. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seems to be a one-way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB, and one could always move back to MySQL with mysqldump. All MySQL client libraries have worked with MariaDB and all MariaDB client libraries have worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below you have choices for your future, after MySQL 8.0 you have very little choice. But don’t despair, as MariaDB will always be able to load a mysqldump file and it’s very easy to upgrade your old MySQL installation to MariaDB 🙂

I wish you good luck to try MySQL 8.0 (and also the upcoming MariaDB 10.3)!

OMG The Stupid It Burns

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/omg-stupid-it-burns.html

This article, pointed out by @TheGrugq, is stupid enough that it’s worth rebutting.

The article starts with the question “Why did the lessons of Stuxnet, Wannacry, Heartbleed and Shamoon go unheeded?“. It then proceeds to ignore the lessons of those things.
Some of the actual lessons should be things like how Stuxnet crossed air gaps, how Wannacry spread through flat Windows networking, how Heartbleed comes from technical debt, and how Shamoon furthers state aims by causing damage.
But this article doesn’t cover the technical lessons. Instead, it thinks the lesson should be the moral lesson, that we should take these things more seriously. But that’s stupid. It’s the sort of lesson taught by people who know nothing about the topic. When you have nothing of value to contribute to a topic, you can always take the moral high road and criticize everyone for being morally weak for not taking it more seriously. Obviously, since doctors haven’t cured cancer yet, it’s because they don’t take the problem seriously.
The article continues to ignore the lesson of these cyber attacks and instead regales us with a list of military lessons from WW I and WW II. This makes the same flaw that many in the military make, trying to understand cyber through analogies with the real world. It’s not that such lessons could have no value, it’s that this article contains a poor list of them. It seems to consist of a random list of events that appeal to the author rather than events that have bearing on cybersecurity.
Then, in case we don’t get the point, the article bullies us with hyperbole, cliches, buzzwords, bombastic language, famous quotes, and citations. It’s hard to see how most of them actually apply to the text. Rather, it seems like they are included simply because he really really likes them.
The article invests much effort in discussing the buzzword “OODA loop”. Most attacks in cyberspace don’t have one. Instead, attackers flail around, trying lots of random things, overcoming defense with brute-force rather than an understanding of what’s going on. That’s obviously the case with Wannacry: it was an accident, with the perpetrator experimenting with what would happen if they added the ETERNALBLUE exploit to their existing ransomware code. The consequence was beyond anybody’s ability to predict.
You might claim that this is just the first stage, that they’ll loop around, observe Wannacry’s effects, orient themselves, decide, then act upon what they learned. Nope. Wannacry burned the exploit. It’s essentially removed any vulnerable systems from the public Internet, thereby making it impossible to use what they learned. It’s still active a year later, with infected systems behind firewalls busily scanning the Internet so that if you put a new system online that’s vulnerable, it’ll be taken offline within a few hours, before any other evildoer can take advantage of it.
See what I’m doing here? Learning the actual lessons of things like Wannacry? The thing the above article fails to do??
The article has a humorous paragraph on “defense in depth”, misunderstanding the term. To be fair, it’s the cybersecurity industry’s fault: they adopted then redefined the term. That’s why there’s two separate articles on Wikipedia: one for the old military term (as used in this article) and one for the new cybersecurity term.
As used in the cybersecurity industry, “defense in depth” means having multiple layers of security. Many organizations put all their defensive efforts on the perimeter, and none inside a network. The idea of “defense in depth” is to put more defenses inside the network. For example, instead of just one firewall at the edge of the network, put firewalls inside the network to segment different subnetworks from each other, so that a ransomware infection in the customer support computers doesn’t spread to sales and marketing computers.
The article talks about exploiting WiFi chips to bypass the defense in depth measures like browser sandboxes. This is conflating different types of attacks. A WiFi attack is usually considered a local attack, from somebody next to you in a bar, rather than a remote attack from a server in Russia. Moreover, far from disproving "defense in depth", such WiFi attacks highlight the need for it. Namely, phones need to be designed so that successful exploitation of other microprocessors (namely, the WiFi, Bluetooth, and cellular baseband chips) can’t directly compromise the host system. In other words, once exploited with "Broadpwn", a hacker would need to extend the exploit chain with another vulnerability in the host’s Broadcom WiFi driver rather than immediately exploiting a DMA attack across PCIe. This suggests that if PCIe is used to interface with peripherals in the phone, an IOMMU should be used, for "defense in depth".
Cybersecurity is a young field. There are lots of useful things that outsider non-techies can teach us. Lessons from military history would be well-received.
But that’s not this story. Instead, this story is by an outsider telling us we don’t know what we are doing, that they do, and then proceeding to prove they don’t know what they are doing. Their argument is based on moral suasion and on bullying us with what appears on the surface to be intellectual rigor, but which is in fact devoid of anything smart.
My fear, here, is that I’m going to be in a meeting where somebody has read this pretentious garbage, explaining to me why “defense in depth” is wrong and how we need to OODA faster. I’d rather nip this in the bud, pointing out if you found anything interesting from that article, you are wrong.

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each table, you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.
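
As a taste of what such a cross-cluster query might look like, here is a sketch that submits a query through Amazon Athena with boto3. The database, table, and column names are hypothetical; the actual names depend on what the second CloudFormation template creates in the AWS Glue Data Catalog.

import boto3

athena = boto3.client("athena")

# Hypothetical catalog names -- adjust to the database and tables created for your export.
query = """
SELECT cluster, date, COUNT(*) AS query_count
FROM redshift_query_logs.stl_query
GROUP BY cluster, date
ORDER BY cluster, date
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "redshift_query_logs"},
    ResultConfiguration={"OutputLocation": "s3://mybucket/athena-query-results/"},
)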

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It looks up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for that Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one (see the sketch after this list).
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.
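
Steps 1 and 2 can be summarized in a few lines of boto3. This is only a sketch of the flow, not the solution’s actual Lambda code, and the Glue job naming convention shown here is hypothetical.

import boto3

ssm = boto3.client("ssm")
glue = boto3.client("glue")

def handler(event, context):
    # Step 1: read the list of enabled clusters from the parameter store
    param = ssm.get_parameter(Name="redshift_query_logs.global.enabled_cluster_list")
    clusters = [c.strip() for c in param["Parameter"]["Value"].split(",") if c.strip()]

    # Step 2: start the Glue ETL job designated for each cluster
    for cluster in clusters:
        job_name = f"rs_stl_export_{cluster}"  # hypothetical naming convention
        try:
            glue.start_job_run(JobName=job_name, Arguments={"--cluster_name": cluster})
        except glue.exceptions.EntityNotFoundException:
            pass  # the real Lambda function creates the missing ETL job at this point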

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Parameter name | Type | Description
Global parameters (defined once and applied to all jobs)
redshift_query_logs.global.s3_prefix | String | The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir | String | The Amazon S3 path that the AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role | String | The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list | StringList | A comma-separated list of cluster names for which system tables' data export is enabled. This gives you the flexibility to exclude certain clusters.
Cluster-specific parameters (one set for each cluster specified in the enabled_cluster_list parameter)
redshift_query_logs.<<cluster_name>>.connection | String | The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
redshift_query_logs.<<cluster_name>>.user | String | The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password | SecureString | The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted with a key managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.
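To illustrate how an ETL job might read this configuration, the following sketch, assuming Python with boto3, retrieves the user name and password for the product-warehouse cluster. WithDecryption=True asks Systems Manager to decrypt the SecureString password with the AWS KMS key.

import boto3

ssm = boto3.client('ssm')

def get_cluster_credentials(cluster_name):
    # The user name is stored as a plain String parameter.
    user = ssm.get_parameter(
        Name='redshift_query_logs.{}.user'.format(cluster_name)
    )['Parameter']['Value']
    # The password is a SecureString; WithDecryption=True has AWS KMS decrypt it.
    password = ssm.get_parameter(
        Name='redshift_query_logs.{}.password'.format(cluster_name),
        WithDecryption=True
    )['Parameter']['Value']
    return user, password

user, password = get_cluster_credentials('product-warehouse')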

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.
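If your cluster's security group does not yet have the self-referencing rule, a sketch like the following adds one with boto3; the security group ID shown is a placeholder, and you can also make this change in the Amazon EC2 console.

import boto3

ec2 = boto3.client('ec2')
sg_id = 'sg-0123456789abcdef0'  # placeholder: the security group used by your Amazon Redshift cluster

# Add a self-referencing inbound rule on all TCP ports, which AWS Glue needs
# to reach the JDBC data store inside the VPC.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 0,
        'ToPort': 65535,
        'UserIdGroupPairs': [{'GroupId': sg_id}],
    }],
)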

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters. A scripted launch sketch follows the table.

Property | Default | Description
S3Bucket | mybucket | The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data prefix stores all the exported query logs for each system table, partitioned by cluster name and date. The mybucket/extract_rs_logs/temp/ prefix temporarily stages the data unloaded from Amazon Redshift. The mybucket/extract_rs_logs/code prefix stores all the code artifacts required by the Lambda function and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters | Requires input | A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups | Requires input | A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the ExportEnabledRedshiftClusters parameter. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.
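If you prefer launching the stack from code rather than the console, the following sketch passes the same parameters with boto3. The template URL and security group ID are placeholders, and CAPABILITY_IAM is included because the stack creates IAM roles.

import boto3

cfn = boto3.client('cloudformation')

cfn.create_stack(
    StackName='redshift-query-logs-export',
    # Placeholder URL: point this at wherever you host the template from this post.
    TemplateURL='https://s3.amazonaws.com/mybucket/extract_rs_logs/code/template.yaml',
    Parameters=[
        {'ParameterKey': 'S3Bucket', 'ParameterValue': 'mybucket'},
        {'ParameterKey': 'ExportEnabledRedshiftClusters',
         'ParameterValue': 'product-warehouse,category-management'},
        {'ParameterKey': 'DataStoreSecurityGroups', 'ParameterValue': 'sg-0123456789abcdef0'},
    ],
    Capabilities=['CAPABILITY_IAM'],  # the stack creates IAM roles and policies
)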

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment creates placeholder credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.
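You can make the same one-time update from code. The following sketch, assuming Python with boto3, overwrites the placeholder credentials for product-warehouse; substitute your own user name and password.

import boto3

ssm = boto3.client('ssm')

# Replace the placeholder user name created by the deployment.
ssm.put_parameter(
    Name='redshift_query_logs.product-warehouse.user',
    Value='etl_user',            # your Amazon Redshift user name
    Type='String',
    Overwrite=True,
)

# Replace the placeholder password; SecureString keeps it encrypted with AWS KMS.
ssm.put_parameter(
    Name='redshift_query_logs.product-warehouse.password',
    Value='YOUR_PASSWORD_HERE',  # your Amazon Redshift password
    Type='SecureString',
    Overwrite=True,
)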

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that location, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_ddltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.
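As a sketch of what such a diagnostic query might look like when run programmatically, the following Python snippet submits a query through the Athena API. The event and event_time columns come from stl_alert_event_log; the cluster partition column name is an assumption that depends on how the crawler registers the partitions, and the output location is a placeholder.

import boto3

athena = boto3.client('athena')

# Count query alerts per cluster per day across all exported clusters.
# The 'cluster' partition column name is an assumption; check the schema the crawler created.
# event_time may be crawled as a string, in which case cast it before calling date().
query = """
SELECT cluster, date(event_time) AS event_day, event, count(*) AS alert_count
FROM redshift_query_logs_db.stl_alert_event_log
GROUP BY cluster, date(event_time), event
ORDER BY alert_count DESC
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'redshift_query_logs_db'},
    ResultConfiguration={'OutputLocation': 's3://mybucket/athena-results/'},  # placeholder
)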

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters to the AWS Systems Manager parameter store, following the guidelines earlier in this post. Then modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster name to the comma-separated string, as sketched below.
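For example, to enable a hypothetical new cluster named sales-warehouse, a sketch like the following adds the three cluster-specific parameters and appends the cluster to the global list. The parameter values shown are placeholders.

import boto3

ssm = boto3.client('ssm')
cluster = 'sales-warehouse'  # hypothetical new cluster name

# Add the three cluster-specific parameters (values shown are placeholders).
ssm.put_parameter(Name='redshift_query_logs.{}.connection'.format(cluster),
                  Value='sales-warehouse-glue-connection', Type='String', Overwrite=True)
ssm.put_parameter(Name='redshift_query_logs.{}.user'.format(cluster),
                  Value='etl_user', Type='String', Overwrite=True)
ssm.put_parameter(Name='redshift_query_logs.{}.password'.format(cluster),
                  Value='YOUR_PASSWORD_HERE', Type='SecureString', Overwrite=True)

# Append the new cluster to the comma-separated enabled list.
current = ssm.get_parameter(Name='redshift_query_logs.global.enabled_cluster_list')['Parameter']['Value']
ssm.put_parameter(Name='redshift_query_logs.global.enabled_cluster_list',
                  Value=current + ',' + cluster, Type='StringList', Overwrite=True)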

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_ddltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>>_extract_rs_query_logs. For example, suppose that you want to export orders greater than $2,000 from the product-warehouse Amazon Redshift cluster. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry="mydb.sales_2000", job_configs=job_configs)

  2. Run the custom query with the time stamp. (The join condition in the following query assumes an order_id column relating the sales and orders tables; adjust it to your schema.)

returnDF = functions.runQuery(query="select * from sales s join orders o on s.order_id = o.order_id where o.order_amnt > 2000 and s.sale_timestamp > '{}'".format(salesLastProcessTSValue), tableName="mydb.sales_2000", job_configs=job_configs)

  3. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF, s3Prefix=s3Prefix, tableName="mydb.sales_2000", partitionColumns=["sale_date"], job_configs=job_configs)

  4. Get the latest time-stamp value from the data frame returned in step 2.

latestTimestampVal = functions.getMaxValue(returnDF, "sale_timestamp", job_configs)

  5. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue("mydb.sales_2000", latestTimestampVal[0], job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.