Tag Archives: blog

Friday Squid Blogging: Squid Prices Rise as Catch Decreases

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/friday_squid_bl_621.html

In Japan:

Last year’s haul sank 15% to 53,000 tons, according to the JF Zengyoren national federation of fishing cooperatives. The squid catch has fallen by half in just two years. The previous low was plumbed in 2016.

Lighter catches have been blamed on changing sea temperatures, which impedes the spawning and growth of the squid. Critics have also pointed to overfishing by North Korean and Chinese fishing boats.

Wholesale prices of flying squid have climbed as a result. Last year’s average price per kilogram came to 564 yen, a roughly 80% increase from two years earlier, according to JF Zengyoren.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Welcome Victoria — Sales Development Representative

Post Syndicated from Yev original https://www.backblaze.com/blog/welcome-victoria-sales-development-representative/

Ever since we introduced our Groups feature, Backblaze for Business has been growing at a rapid rate! We’ve been staffing up in order to support the product and the newest addition to the sales team, Victoria, joins us as a Sales Development Representative! Let’s learn a bit more about Victoria, shall we?

What is your Backblaze Title?
Sales Development Representative.

Where are you originally from?
Harrisburg, North Carolina.

What attracted you to Backblaze?
The leaders and family-style culture.

What do you expect to learn while being at Backblaze?
How to sell, sell, sell!

Where else have you worked?
The North Carolina Autism Society, an ophthalmologist’s office, home health care, and another tech startup.

Where did you go to school?
The University of North Carolina Chapel Hill and Duke University’s Fuqua School of Business.

What’s your dream job?
Fighter pilot, professional snowboarder or killer whale trainer.

Favorite place you’ve traveled?
Hawaii and Banff.

Favorite hobby?
Basketball and cars.

Of what achievement are you most proud?
Missionary work and helping patients feel better.

Star Trek or Star Wars?
Neither, but probably Star Wars.

Coke or Pepsi?
Neither, bubble tea.

Favorite food?
Snow crab legs.

Why do you like certain things?
Because God made me that way.

Anything else you’d like to tell us?
I’m a germophobe, drink a lot of water and unfortunately, am introverted.

Being on the phones all day is a good way to build up those extroversion skills! Welcome to the team and we hope you enjoy learning how to sell, sell, sell!

The post Welcome Victoria — Sales Development Representative appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Cloudflare Kicks Out Torrent Site For Abuse Reporting Interference

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-kicks-out-torrent-site-for-abuse-reporting-interference-180420/

As one of the leading CDN and DDoS protection services, Cloudflare is used by millions of websites across the globe.

The company’s clients include billion dollar companies and national governments, but also personal blogs, and even pirate sites.

Copyright holders are not happy with the latter category and are pressuring Cloudflare to cut their ties with sites like The Pirate Bay, both in and out of court.

Cloudflare, however, maintains that it’s a neutral service provider. They forward copyright infringement notices to their customers, for example, but deny any liability for these sites.

Generally speaking, the company only disconnects a customer in response to a court order, as it did with Sci-Hub earlier this year. That’s why it came as a surprise when the anime torrent site NYAA.si was disconnected this week.

The site, which is a replacement for the original NYAA, has millions of users and is particularly popular in Japan. Without prior warning, it became unavailable for several hours this week, after Cloudflare removed it from its services. So what happened?

TorrentFreak spoke to the operator who said that the exact reason for the termination remains a mystery to him. He reached out to Cloudflare looking for answers, but the company simply stated that it’s about “avoiding measures taken to avoid abuse complaints,” as can be seen below.

One of Cloudflare’s messages

The operator says he hasn’t done anything out of the ordinary and showed his willingness to resolve any possible issues. However, that hasn’t changed Cloudflare’s stance.

“We asked multiple times for clarification. We also expressed that we were willing to attempt to work with them on whatever the problem actually was, if they would explain what they even mean.

“Naturally, I have been stonewalled by them at every stage. I’ve contacted numerous persons at Cloudflare and nobody will talk about this,” NYAA’s operator adds.

TorrentFreak asked Cloudflare for more details and the company confirmed that the matter was related to interference with its abuse reporting systems, without providing further detail.

“We determined that the customer had taken steps specifically intended to interfere with and thwart the operation of our abuse reporting systems,” Cloudflare’s General Counsel Doug Kramer informed us.

Cloudflare’s statement suggests that the site took active steps to interfere with the abuse process. The company added that it can’t go into detail, but says that the reason for the termination was shared with the website owner.

The website owner, on the other hand, informs us that he has no clue what the exact problem is. NYAA.si occasionally swaps IP addresses and has recently set up some mirror domains, but these were all under the same account, so he has no idea why that would interfere with any abuse reports.

“I’m honestly unsure of what we could have done that ‘circumvents’ their abuse system,” NYAA’s operator says, adding that the only abuse reports received were copyright related.

It’s unlikely, however, that copyright takedown notices alone would warrant account termination, as most of the largest torrent sites use Cloudflare.

NYAA’s operator says he can do little more than speculate at this point. Some have hinted at a secret court order, while Japan’s recent crackdown on manga and anime piracy also came to mind, all without a grain of evidence, of course.

Whatever the reason, NYAA.si now has to move on without Cloudflare, while the mystery remains.

“Frankly, this whole thing is a joke. I don’t understand why they would willingly host much bigger sites like ThePirateBay without any issue, or even ISIS, or the various hacking groups that have used them over time,” the operator says.

If more information about the alleged interference with the abuse process becomes available, we’ll definitely follow it up.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Confused About the Hybrid Cloud? You’re Not Alone

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/confused-about-the-hybrid-cloud-youre-not-alone/

Hybrid Cloud. What is it?

Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.

Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.

Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper-, multi-, ultra-, super-, and other prefixes onto words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.

Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.

So, What is the Hybrid Cloud?


To get beyond the hype, let’s start with Forrester Research’s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”

To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.

To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud[1] resources combined with third-party public cloud resources that use some kind of orchestration[2] between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.

In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!

Private Cloud vs. Public Cloud

The cloud is really just a collection of purpose-built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site, while a private cloud can be on-site or off-site (on-prem or off-prem).

As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.
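A rule like the one described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product’s policy engine: it assumes that “not touched” means the file’s modification time is older than the cutoff, the function name `files_to_offload` is hypothetical, and it only selects candidates; the actual move to cloud storage is left out.

```python
import os
import time
from typing import List, Optional

SECONDS_PER_DAY = 86_400

def files_to_offload(root: str, max_age_days: int = 365,
                     now: Optional[float] = None) -> List[str]:
    """Return paths under root whose last modification is older than max_age_days."""
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime < cutoff:  # untouched for longer than the window
                stale.append(path)
    return sorted(stale)

# Hypothetical usage: list candidates, then hand them to your cloud upload tool.
# for path in files_to_offload("/srv/accounting"):
#     upload_and_stub(path)  # hypothetical helper, not shown
```

A real policy would also record where each file went, so it stays retrievable after it leaves local storage.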

The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.

Hybrid Cloud Benefits

If we accept that the hybrid cloud combines the best elements of private and public clouds, then the benefits of hybrid cloud solutions are clear, and we can identify the two primary benefits that result from blending private and public clouds.

Diagram of the Components of the Hybrid Cloud

Benefit 1: Flexibility and Scalability

Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure and adding capacity requires advance planning.

The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.
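The bursting decision itself is simple arithmetic: whatever demand exceeds private capacity in a given period spills over to the public cloud. The sketch below assumes demand and capacity are measured in the same unit (here, a hypothetical count of application instances); the function name is illustrative, and a real orchestrator would also account for startup time and where the data lives.

```python
from typing import List, Tuple

def plan_burst(demand: List[int], private_capacity: int) -> List[Tuple[int, int]]:
    """Split each period's demand into (served privately, burst to public cloud)."""
    if private_capacity < 0:
        raise ValueError("private_capacity must be non-negative")
    plan = []
    for wanted in demand:
        private = min(wanted, private_capacity)   # fill private capacity first
        plan.append((private, wanted - private))  # the excess bursts out
    return plan

# Hypothetical weekly demand with an end-of-quarter spike.
print(plan_burst([40, 55, 120, 60], private_capacity=64))
# [(40, 0), (55, 0), (64, 56), (60, 0)]
```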

For a data storage user, on-premises private cloud storage provides, among other benefits, the fastest access. For data that is accessed infrequently or doesn’t need the lowest possible latency, it makes sense for the organization to move it to a location that is secure but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or the general public.

Benefit 2: Cost Savings

The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.

Comparing Private vs Hybrid Cloud Storage Costs

To get an idea of the difference in storage costs between a purely on-premises solution and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table has the same format; all we’ve changed is how the data is distributed between the private (on-premises) cloud and the public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.

Scenario 1: 100% of data in on-premises storage

Total data stored: 100 TB | 1,000 TB | 2,000 TB
Data stored on-premises (100%): 100 TB | 1,000 TB | 2,000 TB
Monthly cost, on-premises low ($12/TB/month): $1,200 | $12,000 | $24,000
Monthly cost, on-premises high ($20/TB/month): $2,000 | $20,000 | $40,000

Scenario 2: 20% of data in on-premises storage, 80% in public cloud storage (B2)

Total data stored: 100 TB | 1,000 TB | 2,000 TB
Data stored on-premises (20%): 20 TB | 200 TB | 400 TB
Data stored in the public cloud (80%): 80 TB | 800 TB | 1,600 TB
Monthly cost, on-premises low ($12/TB/month): $240 | $2,400 | $4,800
Monthly cost, on-premises high ($20/TB/month): $400 | $4,000 | $8,000
Monthly cost, public cloud low ($5/TB/month, B2): $400 | $4,000 | $8,000
Monthly cost, public cloud high ($20/TB/month): $1,600 | $16,000 | $32,000
Total monthly cost, on-premises + public cloud, low: $640 | $6,400 | $12,800
Total monthly cost, on-premises + public cloud, high: $2,000 | $20,000 | $40,000

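As the numbers above show, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises: at the low on-premises rate, the monthly cost for 100 TB drops from $1,200 to $640. The savings depend on the public cloud rate, though; at $20/TB/month, the hybrid total equals the high on-premises cost. For other cost scenarios, see the B2 Cost Calculator.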
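The arithmetic behind both tables fits in one function. This sketch reproduces the figures above from the stated per-TB rates; the function name and parameters are illustrative, and the rates are the example values used in the tables, not a price quote.

```python
def monthly_cost(total_tb: float, onprem_percent: float,
                 onprem_rate: float, cloud_rate: float) -> float:
    """Monthly storage cost in dollars for a given on-premises/cloud split."""
    onprem_tb = total_tb * onprem_percent / 100
    cloud_tb = total_tb - onprem_tb
    return onprem_tb * onprem_rate + cloud_tb * cloud_rate

for total in (100, 1_000, 2_000):
    all_onprem_low = monthly_cost(total, 100, onprem_rate=12, cloud_rate=5)
    hybrid_low = monthly_cost(total, 20, onprem_rate=12, cloud_rate=5)
    hybrid_high = monthly_cost(total, 20, onprem_rate=20, cloud_rate=20)
    print(f"{total:>5} TB: on-prem low ${all_onprem_low:,.0f}, "
          f"hybrid low ${hybrid_low:,.0f}, hybrid high ${hybrid_high:,.0f}")
```

Swapping in your own rates and split is a one-line change, which is the point the tables make: the result hinges on the gap between the on-premises and public cloud price per TB.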

When Hybrid Might Not Always Be the Right Fit

There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.

An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency is a factor in data storage for some users, it matters less for uploading and downloading data than it does for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.

It is essential to have a good understanding of workloads and their key characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.

The Hybrid Cloud Can Be a Win-Win Solution

From a high-altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.

Should You Go Hybrid?

We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?

According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.

In Two Years, More Than Half of Workloads Will Run in Cloud

Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.

If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.

As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.

Tell Us What You’re Doing with the Hybrid Cloud

Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.


[1] Private cloud can be on-premises or a dedicated off-premises facility.

[2] Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.

The post Confused About the Hybrid Cloud? You’re Not Alone appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Audit Trail Overview

Post Syndicated from Bozho original https://techblog.bozho.net/audit-trail-overview/

As part of my current project (secure audit trail), I decided to conduct a survey about the use of audit trail “in the wild”.

I haven’t written in detail about this project of mine (unlike some other projects), mostly because it’s commercial and I don’t want to use my blog as a direct promotion channel (though I am doing that at the moment, ironically). But the aim of this post is to shed some light on how audit trail is used.

The survey can be found here. The questions are basically: does your current project have audit trail functionality, and if yes, is it protected from tampering. If not – do you think you should have such functionality.

The results are interesting, although there were only around 50 respondents.

So more than half of the systems (on which respondents are working) don’t have an audit trail. While an audit trail is recommended by information security and related standards, it may not find a place in the “busy schedule” of a software project, even though it’s fairly easy to provide a trivial implementation (e.g. I’ve written about how to quickly set one up with Hibernate and Spring).

A trivial implementation might do in many cases but if the audit log is critical (e.g. access to sensitive data, performing financial operations etc.), then relying on a trivial implementation might not be enough. In other words – if the sysadmin can access the database and delete or modify the audit trail, then it doesn’t serve much purpose. Hence the next question – how is the audit trail protected from tampering:

And apparently, of the less than 50% of projects with an audit trail, around 50% don’t have technical guarantees that the audit trail can’t be tampered with. My guess is that the real share is higher, because people have different understandings of which technical measures are sufficient. E.g. someone may think that digitally signing your log files (or log records) is sufficient, but in fact it isn’t, as whole files (or records) can be deleted (or fully replaced) without a way to detect that. Timestamping can help (and a good audit trail solution should have that), but it doesn’t guarantee the order of events or prevent a malicious actor from deleting or inserting fake ones. And if timestamping is done on a log file level, then any not-yet-timestamped log file is vulnerable to manipulation.
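The weakness of signing records one by one is easy to demonstrate. The sketch below uses an HMAC from Python’s standard library as a stand-in for a digital signature (the argument is the same for either); the key and record fields are made up for illustration. Editing a record breaks its tag, but deleting a record leaves every surviving tag valid.

```python
import hashlib
import hmac
import json
from typing import Dict, List

def _body(record: Dict) -> bytes:
    # Canonical serialization, so the same record always yields the same bytes.
    return json.dumps(record, sort_keys=True).encode("utf-8")

def sign_record(key: bytes, record: Dict) -> Dict:
    tag = hmac.new(key, _body(record), hashlib.sha256).hexdigest()
    return {"record": record, "sig": tag}

def verify_log(key: bytes, log: List[Dict]) -> bool:
    """Checks each record's tag; cannot tell whether records are missing."""
    return all(
        hmac.compare_digest(
            hmac.new(key, _body(item["record"]), hashlib.sha256).hexdigest(),
            item["sig"],
        )
        for item in log
    )

key = b"hypothetical-secret-key"
log = [sign_record(key, {"user": "alice", "action": action})
       for action in ("login", "export", "logout")]

# Editing a record without re-signing it is detected...
edited = list(log)
edited[1] = {"record": {"user": "alice", "action": "view"}, "sig": log[1]["sig"]}
print(verify_log(key, edited))  # False

# ...but deleting a record outright is not.
del log[1]
print(verify_log(key, log))     # True
```

Note that even an empty log "verifies", since there is nothing left to check; that is exactly the gap the approaches below try to close.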

I’ve written about event logs before and their two flavours – event sourcing and audit trail. An event log can effectively be considered an audit trail, but you’d need additional security to avoid the problems mentioned above.

So, let’s see what various levels of security and usefulness of audit logs would look like. There are many papers on the topic (e.g. this and this), and they often go into the intricate details of how logging should be implemented. I’ll try to give an overview of the approaches:

  • Regular logs – rely on regular INFO log statements in the production logs to look for hints of what has happened. This may be okay, but it is harder to find evidence (as there is non-auditable data in those log files as well), and it’s not very secure: usually logs are collected (e.g. with Graylog), and whoever has access to the log collector’s database (or search engine, in the case of Graylog) can manipulate the data without being caught.
  • Designated audit trail – whether it’s stored in the database or in log files. It has the proper business-event level granularity, but again doesn’t prevent or detect tampering. For lower-risk systems, that may be perfectly okay.
  • Timestamped logs – whether it’s log files or (harder to implement) database records. Timestamping is good, but if it’s not an external service, a malicious actor can get access to the local timestamping service and issue fake timestamps for tampered files. Even if the timestamping is not compromised, whole entries can be deleted. The fact that they are missing can sometimes be deduced from other factors (e.g. the hour of rotation), but regularly verifying that takes extra effort and may not always be feasible.
  • Hash chaining – each entry (or sequence of log files) could be chained (just like blockchain transactions), with each entry containing the hash of the previous one. This is a good solution (whether it’s local, external or 3rd party), but there is a risk that someone modifies or deletes a record, takes your entire chain and re-hashes it. All the checks will pass, but the data will not be correct.
  • Hash chaining with anchoring – the head of the chain (the hash of the last entry/block) could be “anchored” to an external service that is outside the reach of a malicious actor: ideally a public blockchain, or alternatively paper, a public service (Twitter), email, etc. That way a malicious actor can’t just rehash the whole chain, because any check against the external service would fail.
  • WORM storage (write once, read many). You could send your audit logs almost directly to WORM storage, where it’s impossible to replace data. However, that is not ideal, as WORM storage can be slow and expensive. For example, AWS Glacier has long retrieval times, which makes searching through recent data impractical. It’s actually cheaper than S3, for example, and you can have expiration policies, but supporting your own WORM storage is expensive. It is a good idea to eventually send the logs to WORM storage, but “fresh” audit trail data should probably not be archived right away, so that it stays searchable and actionable insight can be gained from it.
  • All-in-one – applying all of the above “just in case” may be unnecessary for every project out there, but that’s what I decided to do at LogSentinel. Business-event granularity with timestamping, hash chaining, anchoring, and eventually putting to WORM storage – I think that provides both security guarantees and flexibility.
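The difference between plain hash chaining and anchored hash chaining can be shown with a small sketch. It assumes SHA-256 over the previous hash plus a canonical JSON encoding of each entry, and it stands in for the external anchor with a saved string; the entry fields and function names are illustrative, not LogSentinel’s implementation.

```python
import copy
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, entry: Dict) -> str:
    """SHA-256 over the previous hash plus a canonical encoding of the entry."""
    payload = prev_hash + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(entries: List[Dict]) -> List[Dict]:
    chain, prev = [], GENESIS
    for entry in entries:
        h = entry_hash(prev, entry)
        chain.append({"entry": entry, "prev": prev, "hash": h})
        prev = h
    return chain

def verify_chain(chain: List[Dict]) -> bool:
    """Internal consistency only: each link points at, and hashes over, its predecessor."""
    prev = GENESIS
    for link in chain:
        if link["prev"] != prev or link["hash"] != entry_hash(prev, link["entry"]):
            return False
        prev = link["hash"]
    return True

def head(chain: List[Dict]) -> str:
    return chain[-1]["hash"] if chain else GENESIS

def verify_with_anchor(chain: List[Dict], anchored_head: str) -> bool:
    """Consistent AND ending at the head hash that was published externally."""
    return verify_chain(chain) and head(chain) == anchored_head

events = [
    {"user": "alice", "action": "login"},
    {"user": "alice", "action": "export", "records": 120},
    {"user": "bob", "action": "logout"},
]
chain = build_chain(events)
anchor = head(chain)  # in practice: published to a blockchain, email, paper, etc.

# Naive tampering: edit an entry in place without rehashing.
edited = copy.deepcopy(chain)
edited[1]["entry"]["records"] = 1
print(verify_chain(edited))                  # False

# Full rehash attack: drop the export and rebuild the whole chain.
rehashed = build_chain([events[0], events[2]])
print(verify_chain(rehashed))                # True: plain chaining is fooled
print(verify_with_anchor(rehashed, anchor))  # False: the anchor catches it
print(verify_with_anchor(chain, anchor))     # True
```

The anchor only protects entries up to the last published head, which is why anchoring needs to happen regularly rather than once.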

I hope the overview is useful and the results from the survey shed some light on how this aspect of information security is underestimated.

The post Audit Trail Overview appeared first on Bozho's tech blog.

New – Registry of Open Data on AWS (RODA)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-registry-of-open-data-on-aws-roda/

Almost a decade ago, my colleague Deepak Singh introduced the AWS Public Datasets in his post Paging Researchers, Analysts, and Developers. I’m happy to report that Deepak is still an important part of the AWS team and that the Public Datasets program is still going strong!

Today we are announcing a new take on open and public data, the Registry of Open Data on AWS, or RODA. This registry includes existing Public Datasets and allows anyone to add their own datasets so that they can be accessed and analyzed on AWS.

Inside the Registry
The home page lists all of the datasets in the registry:

Entering a search term shrinks the list so that only the matching datasets are displayed:

Each dataset has an associated detail page, including usage examples, license info, and the information needed to locate and access the dataset on AWS:

In this case, I can access the data with a simple CLI command:

I could also access it programmatically, or download data to my EC2 instance.

Adding to the Repository
If you have a dataset that is publicly available and would like to add it to RODA, you can simply send us a pull request. Head over to the open-data-registry repo, read the CONTRIBUTING document, and create a YAML file that describes your dataset, using one of the existing files in the datasets directory as a model:

We’ll review pull requests regularly; you can “star” or watch the repo in order to track additions and changes.

Impress Me
I am looking forward to an inrush of new datasets, along with some blog posts and apps that show how to use the data in powerful and interesting ways. Let me know what you come up with.

Jeff;

 

Colour sensing with a Raspberry Pi

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/colour-sensing-raspberry-pi/

In their latest video and tutorial, Electronics Hub shows you how to detect colour using a Raspberry Pi and a TCS3200 colour sensor.

Raspberry Pi Color Sensor (TCS3200) Interface | Color Detector

A simple Raspberry Pi based project using TCS3200 Color Sensor. The project demonstrates how to interface a Color Sensor (like TCS3200) with Raspberry Pi and implement a simple Color Detector using Raspberry Pi.

What is a TCS3200 colour sensor?

Colour sensors sense reflected light from nearby objects. The bright light of the TCS3200’s on-board white LEDs hits an object’s surface and is reflected back. The sensor has an 8×8 array of photodiodes, which are covered by either a red, blue, green, or clear filter. The type of filter determines what colour a diode can detect. Then the overall colour of an object is determined by how much light of each colour it reflects. (For example, a red object reflects mostly red light.)

Colour sensing with the TCS3200 Color Sensor and a Raspberry Pi

As Electronics Hub explains:

TCS3200 is one of the easily available colour sensors that students and hobbyists can work on. It is basically a light-to-frequency converter, i.e. based on colour and intensity of the light falling on it, the frequency of its output signal varies.

I’ll save you a physics lesson here, but you can find a detailed explanation of colour sensing and the TCS3200 on the Electronics Hub blog.

Raspberry Pi colour sensor

The TCS3200 colour sensor is connected to several of the onboard General Purpose Input Output (GPIO) pins on the Raspberry Pi.

These connections allow the Raspberry Pi 3 to run one of two Python scripts that Electronics Hub has written for the project. The first displays the raw RGB values read by the sensor. The second detects the primary colours red, green, and blue, and it can be expanded for more colours with the help of the first script.
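The detection logic can be sketched independently of the wiring. Because the TCS3200 outputs a square wave whose frequency rises with the intensity of light on the currently selected photodiodes, one approach is to count pulses over a fixed window for each filter and pick the channel that clearly dominates. This is not Electronics Hub’s script: reading the GPIO pins is omitted, and the function names and the 1.2 dominance threshold are assumptions. The (S2, S3) filter-select levels follow the TCS3200 datasheet.

```python
from typing import Dict, Tuple

# (S2, S3) logic levels that select each photodiode filter on the TCS3200.
FILTER_SELECT: Dict[str, Tuple[int, int]] = {
    "red": (0, 0),
    "blue": (0, 1),
    "clear": (1, 0),
    "green": (1, 1),
}

def frequency_hz(pulse_count: int, window_s: float) -> float:
    """Convert a pulse count measured over window_s seconds into a frequency."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return pulse_count / window_s

def dominant_colour(freqs: Dict[str, float], min_ratio: float = 1.2) -> str:
    """Return 'red', 'green' or 'blue' if that channel leads the runner-up
    by at least min_ratio, otherwise 'unknown'."""
    ranked = sorted(((freqs[c], c) for c in ("red", "green", "blue")), reverse=True)
    (top, best), (second, _) = ranked[0], ranked[1]
    if top <= 0:
        return "unknown"
    if second <= 0 or top / second >= min_ratio:
        return best
    return "unknown"

# Hypothetical pulse counts, each taken over a 0.5 s window per filter.
readings = {
    "red": frequency_hz(4500, 0.5),
    "green": frequency_hz(1500, 0.5),
    "blue": frequency_hz(1250, 0.5),
}
print(dominant_colour(readings))  # red
```

In practice, raw readings need calibrating against a white reference, because the photodiodes are not equally sensitive to each colour; that is where a raw-value script like the first one comes in.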

Electronics Hub’s complete build uses a breadboard for simple prototyping

Use it in your projects

This colour sensing setup is a simple means of adding a new dimension to your builds. Why not build a candy-sorting robot that organises your favourite sweets by colour? Or add colour sensing to your line-following buggy to allow for multiple path options!

If your Raspberry Pi project uses colour sensing, we’d love to see it, so be sure to share it in the comments!

The post Colour sensing with a Raspberry Pi appeared first on Raspberry Pi.

Backblaze at NAB 2018 in Las Vegas

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backblaze-at-nab-2018-in-las-vegas/

Backblaze B2 Cloud Storage NAB Booth

Backblaze just returned from exhibiting at NAB in Las Vegas, April 9-12, where the response to our recent announcements was tremendous. In case you missed the news, Backblaze B2 Cloud Storage continues to extend its lead as the most affordable, high-performance cloud on the planet.

Backblaze’s News at NAB

Backblaze at NAB 2018 in Las Vegas

The Backblaze booth just before opening

What We Were Asked at NAB

Our booth was busy from start to finish with attendees interested in learning more about Backblaze and B2 Cloud Storage. Here are the questions we were asked most often in the booth.

Q. How long has Backblaze been in business?
A. The company was founded in 2007. Today, we have over 500 petabytes of data from customers in over 150 countries.

B2 Partners at NAB 2018

Q. Where is your data stored?
A. We have data centers in California and Arizona and expect to expand to Europe by the end of the year.

Q. How can your services be so inexpensive?
A. Backblaze’s goal from the beginning was to offer cloud backup and storage that was easy to use and affordable. All the existing options were simply too expensive to be viable, so we created our own infrastructure. Our purpose-built storage system, the Backblaze Storage Pod, is recognized as one of the most cost-efficient storage platforms available.

Q. Tell me about your hardware.
A. Backblaze’s Storage Pods hold 60 HDDs each, containing as much as 720 TB of data per pod, stored using Reed-Solomon error correction. Twenty Storage Pods make up a Vault, and each Tome is a set of 20 drives, one in each Storage Pod of the Vault.

Q. Where do you fit in the data workflow?
A. People typically use B2 for archiving completed projects. All data is readily available for download from B2, making it more convenient than off-line storage. In addition, DAM and MAM systems such as CatDV, axle ai, Cantemo, and others have integrated with B2 to store raw images behind the proxies.

Q. Who uses B2 in the M&E business?
A. KLRU-TV, the PBS station in Austin, Texas, uses B2 to archive their entire 43-year catalog of Austin City Limits episodes and related materials. WunderVu, the production house for Pixvana, uses B2 to back up and archive their local storage systems on which they build virtual reality experiences for their customers.

Q. You’re the company that publishes the hard drive stats, right?
A. Yes, we are!

Backblaze Case Studies and Swag at NAB 2018 in Las Vegas

Were You at NAB?

If you were, we hope you stopped by the Backblaze booth to say hello. We’d like to hear what you saw at the show that was interesting or exciting. Please tell us in the comments.

The post Backblaze at NAB 2018 in Las Vegas appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

postmarketOS Low-Level

Post Syndicated from ris original https://lwn.net/Articles/751951/rss

Alpine Linux-based postmarketOS is touch-optimized and pre-configured for installation on smartphones and other mobile devices. The postmarketOS blog introduces postmarketOS-lowlevel, a community project aimed at creating free bootloaders and cellular modem firmware, currently focused on MediaTek phones. “But before we get started, please keep in mind that these are moon shots. So while there is some little progress, it’s mostly about letting fellow hackers know what we’ve tried and what we’re up to, in the hopes of attracting more interested talent to our cause. After all, our philosophy is to keep the community informed and engaged during the development phase!”

Now You Can Create Encrypted Amazon EBS Volumes by Using Your Custom Encryption Keys When You Launch an Amazon EC2 Instance

Post Syndicated from Nishit Nagar original https://aws.amazon.com/blogs/security/create-encrypted-amazon-ebs-volumes-custom-encryption-keys-launch-amazon-ec2-instance-2/

Amazon Elastic Block Store (EBS) offers an encryption solution for your Amazon EBS volumes so you don’t have to build, maintain, and secure your own infrastructure for managing encryption keys for block storage. Amazon EBS encryption uses AWS Key Management Service (AWS KMS) customer master keys (CMKs) when creating encrypted Amazon EBS volumes, providing you all the benefits associated with using AWS KMS. You can specify either an AWS managed CMK or a customer-managed CMK to encrypt your Amazon EBS volume. If you use a customer-managed CMK, you retain granular control over your encryption keys, such as having AWS KMS rotate your CMK every year. To learn more about creating CMKs, see Creating Keys.

In this post, we demonstrate how to create an encrypted Amazon EBS volume using a customer-managed CMK when you launch an EC2 instance from the EC2 console, AWS CLI, and AWS SDK.

Creating an encrypted Amazon EBS volume from the EC2 console

Follow these steps to launch an EC2 instance from the EC2 console with Amazon EBS volumes that are encrypted by customer-managed CMKs:

  1. Sign in to the AWS Management Console and open the EC2 console.
  2. Select Launch instance, and then, in Step 1 of the wizard, select an Amazon Machine Image (AMI).
  3. In Step 2 of the wizard, select an instance type, and then provide additional configuration details in Step 3. For details about configuring your instances, see Launching an Instance.
  4. In Step 4 of the wizard, specify additional EBS volumes that you want to attach to your instances.
  5. To create an encrypted Amazon EBS volume, first add a new volume by selecting Add new volume. Leave the Snapshot column blank.
  6. In the Encrypted column, select your CMK from the drop-down menu. You can also paste the full Amazon Resource Name (ARN) of your customer-managed CMK in this box. To learn more about finding the ARN of a CMK, see Working with Keys.
  7. Select Review and Launch. Your instance will launch with an additional Amazon EBS volume encrypted with the key that you selected. To learn more about the launch wizard, see Launching an Instance with Launch Wizard.

Creating Amazon EBS encrypted volumes from the AWS CLI or SDK

You can also use RunInstances to launch an instance with additional encrypted Amazon EBS volumes by setting Encrypted to true and adding KmsKeyId, set to the ARN of your CMK, to the Ebs section of the BlockDeviceMapping object, as shown in the following command:

$> aws ec2 run-instances --image-id ami-b42209de --count 1 --instance-type m4.large --region us-east-1 --block-device-mappings file://mapping.json

In this example, mapping.json describes the properties of the EBS volume that you want to create. The --block-device-mappings option expects a JSON list, so the mapping is wrapped in square brackets:

[
  {
    "DeviceName": "/dev/sda1",
    "Ebs": {
      "DeleteOnTermination": true,
      "VolumeSize": 100,
      "VolumeType": "gp2",
      "Encrypted": true,
      "KmsKeyId": "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
    }
  }
]

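You can also launch instances with additional encrypted EBS data volumes through an Auto Scaling group or Spot Fleet by creating a launch template that uses the same block device mapping. The launch template's settings are passed as a single JSON document; in this example, template-data.json contains the keys ImageId ("ami-b42209de"), InstanceType ("m4.large"), and BlockDeviceMappings (the same list as in mapping.json):

$> aws ec2 create-launch-template --launch-template-name MyLTName --region us-east-1 --launch-template-data file://template-data.json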

To learn more about launching an instance with the AWS CLI or SDK, see the AWS CLI Command Reference.
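If you generate these JSON documents from a script, a small Python sketch like the following builds the block device mapping list and the matching launch template data with the standard json module. The device name, volume settings, AMI ID, and key ARN are copied from the examples in this post and are placeholders; the helper function names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Placeholder key ARN copied from the example mapping in this post.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"


def encrypted_mapping(device: str, size_gib: int, kms_key_arn: str) -> List[Dict[str, Any]]:
    """Return a BlockDeviceMappings list holding one encrypted gp2 volume."""
    return [{
        "DeviceName": device,
        "Ebs": {
            "DeleteOnTermination": True,
            "VolumeSize": size_gib,
            "VolumeType": "gp2",
            "Encrypted": True,
            "KmsKeyId": kms_key_arn,  # customer-managed CMK used for the volume
        },
    }]


def launch_template_data(image_id: str, instance_type: str,
                         mappings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the LaunchTemplateData document that embeds the same mappings."""
    return {"ImageId": image_id, "InstanceType": instance_type,
            "BlockDeviceMappings": mappings}


mappings = encrypted_mapping("/dev/sda1", 100, KMS_KEY_ARN)
template = launch_template_data("ami-b42209de", "m4.large", mappings)

# These strings are what you would save as the files passed via file://
print(json.dumps(mappings, indent=2))
print(json.dumps(template, indent=2))
```

With boto3, the same structures can be passed directly, for example ec2.run_instances(ImageId="ami-b42209de", InstanceType="m4.large", MinCount=1, MaxCount=1, BlockDeviceMappings=mappings) or ec2.create_launch_template(LaunchTemplateName="MyLTName", LaunchTemplateData=template), where ec2 = boto3.client("ec2", region_name="us-east-1").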

In this blog post, we’ve demonstrated a single-step process for creating Amazon EBS volumes that are encrypted under your CMK when you launch your EC2 instance, which streamlines your instance launch workflow. To start using this functionality, navigate to the EC2 console.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Notes on setting up Raspberry Pi 3 as WiFi hotspot

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/notes-on-setting-up-raspberry-pi-3-as.html

I want to sniff the packets for IoT devices. There are a number of ways of doing this, but one straightforward mechanism is configuring a “Raspberry Pi 3 B” as a WiFi hotspot, then running tcpdump on it to record all the packets that pass through it. Google gives lots of results on how to do this, but they all demand that you have the precise hardware, WiFi hardware, and software that the authors do, so that’s a pain.

I got it working using the instructions here. There are a few additional notes, which is why I’m writing this blogpost, so I remember them.
https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md

I’m using the RPi-3-B and not the RPi-3-B+, and the latest version of Raspbian at the time of this writing, “Raspbian Stretch Lite 2018-3-13”.

Some things didn’t work as described. The first was that apt-get couldn’t find the package “hostapd”. The solution was to run “apt-get update” a second time.

The second problem was an error message about NAT not working when trying to set the masquerade rule. That’s because the ‘upgrade’ updates the kernel, leaving the running system out of date with the files on disk. The solution is to make sure you reboot after upgrading.

Thus, what you do at the start is:

apt-get update
apt-get upgrade
apt-get update
shutdown -r now

Then it’s just “apt-get install tcpdump” and start capturing on wlan0. This will get the non-monitor-mode Ethernet frames, which is what I want.
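To confirm that a capture really holds Ethernet frames rather than raw 802.11 frames, you can read the link-layer type from the capture file's global header. This is a minimal Python sketch, assuming the capture was saved with tcpdump's -w option (for example, tcpdump -i wlan0 -w capture.pcap) in the classic pcap format; pcapng files are not handled, and the function name is hypothetical.

```python
import struct
from typing import Dict

# A few link-layer type values from the tcpdump/libpcap LINKTYPE registry.
LINKTYPES = {1: "Ethernet", 105: "IEEE 802.11", 127: "IEEE 802.11 + radiotap"}

# Magic numbers as they appear on disk, mapped to a struct byte-order prefix.
MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def read_pcap_header(data: bytes) -> Dict[str, object]:
    """Parse the 24-byte global header at the start of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    endian = MAGICS.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # version major/minor, timezone offset, timestamp accuracy, snapshot length, link type
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24]
    )
    return {
        "version": (major, minor),
        "snaplen": snaplen,
        "linktype": linktype,
        "link": LINKTYPES.get(linktype, "other"),
    }
```

Reading the first 24 bytes of the file (with open("capture.pcap", "rb") as f: print(read_pcap_header(f.read(24)))) should report link type 1, Ethernet, for a non-monitor-mode capture on wlan0.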

Friday Squid Blogging: Eating Firefly Squid

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/friday_squid_bl_620.html

In Toyama, Japan, you can watch the firefly squid being caught and eat them in various ways:

“It’s great to eat hotaruika around when the seasons change, which is when people tend to get sick,” said Ryoji Tanaka, an executive at the Toyama prefectural federation of fishing cooperatives. “In addition to popular cooking methods, such as boiling them in salted water, you can also add them to pasta or pizza.”

Now there is a new addition: eating hotaruika raw as sashimi. However, due to reports that parasites have been found in their internal organs, the Health, Labor and Welfare Ministry recommends eating the squid after its internal organs have been removed, or after it has been frozen for at least four days at minus 30 C or lower.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

AWS AppSync – Production-Ready with Six New Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-appsync-production-ready-with-six-new-features/

If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.

I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:

Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.

Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.

Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutation requests.

Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables in your resolver functions.

CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.

CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.

A Brief AppSync Review
Before diving into the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:

I enter a name for my API and (for demo purposes) choose to use the Sample schema:

The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:

If I were creating an API of my own, I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click on Create to proceed:

The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:

It also provides me with guidance and a set of fully functional sample apps that I can clone:

When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:

I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.

Let’s take a quick look at each new feature…

Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries. In order to see the entries, I must enable logs (as detailed below), open the LOGS section, and check the checkbox. Here’s a simple mutation query that adds a new event. I enter the query and click the arrow to test it:

I can click VIEW IN CLOUDWATCH for a more detailed view:

To learn more, read Test and Debug Resolvers.

Console Testing with Mock Data
You can now create a mock context object in the console, which is passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:

Then I locate it on the right-hand side of the Schema page and click Attach:

I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:

Then I click Select test context, choose Create New Context, assign a name to my test context, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):

After I save the new Resolver, I click Test to see the request and the response:

Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source by using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over WebSockets, and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine-grained authorization routines across queries, mutations, and subscriptions.

To learn more about this feature, read Real-Time Data.

Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a region. This allows you to use a list of keys in a single query, read records from multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.

In order to use this feature, the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchWriteItem operations.
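A minimal sketch of such a policy document, built in Python and printed as JSON. DynamoDB's batch write operation is BatchWriteItem, so the actions are dynamodb:BatchGetItem and dynamodb:BatchWriteItem. The region, account ID, and table names are hypothetical, and a real resolver role will usually also need the other DynamoDB actions your resolvers call (for example, GetItem or PutItem).

```python
import json
from typing import Dict, List

# Hypothetical values for illustration only.
REGION = "us-east-1"
ACCOUNT_ID = "012345678910"
TABLES = ["Posts", "Comments"]


def batch_policy(region: str, account_id: str, tables: List[str]) -> Dict:
    """IAM policy allowing DynamoDB batch reads and writes on the given tables."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:BatchGetItem", "dynamodb:BatchWriteItem"],
            # One table ARN per table the batch resolvers touch.
            "Resource": [
                f"arn:aws:dynamodb:{region}:{account_id}:table/{name}" for name in tables
            ],
        }],
    }


print(json.dumps(batch_policy(REGION, ACCOUNT_ID, TABLES), indent=2))
```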

To learn more, read the DynamoDB Batch Resolvers tutorial.

CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:

CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:

AWS::AppSync::GraphQLApi – Defines an AppSync API. Its data sources (such as an Amazon Elasticsearch Service domain or a DynamoDB table) are defined separately with AWS::AppSync::DataSource.

AWS::AppSync::ApiKey – Defines the API key that clients use to access the API.

AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.

AWS::AppSync::DataSource – Defines a data source.

AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.

Here’s a simple schema definition in YAML form:

  AppSyncSchema:
    Type: "AWS::AppSync::GraphQLSchema"
    DependsOn:
      - AppSyncGraphQLApi
    Properties:
      ApiId: !GetAtt AppSyncGraphQLApi.ApiId
      Definition: |
        schema {
          query: Query
          mutation: Mutation
        }
        type Query {
          singlePost(id: ID!): Post
          allPosts: [Post]
        }
        type Mutation {
          putPost(id: ID!, title: String!): Post
        }
        type Post {
          id: ID!
          title: String!
        }
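The schema above refers to an AppSyncGraphQLApi resource through !GetAtt. As a sketch of how the pieces fit together, the following Python builds the same schema resource in CloudFormation's JSON form, together with the API and an API key. The resource names, the PostsApi API name, and the API_KEY authentication type are assumptions for illustration; Fn::GetAtt already implies the dependency on the API, so DependsOn is omitted here.

```python
import json

# The same GraphQL schema as in the YAML example.
SCHEMA = """schema {
  query: Query
  mutation: Mutation
}
type Query {
  singlePost(id: ID!): Post
  allPosts: [Post]
}
type Mutation {
  putPost(id: ID!, title: String!): Post
}
type Post {
  id: ID!
  title: String!
}
"""


def appsync_template(api_name: str, schema: str) -> dict:
    """Build a JSON CloudFormation template with an API, an API key, and a schema."""
    api_id = {"Fn::GetAtt": ["AppSyncGraphQLApi", "ApiId"]}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "AppSyncGraphQLApi": {
                "Type": "AWS::AppSync::GraphQLApi",
                "Properties": {"Name": api_name, "AuthenticationType": "API_KEY"},
            },
            "AppSyncApiKey": {
                "Type": "AWS::AppSync::ApiKey",
                "Properties": {"ApiId": api_id},
            },
            "AppSyncSchema": {
                "Type": "AWS::AppSync::GraphQLSchema",
                "Properties": {"ApiId": api_id, "Definition": schema},
            },
        },
    }


print(json.dumps(appsync_template("PostsApi", SCHEMA), indent=2))
```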

Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:

Jeff;

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each one, you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as cron or AWS Data Pipeline. Also, these tables are specific to one cluster, and they are not accessible after the cluster is terminated. This is especially a problem for transient Amazon Redshift clusters that exist only for a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. The AWS Systems Manager parameter store, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of the Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for each cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. When triggered, it looks up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one.
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It also gets, from the DynamoDB table, the time stamp of the last export of each system table from that Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.
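Steps 1 and 2 can be sketched in Python, the language of the AWS Glue ETL scripts. This is a minimal sketch, not the solution's actual Lambda code: it assumes boto3-style ssm and glue clients are passed in, reads the parameter names listed in the configuration table, and derives job names from the <<cluster-name>>_extract_rs_query_logs pattern. The script_location argument and the "glueetl" job command are assumptions for illustration.

```python
from typing import Any, List

GLOBAL_PREFIX = "redshift_query_logs.global"


def get_param(ssm: Any, name: str, decrypt: bool = False) -> str:
    """Read one value from the Systems Manager parameter store."""
    return ssm.get_parameter(Name=name, WithDecryption=decrypt)["Parameter"]["Value"]


def enabled_clusters(ssm: Any) -> List[str]:
    """Split the comma-separated enabled_cluster_list parameter."""
    raw = get_param(ssm, f"{GLOBAL_PREFIX}.enabled_cluster_list")
    return [name.strip() for name in raw.split(",") if name.strip()]


def dispatch(ssm: Any, glue: Any, script_location: str) -> List[str]:
    """Start one Glue ETL job per enabled cluster, creating missing jobs first.

    script_location is a hypothetical S3 path to the ETL script.
    Returns the names of the jobs that were started.
    """
    role = get_param(ssm, f"{GLOBAL_PREFIX}.role")
    started = []
    for cluster in enabled_clusters(ssm):
        job_name = f"{cluster}_extract_rs_query_logs"
        try:
            glue.get_job(JobName=job_name)
        except glue.exceptions.EntityNotFoundException:
            # Step 2: no job exists for this cluster yet, so create one.
            connection = get_param(ssm, f"redshift_query_logs.{cluster}.connection")
            glue.create_job(
                Name=job_name,
                Role=role,
                Command={"Name": "glueetl", "ScriptLocation": script_location},
                Connections={"Connections": [connection]},
            )
        glue.start_job_run(JobName=job_name)
        started.append(job_name)
    return started
```

In a Lambda handler you might call dispatch(boto3.client("ssm"), boto3.client("glue"), script_location); passing the clients in also makes the function easy to test with stand-in objects.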

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Parameter name | Type | Description
Global parameters (defined once and applied to all jobs):
redshift_query_logs.global.s3_prefix | String | The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir | String | The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role | String | The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list | StringList | A comma-separated list of cluster names for which system tables’ data export is enabled. This lets you exclude certain clusters.
Cluster-specific parameters (for each cluster specified in the enabled_cluster_list parameter):
redshift_query_logs.<<cluster_name>>.connection | String | The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product-warehouse, the entry is redshift_query_logs.product-warehouse.connection.
redshift_query_logs.<<cluster_name>>.user | String | The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password | SecureString | The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted with a key managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution. Only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL jobs need this configuration, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

Property | Default | Description
S3Bucket | mybucket | The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data prefix stores all the exported query logs for each system table, partitioned by cluster. The mybucket/extract_rs_logs/temp/ prefix is used for temporarily staging the data unloaded from Amazon Redshift. The mybucket/extract_rs_logs/code prefix stores all the code artifacts required for Lambda and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters | Requires input | A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups | Requires input | A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter ExportEnabledRedshiftClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that location, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_ddltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.
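A minimal sketch of those two changes with a boto3 SSM client passed in as ssm. The parameter names follow the naming scheme in the configuration table, and SecureString is the parameter store type for encrypted values; the cluster name, connection name, and credentials you pass are placeholders, and the function name is hypothetical.

```python
from typing import Any

PREFIX = "redshift_query_logs"


def enable_cluster(ssm: Any, cluster: str, connection: str, user: str, password: str) -> None:
    """Create the three cluster-specific parameters and append the cluster to the list."""
    ssm.put_parameter(Name=f"{PREFIX}.{cluster}.connection", Value=connection,
                      Type="String", Overwrite=True)
    ssm.put_parameter(Name=f"{PREFIX}.{cluster}.user", Value=user,
                      Type="String", Overwrite=True)
    # SecureString values are encrypted with KMS (the account's default
    # Systems Manager key unless a KeyId is supplied).
    ssm.put_parameter(Name=f"{PREFIX}.{cluster}.password", Value=password,
                      Type="SecureString", Overwrite=True)

    list_name = f"{PREFIX}.global.enabled_cluster_list"
    current = ssm.get_parameter(Name=list_name)["Parameter"]["Value"]
    clusters = [c.strip() for c in current.split(",") if c.strip()]
    if cluster not in clusters:  # keep the call idempotent
        clusters.append(cluster)
        ssm.put_parameter(Name=list_name, Value=",".join(clusters),
                          Type="StringList", Overwrite=True)
```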

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_ddltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job <<cluster-name>>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export sales for orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name (the sales and orders tables and their columns are illustrative):

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry="mydb.sales_2000", job_configs=job_configs)

  2. Run the custom query with the time stamp.

returnDF = functions.runQuery(query="select * from sales s join orders o on s.order_id = o.order_id where o.order_amnt > 2000 and s.sale_timestamp > '{}'".format(salesLastProcessTSValue), tableName="mydb.sales_2000", job_configs=job_configs)

  3. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF, s3Prefix=s3Prefix, tableName="mydb.sales_2000", partitionColumns=["sale_date"], job_configs=job_configs)

  4. Get the latest time-stamp value from the data frame returned in step 2.

latestTimestampVal = functions.getMaxValue(returnDF, "sale_timestamp", job_configs)

  5. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue("mydb.sales_2000", latestTimestampVal[0], job_configs)
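These five steps follow a watermark pattern: read the last processed time stamp, export only newer rows, then advance the stored value to the newest exported time stamp. The following plain-Python sketch shows that pattern under stated assumptions: a dictionary stands in for the DynamoDB tracking table and a list stands in for Amazon S3. It is not the repository's functions module, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Watermark used when a tracking entry does not exist yet.
EPOCH = datetime(1970, 1, 1)


def get_last_processed(tracker: Dict[str, datetime], entry: str) -> datetime:
    """Return the stored watermark, creating it at the epoch if it is missing."""
    return tracker.setdefault(entry, EPOCH)


def export_increment(rows: List[dict], tracker: Dict[str, datetime], entry: str,
                     ts_column: str, sink: List[dict]) -> int:
    """Copy rows newer than the watermark into sink and advance the watermark."""
    last = get_last_processed(tracker, entry)
    new_rows = [r for r in rows if r[ts_column] > last]  # like sale_timestamp > '{}'
    if not new_rows:
        return 0  # nothing new; leave the watermark unchanged
    sink.extend(new_rows)                                 # stand-in for saveToS3
    tracker[entry] = max(r[ts_column] for r in new_rows)  # like getMaxValue + update
    return len(new_rows)
```

Because the filter uses a strict greater-than, a row that arrives later with exactly the stored maximum time stamp would be skipped; the same trade-off applies to the sale_timestamp filter in the custom query.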

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.

Contests… and almanacs :)

Post Syndicated from Григор original http://www.gatchev.info/blog/?p=2131

Two announcements for all lovers of speculative fiction:

1

НА ВАШЕТО ВНИМАНИЕ – „ФАНТАSTIKA 2017“

Излезе от печат осмият пореден алманах „ФантАstika“. Негов съставител, както винаги досега, е Атанас П. Славов – председател на Дружеството на българските фантасти „Тера Фантазия“.
Алманахът е интересен не само за читателите, запознати с предишните ежегодници, но и за ценителите на супержанра (във всичките му форми), които за пръв път ще вземат това издание в ръцете си.

Преводните автори са застъпени с оригинална новела на аржентинката Тереса Мира де Ечеверия, класически разказ на американеца Томас Шеред и една творба от македонския фантаст Никола Суботич, наскоро отличена в конкурса „Агоп Мелконян“.

В големия раздел на родните фантасти ще се срещнете както с доайена Христо Пощаков, представен като майстор на научната фантастика, фентъзито и хумора, така и с нови произведения от Ценка Бакърджиева, Валентин Д. Иванов, Мартин Петков, Янчо Чолаков, а също и с приказка от дебютната книга на Мел.

И сега разделът „Фантастология“ е посветен на обзори и тенденции в развитието на нашата и световната фантастика, плюс задочни срещи с класици като Светослав Минков и Елин Пелин, видени през погледа на Боряна Владимирова и Александър Карапанчев. Няколко статии разглеждат испаноезични писателки, руски тематични направления в модерната НФ, българската фантастика в нова аудио форма и последния брой на списание „Тера фантастика“.

В раздела „Съзвездие Кинотавър“ ще се запознаете с някои от актуалните екранизации на фантастични романи, с англичанина, създал сценария на „Изкуствен интелект“, и с шеговит комикс (за това как на Кубрик му е изглеждало бъдещето през 2019 година).

This issue announces the "Изгревът на следващото" ("The Dawn of the Next") contest, unique in its theme: stories devoted to a desirable future. The "Футурум" section includes articles on the new information religions, ends of the world that never came to pass, and some especially curious "faKtastika".

Also in this almanac: selected paintings by the artist Андриан Бекяров… an unabashedly partial report on Eurocon 2017 in Dortmund… poetry… and much more from the inexhaustible realm of the imagination.

For more information: http://choveshkata.net/blog/?p=6617.

2

The Society of Bulgarian Science Fiction Writers "Тера Фантазия" and the "Човешката библиотека" Foundation invite all authors to take part in the first "Изгревът на следващото" contest.

Several contests for Bulgarian fiction are currently running, but this is the only one whose theme is a possible path toward a positive future. Today, in an era of spreading dystopias and uncritical catastrophic thinking, it takes real intellectual courage to seek the shape of a Way Out: the courage to believe that the Human spirit can find its way to a higher level, the intellect to imagine it, and the talent to defend it in art.

What is the solution to the problem called "the Crisis-Ridden Present"?

What is the solution that leads to a higher state of Humaneness and of Humankind, toward a future in which Humane Reason has outgrown inhuman ignorance?

What is the solution that will create a world in which science and technology advance to raise the quality of the Human being, rather than the wealth of a few?

What is the solution that will avoid frozen utopias, where poseurs in white chitons recite pompous speeches to one another?

The "Изгревът на следващото" contest will be a venue for stories devoted to this search: works that, with artistic talent and imaginative modeling power, make the case for new worlds of this kind in one of the following two ways:

  • Spiraling toward the next: The fates of individuals and societies seeking a way out of our world's current state of crisis; portraits of scientists, thinkers, and ordinary people feeling their way through the darkness of the unknown toward this goal; adventures of people drawn into such a spiral process who gradually come to understand its meaning.
  • Visions of the next: Characters who arose in our own time but bear the marks of the new, possessing inner freedom even while confined in the cage of today's social unfreedom; groups and societies that have attained features of the next, without escapism, fanaticism, or asceticism. Humanitarian technologies that free people from being treated as objects and reveal the ethical and intellectual resources of the Humane. Consistent, realistically drawn societies of the future in which every individual is fully developed and fulfilled, without depending on or being owned by anyone else.

All genres are welcome, as long as the story addresses at least one of the two themes above.

The deadline for entries is June 1, 2018.

The three top-ranked stories will each receive a prize of 200 leva and, together with other selected entries from the contest, will be published in upcoming editions of the "ФантАstika" almanac.

The full rules are posted on the Човешката библиотека website: http://choveshkata.net/blog/?p=6668

That page will also carry the latest information if anything changes.