One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”
Tadaaaa! The BYO weather station fully assembled.
Our Oracle Weather Station
In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools around the world that had applied to be part of our weather station programme. The kit included a special HAT that allows the Pi to collect weather data with a set of sensors.
The original Raspberry Pi Oracle Weather Station HAT
We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with huge differences in climate, and they’ve even recorded the effects of a solar eclipse.
Our new BYO weather station guide
We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!
Fun with meteorological experiments!
Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so you can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.
Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.
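To give a flavour of what that involves, here is a minimal Python sketch (not taken from the guide itself): read_temperature() is a placeholder standing in for whichever sensor driver you end up using, and it stores readings in SQLite rather than the database the guide walks you through.

import random
import sqlite3
import time

def read_temperature():
    # Placeholder: swap in a real sensor reading (e.g. from a BME280 driver).
    return 20.0 + random.random()

conn = sqlite3.connect("weather.db")
conn.execute("CREATE TABLE IF NOT EXISTS readings (taken_at TEXT, temperature REAL)")

while True:
    conn.execute(
        "INSERT INTO readings VALUES (?, ?)",
        (time.strftime("%Y-%m-%d %H:%M:%S"), read_temperature()),
    )
    conn.commit()
    time.sleep(300)  # one reading every five minutes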
There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.
Who should try this build
We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straightforward as possible, but it’s still fairly advanced in terms of both electronics and programming. You should read through the whole guide before purchasing any components.
The sensors and components we’re suggesting balance cost, accuracy, and ease of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.
With our guide, you can build a functioning weather station without soldering, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.
For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.
Our plans for the guide
Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!
*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If you have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.
The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.
Save Nemo
The organisation says there are two major threats to coral reefs: divers and climate change. To make diving safer for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.
In addition, they provide dos and don’ts for how to behave on a reef dive.
The Nemo-Pi
To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.
This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.
The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.
The web dashboard showing live Nemo-Pi data
Long-term goals
Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.
A healthy coral reef
Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.
A bleached coral reef
Vote now for Save Nemo
If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:
Click “Abstimmen” in the footer of the page to vote
Click “JA” in the footer to confirm
Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.
In today’s guest post, seventh-grade students Evan Callas, Will Ross, Tyler Fallon, and Kyle Fugate share their story of using the Raspberry Pi Oracle Weather Station in their Innovation Lab class, headed by Raspberry Pi Certified Educator Chris Aviles.
United Nations Sustainable Development Goals
Over the past couple of weeks in our Innovation Lab class, our teacher, Mr Aviles, has challenged us students to design a project that helps address one of the United Nations Sustainable Development Goals. We chose Climate Action. Innovation Lab is a class that gives students the opportunity to learn about where technology, the environment, and entrepreneurship meet. Everyone takes their own path in innovation and learns about the environment using project-based learning.
Raspberry Pi Oracle Weather Station
For our climate change challenge, we decided to build a Raspberry Pi Oracle Weather Station. Tackling the issues of climate change in a way that helps our community stood out to us because we knew that, with the help of this weather station, we could send local data to farmers and fishermen in town. Recent changes in climate have been affecting farmers’ crops. Unexpected rain, heat, and other unusual weather patterns can completely destabilize the natural growth of the plants and destroy their crops altogether. The amount of labour needed by farmers has also significantly increased, forcing farmers to grow more food with fewer resources. By using our Raspberry Pi Oracle Weather Station to alert local farmers, they can be more prepared for and aware of the weather, leading to better crops and safer boating.
Growing teamwork and coding skills
The process of setting up our weather station was fun and simple. Raspberry Pi made the instructions very easy to understand and read, which was very helpful for our team who had little experience in coding or physical computing. We enjoyed working together as a team and were happy to be growing our teamwork skills.
Once we constructed and coded the weather station, we learned that we needed to support the station with PVC pipes. After we completed these steps, we brought the weather station up to the roof of the school and began collecting data. Our information is currently being sent to the Initial State dashboard so that we can share the information with anyone interested. This information will also be recorded and seen by other schools, businesses, and others from around the world who are using the weather station. For example, we can see the weather in countries such as France, Greece and Italy.
Raspberry Pi allows us to build these amazing projects that help us to enjoy coding and physical computing in a fun, engaging, and impactful way. We picked climate change because we care about our community and would like to make a substantial contribution to our town, Fair Haven, New Jersey. It is not every day that kids are given these kinds of opportunities, and we are very lucky and grateful to go to a school and learn from a teacher where these opportunities are given to us. Thanks, Mr Aviles!
To see more awesome projects by Mr Aviles’ class, you can keep up with him on his blog and follow him on Twitter.
This post courtesy of Ed Lima, AWS Solutions Architect
We are excited to announce that you can now develop your AWS Lambda functions using the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Start using this new version today by specifying a runtime parameter value of nodejs8.10 when creating or updating functions.
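For example, if you manage your functions programmatically, switching an existing function to the new runtime is a one-line configuration change. The sketch below uses boto3 (the AWS SDK for Python) and assumes a hypothetical function named my-function already exists:

import boto3

client = boto3.client("lambda")

# Switch an existing function to the Node.js 8.10 runtime.
client.update_function_configuration(
    FunctionName="my-function",   # hypothetical function name
    Runtime="nodejs8.10",
)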
Supporting async/await
The Lambda programming model for Node.js 8.10 now supports defining a function handler using the async/await pattern.
Asynchronous or non-blocking calls are an inherent and important part of applications, as user and human interfaces are asynchronous by nature. If you decide to have a coffee with a friend, you usually order the coffee then start or continue a conversation with your friend while the coffee is getting ready. You don’t wait for the coffee to be ready before you start talking. These activities are asynchronous, because you can start one and then move to the next without waiting for completion. Otherwise, you’d delay (or block) the start of the next activity.
Asynchronous calls used to be handled in Node.js using callbacks. That presented problems when they were nested within other callbacks in multiple levels, making the code difficult to maintain and understand.
Promises were implemented to try to solve issues caused by “callback hell.” They allow asynchronous operations to call their own methods and handle what happens when a call is successful or when it fails. As your requirements become more complicated, even promises become harder to work with and may still end up complicating your code.
Async/await is the new way of handling asynchronous operations in Node.js, and it makes for simpler, easier, and cleaner code for non-blocking calls. It still uses promises under the hood, but the result is returned directly from the asynchronous function, just as if it were a synchronous, blocking function.
Take for instance the following Lambda function to get the current account settings, using the Node.js 6.10 runtime:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {
    let getAccountSettingsPromise = lambda.getAccountSettings().promise();
    getAccountSettingsPromise.then(
        (data) => {
            callback(null, data);
        },
        (err) => {
            console.log(err);
            callback(err);
        }
    );
};
With the new Node.js 8.10 runtime, there are new handler types that can be declared with the “async” keyword or can return a promise directly.
This is how the same function looks using async/await with Node.js 8.10:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = async (event) => {
    return await lambda.getAccountSettings().promise();
};
Alternatively, you could have the handler return a promise directly:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = (event) => {
    return new Promise((resolve, reject) => {
        lambda.getAccountSettings().promise()
            .then((data) => {
                resolve(data);
            })
            .catch(reject);
    });
};
The new handler types are alternatives to the callback pattern, which is still fully supported.
All three functions return the same results. However, in the new runtime with async/await, all callbacks in the code are gone, which makes it easier to read. This is especially true for those less familiar with promises.
Another great advantage of async/await is better error handling. You can use a try/catch block inside the scope of an async function. Even though the function awaits an asynchronous operation, any errors end up in the catch block.
You can improve your previous Node.js 8.10 function with this trusted try/catch error handling pattern:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = async (event) => {
    let data;
    try {
        data = await lambda.getAccountSettings().promise();
    }
    catch (err) {
        console.log(err);
        return err;
    }
    return data;
};
While you now have a similar number of lines in both runtimes, the code is cleaner and more readable with async/await. It makes the asynchronous calls look more synchronous. However, it is important to note that the code is still executed the same way as if it were using a callback or promise-based API.
Backward compatibility
You can port your existing Node.js 4.3 and 6.10 functions to Node.js 8.10 by updating the runtime. However, Node.js 8.10 does include numerous breaking changes from previous Node.js versions.
Make sure to review the API changes between Node.js 4.3, 6.10, and Node.js 8.10 to see if there are other changes that might affect your code. We recommend testing that your Lambda function passes internal validation for its behavior when upgrading to the new runtime version.
You can use Lambda versions/aliases to safely test that your function runs as expected on Node 8.10, before routing production traffic to it.
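As a rough sketch of that workflow (again using boto3, and assuming the hypothetical my-function plus an existing prod alias), you can publish the upgraded function as a new version, point a test alias at it, and only move the production alias once you are satisfied:

import boto3

client = boto3.client("lambda")

# Publish the current (Node.js 8.10) code and configuration as an immutable version.
version = client.publish_version(
    FunctionName="my-function",
    Description="Upgraded to nodejs8.10",
)["Version"]

# Point a test alias at the new version and exercise it with test traffic.
client.create_alias(FunctionName="my-function", Name="test", FunctionVersion=version)

# Once validated, shift the existing production alias to the new version.
client.update_alias(FunctionName="my-function", Name="prod", FunctionVersion=version)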
New node features
You can now get better performance, up to 20%, compared to the previous LTS version, 6.x. The new V8 6.0 engine comes with Turbofan and the Ignition pipeline, which leads to lower memory consumption and faster startup time across Node.js applications.
HTTP/2 support, which is subject to future changes, allows developers to use the new protocol to speed application development and undo many HTTP/1.1 workarounds, making applications faster, simpler, and more powerful.
Data that describe processes in a spatial context are everywhere in our day-to-day lives and they dominate big data problems. Map data, for instance, whether describing networks of roads or remote sensing data from satellites, get us where we need to go. Atmospheric data from simulations and sensors underlie our weather forecasts and climate models. Devices and sensors with GPS can provide a spatial context to nearly all mobile data.
In this post, we introduce the WIND toolkit, a huge (500 TB), open weather model dataset that’s available to the world on Amazon’s cloud services. We walk through how to access this data and some of the open-source software developed to make it easily accessible. Our solution considers a subset of geospatial data that exist on a grid (raster) and explores ways to provide access to large-scale raster data from weather models. The solution uses foundational AWS services and the Hierarchical Data Format (HDF), a widely adopted format for scientific data.
The approach developed here can be extended to any data that fit in an HDF5 file, which can describe sparse and dense vectors and matrices of arbitrary dimensions. This format is already popular within the physical sciences for both experimental and simulation data. We discuss solutions to gridded data storage for a massive dataset of public weather model outputs called the Wind Integration National Dataset (WIND) toolkit. We also highlight strategies that are general to other large geospatial data management problems.
Wind Integration National Dataset
As variable renewable power penetration levels increase in power systems worldwide, the importance of renewable integration studies to ensure continued economic and reliable operation of the power grid is also increasing. The WIND toolkit is the largest freely available grid integration dataset to date.
The WIND toolkit was developed by 3TIER by Vaisala under a subcontract to the National Renewable Energy Laboratory (NREL) to support studies on the integration of wind energy into the existing US grid. NREL is part of a network of national laboratories for the US Department of Energy and has a mission to advance the science and engineering of energy efficiency, sustainable transportation, and renewable power technologies.
The toolkit has been used by consultants, research groups, and universities worldwide to support grid integration studies. Less traditional uses also include resource assessments for wind plants (such as those powering Amazon data centers), and studying the effects of weather on California condor migrations in the Baja peninsula.
The diversity of applications highlights the value of accessible, open public data. Yet, there’s a catch: the dataset is huge. The WIND toolkit provides simulated atmospheric (weather) data at a two-km spatial resolution and five-minute temporal resolution at multiple heights for seven years. The entire dataset is half a petabyte (500 TB) in size and is stored in the NREL High Performance Computing data center in Golden, Colorado. Making this dataset publicly available easily and in a cost-effective manner is a major challenge.
As other laboratories and public institutions work to release their data to the world, they may face similar challenges to those that we experienced. Some prior, well-intentioned efforts to release huge datasets as-is have resulted in data resources that are technically available but fundamentally unusable. They may be stored in an unintuitive format or indexed and organized to support only a subset of potential uses. Downloading hundreds of terabytes of data is often impractical. Most users don’t have access to a big data cluster (or super computer) to slice and dice the data as they need after it’s downloaded.
We aim to provide a large amount of data (50 terabytes) to the public in a way that is efficient, scalable, and easy to use. In many cases, researchers can access these huge cloud-located datasets using the same software and algorithms they have developed for smaller datasets stored locally. Only the pieces of data they need for their individual analysis must be downloaded. To make this work in practice, we worked with the HDF Group and have built upon their forthcoming Highly Scalable Data Service (HSDS).
In the rest of this post, we discuss how the HSDS software was developed to use Amazon EC2 and Amazon S3 resources to provide convenient and scalable access to these huge geospatial datasets. We describe how the HSDS service has been put to work for the WIND Toolkit dataset and demonstrate how to access it using the h5pyd Python library and the REST API. We conclude with information about our ongoing work to release more ‘open’ datasets to the public using AWS services, and ways to improve and extend the HSDS with newer Amazon services like Amazon ECS and AWS Lambda.
Developing a scalable service for big geospatial data
The HDF5 file format and API have been used for many years and are an effective means of storing large scientific datasets. For example, NASA’s Earth Observing System (EOS) satellites collect more than 16 TB of data per day using HDF5.
With the rise of the cloud, there are new challenges and opportunities to rethink how HDF5 can be enhanced to work effectively as a component in a cloud-native architecture. For the HDF Group, working with NREL has been a great opportunity to put ideas into practice with a production-size dataset.
An HDF5 file consists of a directed graph of group and dataset objects. Datasets can be thought of as multidimensional arrays with support for user-defined metadata tags and compression. Typical operations on datasets are reading or writing data in a regular subregion (a hyperslab) or reading and writing individual elements (a point selection). Also, group and dataset objects may each contain an arbitrary number of user-defined metadata elements known as attributes.
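The short h5py sketch below (purely illustrative, with made-up names and shapes) shows each of these pieces in action: a group, a compressed dataset, an attribute, a hyperslab read, and a point selection.

import numpy as np
import h5py

with h5py.File("example.h5", "w") as f:
    grp = f.create_group("simulation")                        # group object
    dset = grp.create_dataset("windspeed", shape=(365, 100, 100),
                              dtype="f4", compression="gzip")  # compressed dataset
    dset.attrs["units"] = "m s-1"                              # user-defined attribute
    dset[0, :, :] = np.random.rand(100, 100)                   # write one hyperslab

with h5py.File("example.h5", "r") as f:
    dset = f["simulation/windspeed"]
    block = dset[0:10, 40:60, 40:60]   # read a regular subregion (hyperslab)
    value = dset[3, 50, 50]            # read an individual element (point selection)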
Many people have used the HDF library in applications developed or ported to run on EC2 instances, but there are a number of constraints that often prove problematic:
The HDF5 library can’t read directly from HDF5 files stored as S3 objects. The entire file (often many GB in size) would need to be copied to local storage before the first byte can be read, and the instance must be configured with an appropriately sized EBS volume.
The HDF library only has access to the computational resources of the instance itself (as opposed to a cluster of instances), so many operations are bottlenecked by the library.
Any modifications to the HDF5 file would somehow have to be synchronized with changes that other instances have made to the same file before writing back to S3.
Using a pattern common to many offerings from AWS, the solution to these constraints is to develop a service framework around the HDF data model. Using this model, the HDF Group has created the Highly Scalable Data Service (HSDS), which provides all the functionality traditionally provided by the HDF5 library. By using the service, you don’t need to manage your own file volumes; you can just read and write whatever data you need.
First, because the service manages the actual data persistence to a durable medium (S3, in this case), you don’t need to worry about disk management: simply stream the data you need from the service as you need it. Second, putting the functionality behind a service allows some tricks to increase performance (described in more detail later). And last, HSDS allows any number of clients to access the data at the same time, enabling HDF5 to be used as a coordination mechanism for multiple readers and writers.
In designing the HSDS architecture, we gave much thought to how to achieve scalability of the HSDS service. For accessing HDF5 data, there are two different types of scaling to consider:
Multiple clients making many requests to the service
Single requests that require a significant amount of data processing
To deal with the first scaling challenge, as with most services, we considered how the service responds as the request rate increases. AWS provides some great tools that help in this regard:
Auto Scaling groups
Elastic Load Balancing load balancers
The ability of S3 to handle large aggregate throughput rates
By using a cluster of EC2 instances behind a load balancer, you can handle different client loads in a cost-effective manner.
The second scaling challenge concerns single requests that would take significant processing time with just one compute node. One example of this from the WIND toolkit would be extracting all the values in the seven-year time span for a given geographic point and dataset.
In HDF5, large datasets are typically stored as “chunks”; that is, a regular partition of the array. In HSDS, each chunk is stored as a binary object in S3. The sequential approach to retrieving the time series values would be for the service to read each chunk needed from S3, extract the needed elements, and go on to the next chunk. In this case, that would involve processing 2557 chunks, and would be quite slow.
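A quick back-of-the-envelope calculation shows where that chunk count comes from, assuming (purely for illustration) that the data are chunked one day at a time along the time axis:

steps_per_day = 24 * 12        # 5-minute temporal resolution
days = 7 * 365 + 2             # seven years, including two leap days
time_steps = days * steps_per_day

print(time_steps)   # 736416 time steps for a single grid point
print(days)         # 2557 daily chunks to read one after another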
Fortunately, with HSDS, you can speed this up quite a bit by exploiting the compute and I/O capabilities of the cluster. Upon receiving the request, the receiving node can use other nodes in the cluster to read different portions of the selection. With multiple nodes reading from S3 in parallel, performance improves as the cluster size increases.
The diagram below illustrates how this works in a simplified case of four chunks and four nodes.
This architecture has worked well in practice. In testing with the WIND toolkit and time series extraction, we observed a request latency of ~60 seconds using four nodes vs. ~5 seconds with 40 nodes. Performance roughly scales with the size of the cluster.
A planned enhancement to this is to use AWS Lambda for the worker processing. This enables 1000-way parallel reads at a reasonable cost, as you only pay for the milliseconds of CPU time used with AWS Lambda.
Public access to atmospheric data using HSDS and AWS
An early challenge in releasing the WIND toolkit data was in deciding how to subset the data for different use cases. In general, few researchers need access to the entire 0.5 PB of data and a great deal of efficiency and cost reduction can be gained by making directed constituent datasets.
NREL grid integration researchers initially extracted a 2-TB subset by selecting 120,000 points where the wind resource seemed appropriate for development. They also chose only those data important for wind applications (100-m wind speed, converted to power) at the locations most interesting to those performing grid studies. To support the remaining users who needed more of the data, we down-sampled it to a 60-minute temporal resolution, keeping all the other variables and the spatial resolution intact. This reduced dataset is 50 TB, describing 30+ atmospheric variables for seven years at a 60-minute temporal resolution.
The WindViz browser-based Gridded Wind Toolkit Visualizer was created as an example implementation of the HSDS REST API in JavaScript. The visualizer is written in the style of ECMAScript 2016 using a modern development toolchain that includes webpack and Babel. The source code is available through our GitHub repository. The demo page is hosted via GitHub pages, and we use a cross-origin AJAX request to fetch data from the HSDS service running on the EC2 infrastructure. The visualizer can be used to explore the gridded wind toolkit data on a map. Achieve full spatial resolution by zooming in to a specific region.
Programmatic access is possible using the h5pyd Python library, a distributed analog to the widely used h5py library. Users interact with the datasets (variables) and slice the data from its (time x longitude x latitude) cube form as they see fit.
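A minimal h5pyd sketch looks something like the following; the domain path and dataset name here are illustrative assumptions, so check NREL’s documentation and the example notebooks for the current values.

import h5pyd

# The domain path and dataset name below are assumptions; the HSDS endpoint and
# credentials are typically configured in ~/.hscfg or passed to h5pyd.File().
f = h5pyd.File("/nrel/wtk-us.h5", "r")

ws = f["windspeed_100m"]       # assumed dataset laid out as (time, latitude, longitude)
print(ws.shape, ws.dtype)

# Slice the full time series for a single grid point; only the chunks that
# intersect this selection are transferred, not the whole dataset.
series = ws[:, 500, 500]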
Examples and use cases are described in a set of Jupyter notebooks available on GitHub. Once you have a Jupyter notebook server running on your EC2 instance, you can work with the notebooks from your own machine.
From your laptop, create an SSH tunnel:
$ ssh -L 8888:localhost:8888 (IP address of the EC2 server)
Now, you can browse to localhost:8888 using the correct token, and interact with the notebooks as if they were local. Within the directory, there are examples for accessing the HSDS API and plotting wind and weather data using matplotlib.
Controlling access and defraying costs
A final concern is rate limiting and access control. Although the HSDS service is scalable and relatively robust, we had a few practical concerns:
How can we protect from malicious or accidental use that may lead to high egress fees (for example, someone who attempts to repeatedly download the entire dataset from S3)?
How can we keep track of who is using the data both to document the value of the data resource and to justify the costs?
If costs become too high, can we charge for some or all API use to help cover the costs?
To approach these problems, we investigated using Amazon API Gateway and its simplified integration with the AWS Marketplace for SaaS monetization as well as third-party API proxies.
In the end, we chose to use API Umbrella due to its close involvement with http://data.gov. While AWS Marketplace is a compelling option for future datasets, the decision was made to keep this dataset entirely open, at least for now. As community use and associated costs grow, we’ll likely revisit Marketplace. Meanwhile, API Umbrella provides controls for rate limiting and API key registration out of the box and was simple to implement as a front-end proxy to HSDS. Applications that do want to charge for API use can implement a similar strategy using Amazon API Gateway and AWS Marketplace.
Ongoing work and other resources
As NREL and other government research labs, municipalities, and organizations try to share data with the public, we expect many of you will face similar challenges to those we have tried to approach with the architecture described in this post. Providing large datasets is one challenge. Doing so in a way that is affordable and convenient for users is a far more difficult goal. Using AWS cloud-native services and the existing foundation of the HDF file format has allowed us to tackle that challenge in a meaningful way.
Dr. Caleb Phillips is a senior scientist with the Data Analysis and Visualization Group within the Computational Sciences Center at the National Renewable Energy Laboratory. Caleb comes from a background in computer science systems, applied statistics, computational modeling, and optimization. His work at NREL spans the breadth of renewable energy technologies and focuses on applying modern data science techniques to data problems at scale.
Dr. Caroline Draxl is a senior scientist at NREL. She supports the research and modeling activities of the US Department of Energy from mesoscale to wind plant scale. Caroline uses mesoscale models to research wind resources in various countries, and participates in on- and offshore boundary layer research and in the coupling of the mesoscale flow features (kilometer scale) to the microscale (tens of meters). She holds a M.S. degree in Meteorology and Geophysics from the University of Innsbruck, Austria, and a PhD in Meteorology from the Technical University of Denmark.
John Readey has been a Senior Architect at The HDF Group since he joined in June 2014. His interests include web services related to HDF, applications that support the use of HDF, and data visualization. Before joining The HDF Group, John worked at Amazon.com from 2006 to 2014, where he developed service-based systems for eCommerce and AWS.
Jordan Perr-Sauer is an RPP intern with the Data Analysis and Visualization Group within the Computational Sciences Center at the National Renewable Energy Laboratory. Jordan hopes to use his professional background in software engineering and his academic training in applied mathematics to solve the challenging problems facing America and the world.
This is part two of a series on the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process.
In Part 1 of this series, we looked at the different types of data centers, the importance of location in planning a data center, data center certification, and the single most expensive factor in running a data center, power.
In Part 2, we continue to look at factors that need to be considered both by those interested in a dedicated data center and by those seeking to colocate in an existing center.
In part 1, we began our discussion of the power requirements of data centers.
As we discussed, redundancy and failover are chief requirements for data center power. A redundantly designed power supply system is also a necessity for maintenance, as it enables repairs to be performed on one network, for example, without having to turn off servers, databases, or electrical equipment.
Power Path
The common critical components of a data center’s power flow are:
Utility Supply
Generators
Transfer Switches
Distribution Panels
Uninterruptible Power Supplies (UPS)
Power Distribution Units (PDUs)
Utility Supply is the power that comes from one or more utility grids. While most of us consider the grid to be our primary power supply (hats off to those of you who manage to live off the grid), politics, economics, and distribution make utility supply power susceptible to outages, which is why data centers must have autonomous power available to maintain availability.
Generators are used to supply power when the utility supply is unavailable. They convert mechanical energy, usually from diesel or natural gas engines, into electrical energy.
Transfer Switches are used to transfer electric load from one source or electrical device to another, such as from one utility line to another, from a generator to a utility, or between generators. The transfer could be manually activated or automatic to ensure continuous electrical power.
Distribution Panels get the power where it needs to go, taking a power feed and dividing it into separate circuits to supply multiple loads.
A UPS, as we touched on earlier, ensures that continuous power is available even when the main power source isn’t. It often consists of batteries that can come online almost instantaneously when the current power ceases. The power from a UPS does not have to last a long time as it is considered an emergency measure until the main power source can be restored. Another function of the UPS is to filter and stabilize the power from the main power supply.
Data center UPSs
A PDU, or Power Distribution Unit, is the device that distributes power to the individual pieces of equipment.
Network
After power, the networking connections to the data center are of prime importance. Can the data center obtain and maintain high-speed networking connections to the building? With networking, as with all aspects of a data center, availability is a primary consideration. Data center designers think of all possible ways service can be interrupted or lost, even briefly. Details such as the vulnerabilities in the route the network connections make from the core network (the backhaul) to the center, and where network connections enter and exit a building, must be taken into consideration in network and data center design.
Routers and switches are used to transport traffic between the servers in the data center and the core network. Just as with power, network redundancy is a prime factor in maintaining availability of data center services. Two or more upstream service providers are required to ensure that availability.
How fast a customer can transfer data to a data center is affected by: 1) the speed of the connections the data center has with the outside world, 2) the quality of the connections between the customer and the data center, and 3) the distance of the route from the customer to the data center. The longer the route and the greater the number of packets that must be transferred, the more significant a factor latency becomes in the data transfer. Latency is the delay before a transfer of data begins following an instruction for its transfer. Generally, latency, not speed, will be the most significant factor in transferring data to and from a data center. Packets transferred using the TCP/IP protocol suite, the set of communications protocols used on the internet and similar computer networks, must be acknowledged when received (ACK’d), which requires a communications round trip for each packet. If the data is sent in larger packets, the number of ACKs required is reduced, so latency becomes a smaller factor in the overall network communications speed.
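As a rough illustration of why latency rather than raw link speed often dominates, a single TCP connection with a fixed window can move at most one window’s worth of data per round trip (ignoring loss, congestion control, and window scaling). The numbers below are arbitrary example values:

window_bytes = 64 * 1024      # a 64 KB TCP window
rtt_seconds = 0.070           # 70 ms round trip to the data center

max_bits_per_second = (window_bytes * 8) / rtt_seconds
print(round(max_bits_per_second / 1e6, 1))   # ~7.5 Mbit/s, regardless of link speed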
Those interested in testing the overall speed and latency of their connection to Backblaze’s data centers can use the Check Your Bandwidth tool on our website.
Data center telecommunications equipment
Data center under floor cable runs
Cooling
Computer, networking, and power generation equipment generates heat, and a number of solutions are employed to rid a data center of that heat. The location and climate of the data center are of great importance to the data center designer because the climatic conditions dictate to a large degree which cooling technologies should be deployed, which in turn affects the power used and the cost of that power. The power required and the cost of managing a data center in a warm, humid climate will vary greatly from those of one in a cool, dry climate. Innovation is strong in this area, and many new approaches to efficient and cost-effective cooling are used in the latest data centers.
Switch’s uninterruptible, multi-system, HVAC Data Center Cooling Units
There are three primary ways data center cooling can be achieved:
Room Cooling cools the entire operating area of the data center. This method can be suitable for small data centers, but becomes more difficult and inefficient as IT equipment density and center size increase.
Row Cooling concentrates on cooling a data center on a row by row basis. In its simplest form, hot aisle/cold aisle data center design involves lining up server racks in alternating rows with cold air intakes facing one way and hot air exhausts facing the other. The rows composed of rack fronts are called cold aisles. Typically, cold aisles face air conditioner output ducts. The rows the heated exhausts pour into are called hot aisles. Typically, hot aisles face air conditioner return ducts.
Rack Cooling tackles cooling on a rack by rack basis. Air-conditioning units are dedicated to specific racks. This approach allows for maximum densities to be deployed per rack. This works best in data centers with fully loaded racks, otherwise there would be too much cooling capacity, and the air-conditioning losses alone could exceed the total IT load.
Security
Data Centers are high-security facilities as they house business, government, and other data that contains personal, financial, and other secure information about businesses and individuals.
This list contains the physical security considerations to keep in mind when opening or colocating in a data center:
Layered Security Zones. Systems and processes are deployed to allow only authorized personnel in certain areas of the data center. Examples include keycard access, alarm systems, mantraps, secure doors, and staffed checkpoints.
Physical Barriers. Physical barriers, fencing, and reinforced walls are used to protect facilities. In a colocation facility, one customer’s racks and servers are often inaccessible to other customers colocating in the same data center.
Backblaze racks secured in the data center
Monitoring Systems. Advanced surveillance technology monitors and records activity on approaching driveways, building entrances, exits, loading areas, and equipment areas. These systems also can be used to monitor and detect fire and water emergencies, providing early detection and notification before significant damage results.
Top-tier providers evaluate their data center security and facilities on an ongoing basis. Technology becomes outdated quickly, so providers must stay on top of new approaches and technologies in order to protect valuable IT assets.
Passing into high-security areas of a data center requires going through a security checkpoint where credentials are verified.
The gauntlet of cameras and steel bars one must pass before entering this data center
Facilities and Services
Data center colocation providers often differentiate themselves by offering value-added services. In addition to the required space, power, cooling, connectivity and security capabilities, the best solutions provide several on-site amenities. These accommodations include offices and workstations, conference rooms, and access to phones, copy machines, and office equipment.
Additional features may consist of kitchen facilities, break rooms and relaxation lounges, storage facilities for client equipment, and secure loading docks and freight elevators.
Moving into A Data Center
Moving into a data center is a major job for any organization. We wrote a post last year, Desert To Data in 7 Days — Our New Phoenix Data Center, about what it was like to move into our new data center in Phoenix, Arizona.
Our Director of Product Marketing Andy Klein wrote a popular post last year on what it’s like to visit a data center called A Day in the Life of a Data Center.
Would you Like to Know More about The Challenges of Opening and Running a Data Center?
That’s it for part 2 of this series. If readers are interested, we could write a post about some of the new technologies and trends affecting data center design and use. Please let us know in the comments.
Don’t miss future posts on data centers and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.
This is part one of a series. The second part will be posted later this week. Use the Join button above to receive notification of future posts in this series.
Though most of us have never set foot inside of a data center, as citizens of a data-driven world we nonetheless depend on the services that data centers provide almost as much as we depend on a reliable water supply, the electrical grid, and the highway system. Every time we send a tweet, post to Facebook, check our bank balance or credit score, watch a YouTube video, or back up a computer to the cloud we are interacting with a data center.
In this series, The Challenges of Opening a Data Center, we’ll talk in general terms about the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process. Many of the factors to consider will be similar for opening a private data center or seeking space in a public data center, but we’ll assume for the sake of this discussion that our needs are more modest than requiring a data center dedicated solely to our own use (i.e. we’re not Google, Facebook, or China Telecom).
Data center technology and management are changing rapidly, with new approaches to design and operation appearing every year. This means we won’t be able to cover everything happening in the world of data centers in our series; however, we hope our brief overview proves useful.
What is a Data Center?
A data center is the structure that houses a large group of networked computer servers typically used by businesses, governments, and organizations for the remote storage, processing, or distribution of large amounts of data.
While many organizations will have computing services in the same location as their offices that support their day-to-day operations, a data center is a structure dedicated to 24/7 large-scale data processing and handling.
Depending on how you define the term, there are anywhere from half a million to many millions of data centers in the world. While it’s possible to say that an organization’s on-site servers and data storage can be called a data center, in this discussion we are using the term to refer to facilities that are expressly dedicated to housing computer systems and associated components, such as telecommunications and storage systems. The facility might be a private center, which is owned or leased by one tenant only, or a shared data center that offers what are called “colocation services,” renting space, services, and equipment to multiple tenants in the center.
A large, modern data center operates around the clock, placing a priority on providing secure and uninterrupted service, and generally includes redundant or backup power systems or supplies, redundant data communication connections, environmental controls, fire suppression systems, and numerous security devices. Such a center is an industrial-scale operation, often using as much electricity as a small town.
Types of Data Centers
There are a number of ways to classify data centers: according to how they will be used, whether they are owned or used by one or multiple organizations, whether and how they fit into a topology of other data centers, which technologies and management approaches they use for computing, storage, cooling, power, and operations, and, increasingly visible these days, how green they are.
Data centers can be loosely classified into three types according to who owns them and who uses them.
Exclusive Data Centers are facilities wholly built, maintained, operated and managed by the business for the optimal operation of its IT equipment. Some of these centers are well-known companies such as Facebook, Google, or Microsoft, while others are less public-facing big telecoms, insurance companies, or other service providers.
Managed Hosting Providers are data centers managed by a third party on behalf of a business. The business does not own the data center or space within it. Rather, the business rents the IT equipment and infrastructure it needs instead of purchasing it outright.
Colocation Data Centers are usually large facilities built to accommodate multiple businesses within the center. The business rents its own space within the data center and subsequently fills the space with its IT equipment, or possibly uses equipment provided by the data center operator.
Backblaze, for example, doesn’t own its own data centers but colocates in data centers owned by others. As Backblaze’s storage needs grow, Backblaze increases the space it uses within a given data center and/or expands to other data centers in the same or different geographic areas.
Availability is Key
When designing or selecting a data center, an organization needs to decide what level of availability is required for its services. The type of business or service it provides likely will dictate this. Any organization that provides real-time and/or critical data services will need the highest level of availability and redundancy, as well as the ability to rapidly failover (transfer operation to another center) when and if required. Some organizations require multiple data centers not just to handle the computer or storage capacity they use, but to provide alternate locations for operation if something should happen temporarily or permanently to one or more of their centers.
Organizations that can’t afford any downtime at all will typically operate data centers with a mirrored site that can take over if something happens to the first site, or they operate a second site in parallel with the first one. These data center topologies are called Active/Passive and Active/Active, respectively. Should disaster or an outage occur, disaster mode would dictate immediately moving all of the primary data center’s processing to the second data center.
While some data center topologies are spread throughout a single country or continent, others extend around the world. Practically, data transmission speeds put a cap on how far apart centers can be and still operate in parallel with the appearance of simultaneous operation. Linking two data centers that are located relatively close together (say, no more than 60 miles apart, to limit data latency issues) with dark fiber (leased fiber optic cable) could enable both data centers to be operated as if they were in the same location, reducing staffing requirements yet providing immediate failover to the secondary data center if needed.
This redundancy of facilities and ensured availability is of paramount importance to those needing uninterrupted data center services.
LEED Certification
Leadership in Energy and Environmental Design (LEED) is a rating system devised by the United States Green Building Council (USGBC) for the design, construction, and operation of green buildings. Facilities can achieve ratings of certified, silver, gold, or platinum based on criteria within six categories: sustainable sites, water efficiency, energy and atmosphere, materials and resources, indoor environmental quality, and innovation and design.
Green certification has become increasingly important in data center design and operation as data centers require great amounts of electricity and often cooling water to operate. Green technologies can reduce costs for data center operation, as well as make the arrival of data centers more amenable to environmentally-conscious communities.
The ACT, Inc. data center in Iowa City, Iowa was the first data center in the U.S. to receive LEED-Platinum certification, the highest level available.
ACT Data Center exterior
ACT Data Center interior
Factors to Consider When Selecting a Data Center
There are numerous factors to consider when deciding to build or to occupy space in a data center. Aspects such as proximity to available power grids, telecommunications infrastructure, networking services, transportation lines, and emergency services can affect costs, risk, security and other factors that need to be taken into consideration.
The size of the data center will be dictated by the business requirements of the owner or tenant. A data center can occupy one room of a building, one or more floors, or an entire building. Most of the equipment is often in the form of servers mounted in 19-inch rack cabinets, which are usually placed in single rows forming corridors (so-called aisles) between them. This allows staff access to the front and rear of each cabinet. Servers differ greatly in size, from 1U servers (i.e. one “U” or “RU” rack unit, measuring 44.50 millimeters or 1.75 inches), to Backblaze’s Storage Pod design that fits a 4U chassis, to large freestanding storage silos that occupy many square feet of floor space.
Location
Location will be one of the biggest factors to consider when selecting a data center and encompasses many other factors that should be taken into account, such as geological risks, neighboring uses, and even local flight paths. Access to suitable available power at a suitable price point is often the most critical factor and the longest lead time item, followed by broadband service availability.
With more and more data centers available providing varied levels of service and cost, the choices increase each year. Data center brokers can be employed to find a data center, just as one might use a broker for home or other commercial real estate.
Websites listing available colocation space, such as upstack.io, or entire data centers for sale or lease, are widely used. A common practice is for a customer to publish its data center requirements, and the vendors compete to provide the most attractive bid in a reverse auction.
Business and Customer Proximity
The center’s closeness to a business or organization may or may not be a factor in the site selection. The organization might wish to be close enough to manage the center or supervise the on-site staff from a nearby business location. The location of customers might be a factor, especially if data transmission speeds and latency are important, or the business or customers have regulatory, political, tax, or other considerations that dictate areas suitable or not suitable for the storage and processing of data.
Climate
Local climate is a major factor in data center design because the climatic conditions dictate what cooling technologies should be deployed. This in turn impacts uptime and the costs associated with cooling, which can account for 50% or more of a center’s power costs. The topology and the cost of managing a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Nevertheless, data centers are located in both extremely cold regions and extremely hot ones, with innovative approaches used in both extremes to maintain desired temperatures within the center.
Geographic Stability and Extreme Weather Events
A major obvious factor in locating a data center is the stability of the actual site as regards weather, seismic activity, and the likelihood of weather events such as hurricanes, as well as fire or flooding.
Backblaze’s Sacramento data center describes its location as one of the most stable geographic locations in California, outside fault zones and floodplains.
Sometimes the location of the center comes first and the facility is hardened to withstand anticipated threats, such as Equinix’s NAP of the Americas data center in Miami, one of the largest single-building data centers on the planet (six stories and 750,000 square feet), which is built 32 feet above sea level and designed to withstand category 5 hurricane winds.
Equinix “NAP of the Americas” Data Center in Miami
Most data centers don’t have the extreme protection or history of the Bahnhof data center, which is located inside the ultra-secure former nuclear bunker Pionen, in Stockholm, Sweden. It is buried 100 feet below ground inside the White Mountains and secured behind 15.7 in. thick metal doors. It prides itself on its self-described “Bond villain” ambiance.
Bahnhof Data Center under White Mountain in Stockholm
Usually, the data center owner or tenant will want to take into account the balance between cost and risk in the selection of a location. The Ideal quadrant below is obviously favored when making this compromise.
Risk mitigation also plays a strong role in pricing. The extent to which providers must implement special building techniques and operating technologies to protect the facility will affect price. When selecting a data center, organizations must make note of the data center’s certification level on the basis of regulatory requirements in the industry. These certifications can ensure that an organization is meeting necessary compliance requirements.
Power
Electrical power usually represents the largest cost in a data center. The cost a service provider pays for power will be affected by the source of the power, the regulatory environment, the facility size and the rate concessions, if any, offered by the utility. At higher level tiers, battery, generator, and redundant power grids are a required part of the picture.
Fault tolerance and power redundancy are absolutely necessary to maintain uninterrupted data center operation. Parallel redundancy is a safeguard to ensure that an uninterruptible power supply (UPS) system is in place to provide electrical power if necessary. The UPS system can be based on batteries, stored kinetic energy, or some type of generator using diesel or another fuel. The center operates on one UPS system, with another UPS system or power generator acting as a backup. If a power outage occurs, the additional UPS system or generator is available.
Many data centers require the use of independent power grids, with service provided by different utility companies or services, to protect against loss of electrical service no matter what the cause. Some data centers have intentionally located themselves near national borders so that they can obtain redundant power not just from separate grids, but from separate geopolitical sources.
The higher the redundancy levels a company requires, the higher the prices will invariably be. If one requires high availability backed by a service-level agreement (SLA), one can expect to pay more than another company with less demanding redundancy requirements.
Stay Tuned for Part 2 of The Challenges of Opening a Data Center
That’s it for part 1 of this post. In subsequent posts, we’ll take a look at some other factors to consider when moving into a data center such as network bandwidth, cooling, and security. We’ll take a look at what is involved in moving into a new data center (including stories from Backblaze’s experiences). We’ll also investigate what it takes to keep a data center running, and some of the new technologies and trends affecting data center design and use. You can discover all posts on our blog tagged with “Data Center” by following the link https://www.backblaze.com/blog/tag/data-center/.
The second part of this series on The Challenges of Opening a Data Center will be posted later this week. Use the Join button above to receive notification of future posts in this series.
Big things are afoot in the world of HackSpace magazine! This month we’re running our first special issue, with wearables projects throughout the magazine. Moreover, we’re giving away our first subscription gift free to all 12-month print subscribers. Lastly, and most importantly, we’ve made the cover EXTRA SHINY!
Prepare your eyeballs — it’s HackSpace magazine issue 4!
Wearables
In this issue, we’re taking an in-depth look at wearable tech. Not Fitbits or Apple Watches — we’re talking stuff you can make yourself, from projects that take a couple of hours to put together, to the huge, inspiring builds that are bringing technology to the runway. If you like wearing clothes and you like using your brain to make things better, then you’ll love this feature.
We’re continuing our obsession with Nixie tubes, with the brilliant Time-To-Go-Clock – Trump edition. This ingenious bit of kit uses obsolete Russian electronics to count down the time until the end of the 45th president’s term in office. However, you can also program it to tell the time left to any predictable event, such as the deadline for your tax return or essay submission, or the date England gets knocked out of the World Cup.
We’re also talking to Dr Lucy Rogers — NASA alumna, Robot Wars judge, and fellow of the Institution of Mechanical Engineers — about the difference between making as a hobby and as a job, and about why we need the Guild of Makers. Plus, issue 4 has a teeny boat, the most beautiful Raspberry Pi cases you’ve ever seen, and it explores the results of what happens when you put a bunch of hardware hackers together in a French chateau — sacré bleu!
Tutorials
As always, we’ve got more how-tos than you can shake a soldering iron at. Fittingly for the current climate here in the UK, there’s a hot water monitor, which shows you how long you have before your morning shower turns cold, and an Internet of Tea project to summon a cuppa from your kettle via the web. Perhaps not so fittingly, there’s also an ESP8266 project for monitoring a solar power station online. Readers in the southern hemisphere, we’ll leave that one for you — we haven’t seen the sun here for months!
And there’s more!
We’re super happy to say that all our 12-month print subscribers have been sent an Adafruit Circuit Playground Express with this new issue:
This gadget was developed primarily with wearables in mind and comes with all sorts of in-built functionality, so subscribers can get cracking with their latest wearable project today! If you’re not a 12-month print subscriber, you’ll miss out, so subscribe here to get your magazine and your device, and let us know what you’ll make.
This post summarizes the responses we received to our November 28 post asking our readers how they handle the challenge of digital asset management (DAM). You can read the previous posts in this series below:
Use the Join button above to receive notification of future posts on this topic.
This past November, we published a blog post entitled What’s the Best Solution for Managing Digital Photos and Videos? We asked our readers to tell us how they’re currently backing up their digital media assets and what their ideal system might be. We posed these questions:
How are you currently backing up your digital photos, video files, and/or file libraries/catalogs? Do you have a backup system that uses attached drives, a local network, the cloud, or offline storage media? Does it work well for you?
Imagine your ideal digital asset backup setup. What would it look like? Don’t be constrained by current products, technologies, brands, or solutions. Invent a technology or product if you wish. Describe an ideal system that would work the way you want it to.
We were thrilled to receive a large number of responses from readers. What was clear from the responses is that there is no consensus on solutions for either amateurs or professionals, and that users had many ideas for how digital media management could be improved to meet their needs.
We asked our readers to contribute to this dialog for a number of reasons. As a cloud backup and cloud storage service provider, we want to understand how our users are working with digital media so we know how to improve our services. Also, we want to participate in the digital media community, and hope that sharing the challenges our readers are facing and the solutions they are using will make a contribution to that community.
The State of Managing Digital Media
While a few readers told us they had settled on a system that worked for them, most said that they were still looking for a better solution. Many expressed frustration with managing the volume of data for digital photos and videos, which only grows as the resolution of still and video cameras increases. Amateurs are making do with a number of consumer services, while professionals employ a wide range of commercial, open source, or jury-rigged solutions for managing data and maintaining its integrity.
I’ve summarized the responses we received in three sections: 1) what readers are doing today, 2) common wishes they have for improvements, and 3) concerns expressed by a number of respondents.
The Digital Media Workflow
Protecting Media From Camera to Cloud
We heard from a wide range of smartphone users, DSLR and other format photographers, and digital video creators. Speed of operation, the ability to share files with collaborators and clients, and product feature sets were frequently cited as reasons for selecting their particular solution. Also of great importance was protecting the integrity of media through the entire capture, transfer, editing, and backup workflow.
Avid Media Composer
Many readers said they backed up their camera memory cards as soon as possible to a computer or external drive and erased cards only when they had more than one backup of the media. Some said that they used dual memory cards that are written to simultaneously by the camera, for peace of mind.
While some cameras now come equipped with Wi-Fi, no one other than smartphone users said they were using Wi-Fi as part of their workflow. Also, we didn’t receive feedback from any photographers who regularly shoot tethered.
Some readers said they still use CDs and DVDs for storing media. One user admitted to previously using VHS tape.
NAS (Network Attached Storage) is in wide use. Synology, Drobo, FreeNAS, and other RAID and non-RAID storage devices were frequently mentioned.
A number were backing up their NAS to the cloud for archiving. Others said they had duplicate external drives that were stored onsite or offsite, including in a physical safe, other business locations, a bank lock box, and even “mom’s house.”
Many said they had regular backup practices, including nightly backups, weekly and other regularly scheduled backups, often in non-work hours.
One reader said that a monthly data scrub was performed on the NAS to ensure data integrity; a minimal sketch of that idea appears below.
Hardware used for backups included Synology, QNAP, Drobo, and FreeNAS systems.
Services used by our readers for backing up included Backblaze Backup, Backblaze B2 Cloud Storage, CrashPlan, SmugMug, Amazon Glacier, Google Photos, Amazon Prime Photos, Adobe Creative Cloud, Apple Photos, Lima, DropBox, and Tarsnap. Some readers made a distinction between how they used sync (such as DropBox), backup (such as Backblaze Backup), and storage (such as Backblaze B2), but others did not. (See Sync vs. Backup vs. Storage on our blog for an explanation of the differences.)
Software used for backups and maintaining file integrity included Arq, Carbon Copy Cloner, ChronoSync, SoftRAID, FreeNAS, corz checksum, rclone, rsync, Apple Time Machine, Capture One, Btrfs, BorgBackup, SuperDuper, restic, Acronis True Image, custom Python scripts, and smartphone apps PhotoTransfer and PhotoSync.
Cloud torrent services mentioned were Offcloud, Bitport, and Seedr.
A common practice is to use SSDs (solid-state drives) in the working computer, in attached drives, or both, to improve speed and reliability. Protection from magnetic fields was another reason given for using SSDs.
Many users copy their media to multiple attached or network drives for redundancy.
Users of Lightroom reported keeping their Lightroom catalog on a local drive and their photo files on an attached drive. They frequently had different backup schemes for the catalog and the media. Many readers are careful to have multiple backups of their Lightroom catalog. Some expressed the desire to back up both their original raw files and their edited (working) raw files, but limitations in bandwidth and backup media caused some to give priority to good backups of their raw files, since the edited files could be recreated if necessary.
A number of smartphone users reported using Apple or Google Photos to store their photos and share them.
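To make the integrity checks mentioned above a little more concrete (the monthly NAS scrub, checksum tools such as corz checksum), here is a minimal, illustrative Python sketch rather than any reader’s actual setup: it records a SHA-256 manifest for a media folder on the first run and re-verifies it on later runs. The paths are placeholders.

# Illustrative only: build a SHA-256 manifest of a media folder, then
# re-verify it on later runs to catch silent corruption or accidental edits.
import hashlib
import json
import pathlib

MEDIA_DIR = pathlib.Path("/mnt/nas/photos")   # placeholder NAS mount point
MANIFEST = pathlib.Path("checksums.json")

def sha256(path, chunk_size=1 << 20):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def build_manifest():
    manifest = {str(p): sha256(p) for p in MEDIA_DIR.rglob("*") if p.is_file()}
    MANIFEST.write_text(json.dumps(manifest, indent=2))
    print("Recorded checksums for", len(manifest), "files")

def verify_manifest():
    manifest = json.loads(MANIFEST.read_text())
    for path, expected in manifest.items():
        if sha256(path) != expected:
            print("CHANGED OR CORRUPTED:", path)

if __name__ == "__main__":
    verify_manifest() if MANIFEST.exists() else build_manifest()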
Digital Editing and Enhancement
Adobe still rules for many users for photo editing. Some expressed interest in alternatives from Phase One, Skylum (formerly Macphun), ON1, and DxO.
Adobe Lightroom
While Adobe Lightroom (and Adobe Photoshop for some) are the foundation of many users’ photo media workflow, others are still looking for something that might better suit their needs. A number of comments were made regarding Adobe’s switch to a subscription model.
Software used for image and video editing and enhancement included Adobe Lightroom, Adobe Photoshop, Luminar, Affinity Photo, Phase One, DxO, ON1, GoPro Quik, Apple Aperture (discontinued), Avid Media Composer, Adobe Premiere, and Apple Final Cut Studio (discontinued) or Final Cut Pro.
Luminar 2018 DAM preview
Managing, Archiving, Adding Metadata, Searching for Media Files
While some of our respondents are casual or serious amateur digital media users, others make a living from digital photography and videography. A number of our readers report having hundreds of thousands of files and many terabytes of data — even approaching one petabyte of data for one professional who responded. Whether amateur or professional, all shared the desire to preserve their digital media assets for the future. Consequently, they want to be able to attach metadata quickly and easily, and search for and retrieve files from wherever they are stored when necessary.
It’s not surprising that metadata was of great interest to our readers. Tagging, categorizing, and maintaining searchable records is important to anyone dealing with digital media.
While Lightroom was frequently used to manage catalogs, metadata, and files, others used spreadsheets to record archive location and grep for searching records.
Some liked the idea of Adobe’s Creative Cloud but weren’t excited about its cost and lack of choice in cloud providers.
Others reported using Photo Mechanic, DxO, digiKam, Google Photos, Daminion, Photo Supreme, Phraseanet, Phase One Media Pro, Google Picasa (discontinued), Adobe Bridge, Synology Photo Station, FotoStation, PhotoShelter, Flickr, and SmugMug.
Photo Mechanic 5
Common Wishes For Managing Digital Media in the Future
Our readers came through with numerous suggestions for how digital media management could be improved. There were a number of common themes centered around bigger and better storage, faster broadband or other ways to get data into the cloud, managing metadata, and ensuring integrity of their data.
Many wished for faster internet speeds to make transferring and backing up files more efficient; this desire was expressed multiple times. Many also said that the sheer volume of digital data they work with makes cloud services and storage impractical.
A number of readers would like the option to ship files on a physical device to a cloud provider so that the initial large transfer would not take as long. Some wished to be able to send monthly physical transfers, with incremental transfers sent over the internet. (Note that Backblaze supports adding data via a hardware drive to B2 Cloud Storage with our Fireball service.)
Reasonable service cost, not surprisingly, was a desire expressed by just about everyone.
Many wished for not just backup, but long-term archiving of data. One suggestion was to be able to specify the length-of-term for archiving and pay by that metric for specific sets of files.
An easy-to-use Windows, Macintosh, or Linux client was a feature that many appreciated. Some were comfortable with using third-party apps for cloud storage and others wanted a vendor-supplied client.
A number of users like the combination of NAS and cloud. Many backed up their NAS devices to the cloud. Some suggested that the NAS should be the local gateway to unlimited virtual storage in the cloud. (They should read our recent blog post on Morro Data’s CloudNAS solution.)
Some just wanted the storage problem solved. They would like the computer system to manage storage intelligently so they don’t have to. One reader said that storage should be managed and optimized by the system, as RAM is, and not by the user.
Common Concerns Expressed by our Readers
Over and over again our readers expressed similar concerns about the state of digital asset management.
Dealing with large volumes of data was a common challenge. As digital media files increase in size, readers struggle to manage the amount of data they have to deal with. As one reader wrote, “Why don’t I have an online backup of my entire library? Because it’s too much damn data!”
Many said they would back up more often, or back up even more files if they had the bandwidth or storage media to do so.
The cloud is attractive to many, but some said that they didn’t have the bandwidth to get their data into the cloud in an efficient manner, the cloud is too expensive, or they have other concerns about trusting the cloud with their data.
Most of our respondents are using Apple computer systems, some Windows, and a few Linux. A lot of the Mac users are using Time Machine. Some liked the concept of Time Machine but said they had experienced corrupted data when using it.
Visibility into the backup process was mentioned many times. Users want to know what’s happening to their data. A number said they wanted automatic integrity checks of their data and reports sent to them if anything changes.
A number of readers said they didn’t want to be locked into one vendor’s proprietary solution. They prefer open standards to prevent loss if a vendor leaves the market, changes the product, or makes a turn in strategy that they don’t wish to follow.
A number of users talked about how their practices differed depending on whether they were working in the field or working in a studio or at home. Access to the internet and data transfer speed was an issue for many.
It’s clear that people working in high resolution photography and videography are pushing the envelope for moving data between storage devices and the cloud.
Some readers expressed concern about the integrity of their stored data. They were concerned that over time, files would degrade. Some asked for tools to verify data integrity manually, or that data integrity should be monitored and reported by the storage vendor on a regular basis. The OpenZFS and Btrfs file systems were mentioned by some.
A few readers mentioned that they preferred redundant data centers for cloud storage.
Metadata is an important element for many, and making sure that metadata is easily and permanently associated with their files is essential.
The ability to share working files with collaborators or finished media with clients, friends, and family also is a common requirement.
Thank You for Your Comments and Suggestions
As a cloud backup and storage provider, we found your contributions of great interest. A number of readers made suggestions for how we can improve or augment our services to increase the options for digital media management. We listened and are considering your comments; they will be included in our discussions and planning for possible future services and offerings from Backblaze. We thank everyone for your contributions.
Digital media management
Let’s Keep the Conversation Going!
Were you surprised by any of the responses? Do you have something further to contribute? This is by no means the end of our exploration of how to better serve media professionals, so let’s keep the lines of communication open.
The first time I came across SCADA, I thought about how much potential the platform has and how terribly clunky its implementations were. For a long time it was, for me, the textbook example of a conservative, self-centred system: too closed, too expensive, with complicated licensing, it was the complete opposite of what it was trying to be, a universal industrial platform for control and management.
Time, however, changes many things. These days there are implementations that are increasingly open, support an ever-growing number of protocols and integration standards, offer web-based user interfaces, have clear and simple (per-server) licensing, store their data in databases that are easy to share with other platforms, and come with ever better and more varied development tools. Most importantly, they are now within reach of small and medium-sized enterprises as well.
They will share their first-hand experience, both with using and deploying it in their own products and in their customers’ factories.
This time we are experimenting with a new event format: a small, limited number of guests will be able to watch the presentation live, take part in the discussion, put their questions to our speakers, and then stay for an informal chat and networking. These will be the first people to buy a VIP pass from our website, before the seats run out.
Those who cannot attend, or don’t manage to register in time, will be able to watch (the presentation only) via a live stream on our new YouTube channel at https://trakia.tech/live, or later as a recording in the same place. This will, of course, be free, but without the option to take part in the discussion and the networking session afterwards.
Otherwise, everything takes place on 11 December (Monday), from 4 pm, in Plovdiv, hosted by our kind hosts Limacon. Do join us!
Since we launched the Oracle Weather Station project, we’ve collected more than six million records from our network of stations at schools and colleges around the world. Each one of these records contains data from ten separate sensors — that’s over 60 million individual weather measurements!
Weather station measurements in Oracle database
Weather data collection
Having lots of data covering a long period of time is great for spotting trends, but to do so, you need some way of visualising your measurements. We’ve always had great resources like Graphing the weather to help anyone analyse their weather data.
And from now on it’s going to be even easier for our Oracle Weather Station owners to display and share their measurements. I’m pleased to announce a new partnership with our friends at Initial State: they are generously providing a white-label platform to which all Oracle Weather Station recipients can stream their data.
Using Initial State
Initial State makes it easy to create vibrant dashboards that show off local climate data. The service is perfect for having your Oracle Weather Station data on permanent display, for example in the school reception area or on the school’s website.
But that’s not all: the Initial State toolkit includes a whole range of easy-to-use analysis tools for extracting trends from your data. Distribution plots and statistics are just a few clicks away!
Looks like Auntie Beryl is right — it has been a damp old year! (Humidity value distribution May–Nov 2017)
The wind direction data from my Weather Station supports my excuse as to why I’ve not managed a high-altitude balloon launch this year: to use my launch site, I need winds coming from the east, and those have been in short supply.
Chart showing wind direction over time
Initial State credentials
Every Raspberry Pi Oracle Weather Station school will shortly be receiving the credentials needed to start streaming their data to Initial State. If you’re super keen though, please email [email protected] with a photo of your Oracle Weather Station, and I’ll let you jump the queue!
The Initial State folks are big fans of Raspberry Pi and have a ton of Pi-related projects on their website. They even included shout-outs to us in the music video they made to celebrate the publication of their 50th tutorial. Can you spot their weather station?
Your home-brew weather station
If you’ve built your own Raspberry Pi–powered weather station and would like to dabble with the Initial State dashboards, you’re in luck! The team at Initial State is offering 14-day trials for everyone. For more information on Initial State, and to sign up for the trial, check out their website.
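If you want a feel for what streaming from a home-brew station looks like, here’s a rough sketch using Initial State’s Python module (ISStreamer). This is not the official Oracle Weather Station code; the bucket name, bucket key, and access key are placeholders, and the literal readings stand in for your real sensor values.

# A rough sketch, not the official Oracle Weather Station code: stream a few
# readings to an Initial State bucket using their Python module (ISStreamer).
# The keys below are placeholders; real readings would come from your sensors.
from ISStreamer.Streamer import Streamer

streamer = Streamer(
    bucket_name="Home Weather Station",         # display name in Initial State
    bucket_key="home_weather_demo",             # placeholder bucket key
    access_key="YOUR_INITIAL_STATE_ACCESS_KEY"  # from your Initial State account
)

streamer.log("temperature_c", 21.4)   # replace literals with real sensor reads
streamer.log("humidity_pct", 68.0)
streamer.log("pressure_hpa", 1012.3)

streamer.flush()   # push any buffered readings immediately
streamer.close()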
When James Puderer moved to Lima, Peru, his roadside runs left a rather nasty taste in his mouth. Hit by the pollution from old diesel cars in the area, he decided to monitor the air quality in his new city using Raspberry Pis, with the city’s abundant taxis as his tech carriers.
How to assemble the enclosure for my Taxi Datalogger project: https://www.hackster.io/james-puderer/distributed-air-quality-monitoring-using-taxis-69647e
Sensing air quality in Lima
Luckily for James, almost all taxis in Lima are equipped with the standard hollow vinyl roof sign seen in the video above, which makes them ideal for hacking.
With the onboard tech, the device collects data on longitude, latitude, humidity, temperature, pressure, and airborne particle count, feeding it back to an Android Things datalogger. This data is then pushed to Google IoT Core, where it can be remotely accessed.
Next, the data is processed by Google Dataflow and turned into a BigQuery table. Users can then visualize the collected measurements. And while James uses Google Maps to analyse his data, there are many tools online that will allow you to organise and study your figures depending on what final result you’re hoping to achieve.
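Once the measurements land in BigQuery, pulling them out for your own visualisation is straightforward. The snippet below is a hypothetical sketch using the google-cloud-bigquery Python client; the project, dataset, table, and column names are invented for illustration and won’t match James’s schema exactly.

# Hypothetical sketch using the google-cloud-bigquery client; the project,
# dataset, table, and column names are invented for illustration.
from google.cloud import bigquery

client = bigquery.Client(project="my-air-quality-project")

query = """
    SELECT latitude, longitude, AVG(particle_count) AS avg_particles
    FROM `my-air-quality-project.taxi_data.readings`
    WHERE reading_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    GROUP BY latitude, longitude
    ORDER BY avg_particles DESC
    LIMIT 20
"""

for row in client.query(query):   # iterating waits for the query job to finish
    print(row.latitude, row.longitude, round(row.avg_particles, 1))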
James hopped in a taxi and took his monitor on the road, collecting results throughout the journey
James has provided the complete build process, including all tech ingredients and code, on his Hackster.io project page, and urges makers to create their own air quality monitor for their local area. He also plans on building upon the existing design by adding a 12V power hookup for connecting to the taxi, functioning lights within the sign, and companion apps for drivers.
Sensing the world around you
We’ve seen a wide variety of Raspberry Pi projects using sensors to track the world around us, such as Kasia Molga’s Human Sensor costume series, which reacts to air pollution by lighting up, and Clodagh O’Mahony’s Social Interaction Dress, which she created to judge how conversation and physical human interaction can be scored and studied.
Kasia Molga’s Human Sensor — a collection of hi-tech costumes that react to air pollution within the wearer’s environment.
Many people also build their own Pi-powered weather stations, or use the Raspberry Pi Oracle Weather Station, to measure and record conditions in their towns and cities from the roofs of schools, offices, and homes.
Have you incorporated sensors into your Raspberry Pi projects? Share your builds in the comments below or via social media by tagging us.
Amazon Cognito user pools are full-fledged identity providers (IdPs) that you can use to maintain a user directory. A user pool can scale to hundreds of millions of users and adds sign-up and sign-in support to your mobile or web applications.
You can provide access from an Amazon Cognito user pool to Amazon QuickSight with IAM identity federation in a serverless way. This solution uses Amazon API Gateway and AWS Lambda as a backend, fronted by a simple web app. New features such as Amazon Cognito user pools app integration make it even easier to add sign-in and sign-up logic to your application and federation use cases.
Solution architecture
Here’s how you can accomplish this solution.
In this scenario, your web app hosted on Amazon S3 integrates with Amazon Cognito User Pools to authenticate users. It uses Amazon Cognito Federated Identities to authorize access to Amazon QuickSight on behalf of the authenticated user, with temporary AWS credentials and appropriate permissions. The app then uses an ID token generated by Amazon Cognito to call API Gateway and Lambda to obtain a sign-in token for Amazon QuickSight from AWS Sign-In Federation. With this token, the app redirects access to Amazon QuickSight:
The Amazon Cognito hosted UI provided by the app integration domain performs all sign-in, sign-up, verification, and authentication logic for the web app. This allows you to register and authenticate users.
After a user is authenticated with a valid user name and password, an OpenID Connect token (ID token) is sent to Amazon Cognito Federated Identities. The token retrieves temporary AWS credentials based on an IAM role with “quickSight:CreateUser” permissions. These credentials are used to build a session string that is encoded into the URL https://signin.aws.amazon.com/federation?Action=getSigninToken.
The ID token, along with the encoded URL, is sent to API Gateway, which in turn verifies the token with a user pool authorizer to authorize the API call.
The URL is passed on to a Lambda function that calls the AWS SSO federation endpoint to retrieve a sign-in token.
AWS SSO processes the federation request, authenticates the user, and forwards the authentication token to Amazon QuickSight, which then uses the authentication token and authorizes user access.
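To make the token exchange more concrete, here is a simplified Python sketch of what the Lambda function’s job boils down to, using the documented AWS sign-in federation endpoint. It is not the code from the SAM template; the credential values and the issuer URL are placeholders.

# Simplified sketch of the sign-in token exchange (not the template's code).
# The temporary credentials would come from Amazon Cognito Federated Identities.
import json
import urllib.parse
import requests

access_key_id = "<temporary AccessKeyId from Cognito>"     # placeholder
secret_access_key = "<temporary SecretKey from Cognito>"   # placeholder
session_token = "<temporary SessionToken from Cognito>"    # placeholder

session = json.dumps({
    "sessionId": access_key_id,
    "sessionKey": secret_access_key,
    "sessionToken": session_token,
})

# Step 1: exchange the temporary credentials for a sign-in token
response = requests.get(
    "https://signin.aws.amazon.com/federation",
    params={"Action": "getSigninToken", "Session": session},
)
signin_token = response.json()["SigninToken"]

# Step 2: build the login URL that drops the user into Amazon QuickSight
login_url = (
    "https://signin.aws.amazon.com/federation?Action=login"
    "&Issuer=" + urllib.parse.quote("https://example.com")   # placeholder issuer
    + "&Destination=" + urllib.parse.quote("https://quicksight.aws.amazon.com/")
    + "&SigninToken=" + urllib.parse.quote(signin_token)
)
print(login_url)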
How can you use, configure and test this serverless solution in your own AWS account? I created a simple SAM (Serverless Application Model) template that can be used to spin up all the resources needed for the solution.
Implementation steps
Using the AWS CLI, create an S3 bucket in the same region in which to deploy all resources:
Then, execute a couple of commands in a terminal or command prompt from inside the uncompressed folder where all files are located, referring to the S3 bucket created earlier:
AWS CloudFormation automatically creates and configures the following resources in your account:
Amazon CloudFront distribution
S3 static website
Amazon Cognito user pool
Amazon Cognito identity pool
IAM role for authenticated users
API Gateway API
Lambda function
You can follow the progress of the stack creation from the CloudFormation console. View the Outputs tab for the completed stack to get the identifiers of all created resources. You could also execute the following command with the AWS CLI:
aws cloudformation describe-stacks --query 'Stacks[0].[Outputs[].[OutputKey,OutputValue]]|[]' --output text --stack-name CognitoQuickSight
Use the information from the console or CLI command to replace the related resource identifiers in the file “auth.js”.
In the Amazon Cognito User Pools console, select the pool named QuickSightUsers generated by CloudFormation.
Under App integration, choose Domain name and create a domain. Domain names must be unique to the region. Add the domain to the “auth.js” file accordingly:
Choose App integration, App client settings and then select the option Cognito User Pool. Add the CloudFront distribution address (with https://, as SSL is a requirement for the callback/sign out URLs) and make sure that the address matches the related settings in the “auth.js” file exactly. For Allowed OAuth Flows, select implicit grant. For Allowed OAuth Scopes, select openid.
The app integration configuration is now done. Your “auth.js” file should look like the following:
The preceding resources don’t exist anymore. The CloudFormation stack that generated them was deleted. I recommend that you delete your stack after testing, for cleanup purposes. Deleting the stack also deletes all the resources.
Next, upload the four JS and HTML files to the S3 bucket named “cognitoquicksight-s3website-xxxxxxxxx”. Make sure that all files are publicly readable:
Congratulations, the configuration part is now finished!
It’s time to create your first user. Access your CloudFront distribution address in a browser and choose SIGN IN / SIGN UP.
On the Amazon Cognito hosted UI, choose SIGN UP and provide a user name, password and a valid email.
You receive a verification code in email to confirm the user.
In a production system, you might not want to allow open access to your dashboards. As you now have a confirmed user, you can disable the sign-up functionality altogether to avoid letting other users sign themselves up.
In the Amazon Cognito User Pools console, choose General settings, Policies and select Only allow administrators to create users.
In the web app, you can now sign in as the Amazon Cognito user to access the Amazon QuickSight console. Because this is the first time this user is accessing Amazon QuickSight with an IAM role, provide your email address and sign up as an Amazon QuickSight user.
Enjoy your federated access to Amazon QuickSight!
After you’re done testing, go to the CloudFormation console and delete the CognitoQuickSight stack to remove all the resources.
Extending and customizing the solution
Additionally, you could configure SAML federation for your user pool with a couple of clicks, following the instructions in the Amazon Cognito User Pools supports federation with SAML post. If you add more than one SAML IdP, Amazon Cognito can identify the provider to which to redirect the authentication request, based on the user’s corporate email address.
It’s important to understand that while Amazon Cognito User Pools is authenticating (AuthN) the user, the IAM role created for the identity pool is authorizing (AuthZ) the user to perform actions on specific resources. As it is configured, the role only allows “quickSight:CreateUser” permissions. For additional permissions, modify the role accordingly, as in Setting Your IAM Policy. If your users create datasets, remember to add access to data sources such as Amazon S3.
You can customize this solution further by adding multiple groups to your user pool and associating each group with different IAM roles. For more information, see Amazon Cognito Groups and Fine-Grained Role-Based Access Control. For instance, with role-based access control, it’s possible to have a group for Amazon QuickSight administrators and one for users.
You can also modify the sign-in URL (Step 6) to redirect your user to any other service console, provided that the IAM role has appropriate permissions. After receiving a valid sign-in token from the SSO federation endpoint, the user is redirected to https://quicksight.aws.amazon.com. However, you can change the redirection to the main AWS console at https://console.aws.amazon.com. You could also change it to specific services, such as Amazon Redshift, Amazon EMR, Amazon Elasticsearch Service and Kibana (in this particular case, the application needs to return an AWS Signature Version 4 (SigV4) signed URL based on the temporary credentials received from Amazon Cognito, instead of calling the AWS sign-in federation endpoint), Amazon Kinesis, or AWS Lambda. You could even build a frontend portal with separate links to multiple services and resources, which the user can access only if the assumed IAM role has permissions for them.
Conclusion
With the power, flexibility, security, scalability, and the new federation and application integration features of Amazon Cognito user pools, there’s no need to worry about the undifferentiated heavy lifting of maintaining your own identity servers. This allows you to focus on application logic and secure access to other great AWS services, such as Amazon QuickSight.
If you have questions or suggestions, please comment below.
About the Author
Ed Lima is a Solutions Architect who helps AWS customers with their journey in the cloud. He has provided thought leadership to define and drive strategic direction for the adoption of Amazon platforms and technologies, skillfully blending business requirements with technical aspects to achieve the best outcome and helping implement well-architected solutions. In his spare time, he enjoys snowboarding.
You have probably heard, though perhaps not yet, that the General Data Protection Regulation (GDPR) is now a fact. After four years of debate, lobbying, and gathering examples and good practices, the European Union has created a piece of legislation that changes the way we think about personal data. Understanding the need for GDPR starts with recognising that the fourth industrial revolution is happening now. More and more often, the competitive advantage of new business models over established ones lies in access to data and the ability to analyse and manage it.
Political campaigns are won and companies rise to the top of the rankings through profiling, analysis of large data sets, and personalised marketing.
What does GDPR change, and is your company ready for the new rules, or rather, when should it have been ready? What will you actually have to change: your organisation, your technical infrastructure, or the mindset of your employees and management? Can the French data protection authority take action over a violation committed by a company in Pazardzhik? What is a data protection impact assessment? Can you sell your business or sign a strategic contract without one?
On 30 August in Plovdiv, experts from the Tocheva & Mandazhieva law firm, together with Trakia Tech, will try to answer these questions. We promise it will be interesting and useful. We are eager to hear your questions, which will push us to think even more deeply in this direction. The event is organised jointly with our partners from Kapital and ETNHost.
The working language is Bulgarian, and the event is in a small format with limited seats, so registration at capital.bg/gdpr is required.
We look forward to seeing you at the Limacon Event Center: Plovdiv, 154A Maritsa Blvd (next to the big Billa car park).
The Bicrophonic Sonic Bike, created by British sound artist Kaffe Matthews, utilises a Raspberry Pi and GPS signals to map location data and plays music and sound in response to the places you take it on your cycling adventures.
Bicrophonics is about the mobility of sound, experienced and shared within a moving space, free of headphones and free of the internet. Music made by the journey you take, played with the space that you move through. The Bicrophonic Research Institute (BRI) http://sonicbikes.net
Cycling and music
I’m sure I wasn’t the only teen to go for bike rides with a group of friends and a radio. Spurred on by our favourite movie, the mid-nineties classic Now and Then, we’d hook up a pair of cheap portable speakers to our handlebars, crank up the volume, and sing our hearts out as we cycled aimlessly down country lanes in the cool light evenings of the British summer.
While Sonic Bikes don’t belt out the same classics that my precariously attached speakers provided, they do give you the same sense of connection to your travelling companions via sound. Linked to GPS locations on the same preset map of zones, each bike can produce the same music, creating a cloud of sound as you cycle.
Sonic Bikes
The Sonic Bike uses five physical components: a Raspberry Pi, a power source, a USB GPS receiver, rechargeable speakers, and a subwoofer. On the Raspberry Pi, the build uses mapping software to divide a map into zones and connect each zone with a specific music track.
Custom software enables the Raspberry Pi to locate itself among the zones using the USB GPS receiver. Then it plays back the appropriate track until it registers a new zone.
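The BRI’s own software is on GitHub (see below), but the core idea can be illustrated with a short, hypothetical Python sketch: poll the GPS position, look it up in a table of zones, and switch tracks when the bike crosses a boundary. The zones, file names, the gpsd-py3 module, and the omxplayer call here are illustrative assumptions, not the BRI implementation.

# Hypothetical sketch, not the BRI software: poll GPS, map the position to a
# zone, and change the playing track when the bike enters a new zone.
# Assumes the gpsd daemon is running, the gpsd-py3 module is installed, and
# omxplayer is available (common on Raspberry Pi). Zones and files are made up.
import subprocess
import time
import gpsd

# Each zone: (min_lat, max_lat, min_lon, max_lon) mapped to an audio file
ZONES = {
    (51.500, 51.505, -0.090, -0.080): "zone_river.mp3",
    (51.505, 51.510, -0.090, -0.080): "zone_market.mp3",
}

def zone_track(lat, lon):
    for (lat0, lat1, lon0, lon1), track in ZONES.items():
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1:
            return track
    return None

gpsd.connect()
current, player = None, None
while True:
    fix = gpsd.get_current()
    if fix.mode >= 2:                    # we have at least a 2D fix
        track = zone_track(fix.lat, fix.lon)
        if track != current:             # crossed into a different zone
            if player:
                player.terminate()
            player = subprocess.Popen(["omxplayer", "-o", "local", track]) if track else None
            current = track
    time.sleep(1)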
Bicrophonic Research Institute
The Bicrophonic Research Institute is a collective of artists and coders with the shared goal of creating sound directed by people and places via Sonic Bikes. In their own words:
Bicrophonics is about the mobility of sound, experienced and shared within a moving space, free of headphones and free of the internet. Music made by the journey you take, played with the space that you move through.
Their technology has potential beyond the aims of the BRI. The Sonic Bike software could be useful for navigation, logging data and playing beats to indicate when to alter speed or direction. You could even use it to create a guided cycle tour, including automatically reproduced information about specific places on the route.
For the creators of Sonic Bike, the project is ever-evolving, and “continues to be researched and developed to expand the compositional potentials and unique listening experiences it creates.”
Sensory Bike
A good example of this evolution is the Sensory Bike. This offshoot of the Sonic Bike idea plays sounds guided by the cyclist’s own movements – it acts like a two-wheeled musical instrument!
a work for Sensory Bikes, the Berlin wall and audience to ride it. ‘ lean to go up, slow to go loud ‘ explores freedom and celebrates escape. Celebrating human energy to find solutions, hot air balloons take off, train lines sing, people cheer and nature continues to grow.
Sensors on the wheels, handlebars, and brakes, together with a Sense HAT at the rear, register the unique way in which the rider navigates their location. The bike produces output based on these variables. Its creators at BRI say:
The Sensory Bike becomes a performative instrument – with riders choosing to go slow, go fast, to hop, zigzag, or circle, creating their own unique sound piece that speeds, reverses, and changes pitch while they dance on their bicycle.
Build your own Sonic Bike
As with many wonderful Raspberry Pi-based builds, the project’s code is available on GitHub, enabling makers to recreate it. All the BRI team ask is that you contact them so they can learn more of your plans and help in any way possible. They even provide code to create your own Sonic Kayak using GPS zones, temperature sensors, and an underwater microphone!
Sonic Kayaks are musical instruments for expanding our senses and scientific instruments for gathering marine micro-climate data. Made by foAm_Kernow with the Bicrophonic Research Institute (BRI), two were first launched at the British Science Festival in Swansea Bay September 6th 2016 and used by the public for 2 days.
Oh, sure, you can find a fringe wacko who also knows crypto and agrees with you, but the sane members of the security community will not.
Telegram is not trustworthy because it’s partially closed-source. We can’t see how it works. We don’t know if they’ve made accidental mistakes that can be hacked. We don’t know if they’ve been bribed by the NSA or Russia to put backdoors in their program. In contrast, PGP and Signal are open-source. We can read exactly what the software does. Indeed, thousands of people have been reviewing their software looking for mistakes and backdoors. Being open-source doesn’t automatically make software better, but it does make hiding secret backdoors much harder.
Telegram is not trustworthy because we aren’t certain the crypto is done properly. Signal, and especially PGP, are done properly.
The thing about encryption is that when done properly, it works. Neither the NSA nor the Russians can break properly encrypted content. There’s no such thing as “military grade” encryption that is better than consumer grade. There’s only encryption that nobody can hack vs. encryption that your neighbor’s teenage kid can easily hack. Those scenes in TV shows and movies about breaking encryption are as realistic as sound in space: good for dramatic presentation, but not how things work in the real world.
In particular, end-to-end encryption works. Sure, in the past, such apps only encrypted as far as the server, so whoever ran the server could read your messages. Modern chat apps, though, are end-to-end: the servers have absolutely no ability to decrypt what’s on them, unless they can get the decryption keys from the phones. But some tasks, like encrypted messages to a group of people, can be hard to do properly.
Thus, in contrast to what John Schindler says, while we techies have doubts about Telegram, we don’t believe Russian authorities have access to Signal and PGP messages.
Snowden hatred has become the anti-vax of crypto. Sure, there’s no particular reason to trust Snowden — people should really stop treating him as some sort of privacy-Jesus. But there’s no particular reason to distrust him, either. His bland statements on crypto are indistinguishable from any other crypto-enthusiast statements. If he’s a Russian pawn, then so too is the bulk of the crypto community.
With all this said, using Signal doesn’t make you perfectly safe. The person you are chatting with could be a secret agent — especially in group chat. There could be cameras/microphones in the room where you are using the app. The Russians can also hack into your phone, and likewise eavesdrop on everything you do with the phone, regardless of which app you use. And they probably have hacked specific people’s phones. On the other hand, if the NSA or Russians were widely hacking phones, we’d detect that this was happening. We haven’t.
Signal is therefore not a guarantee of safety, because nothing is, and if your life depends on it, you can’t trust any simple advice like “use Signal”. But, for the bulk of us, it’s pretty damn secure, and I trust neither the Russians nor the NSA are reading my Signal or PGP messages.
At first blush, this @20committee tweet appears to be non-experts opining on things outside their expertise. But in reality, it’s just obtuse partisanship, where truth and expertise doesn’t matter. Nothing you or I say can change some people’s minds on this matter, no matter how much our expertise gives weight to our words. This post is instead for bystanders, who don’t know enough to judge whether these crazy statements have merit.
Bonus:
So let’s talk about “every crypto expert“. It’s, of course, impossible to speak for every crypto expert. It’s like citing the consensus among climate scientists that mankind is warming the globe while ignoring the widespread disagreement over how much warming there is.
The same is true here. You’ll get a widespread different set of responses from experts about the above tweet. Some, for example, will stress my point at the bottom that hacking the endpoint (the phone) breaks all the apps, and thus justify the above tweet from that point of view. Others will point out that all software has bugs, and it’s quite possible that Signal has some unknown bug that the Russians are exploiting.
So I’m not attempting to speak for everything experts might say here in the general case, or for the long lectures they could give. I am, though, pointing out the basics that virtually everyone agrees on: the consensus around open-source, properly implemented crypto.
Our identities are what define us as human beings. Philosophical discussions aside, it also applies to our day-to-day lives. For instance, I need my work badge to get access to my office building or my passport to travel overseas. My identity in this case is attached to my work badge or passport. As part of the system that checks my access, these documents or objects help define whether I have access to get into the office building or travel internationally.
This exact same concept can also be applied to cloud applications and APIs. To provide secure access to your application users, you define who can access the application resources and what kind of access can be granted. Access is based on identity controls that can confirm authentication (AuthN) and authorization (AuthZ), which are different concepts. According to Wikipedia:
“The process of authorization is distinct from that of authentication. Whereas authentication is the process of verifying that “you are who you say you are,” authorization is the process of verifying that “you are permitted to do what you are trying to do.” This does not mean authorization presupposes authentication; an anonymous agent could be authorized to a limited action set.”
Amazon Cognito allows building, securing, and scaling a solution to handle user management and authentication, and to sync across platforms and devices. In this post, I discuss the different ways that you can use Amazon Cognito to authenticate API calls to Amazon API Gateway and secure access to your own API resources.
Amazon Cognito Concepts
It’s important to understand that Amazon Cognito provides three different services: Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon Cognito Sync.
Today, I discuss the use of the first two. One service doesn’t need the other to work; however, they can be configured to work together.
Amazon Cognito Federated Identities
To use Amazon Cognito Federated Identities in your application, create an identity pool. An identity pool is a store of user data specific to your account. It can be configured to require an identity provider (IdP) for user authentication, after you enter details such as app IDs or keys related to that specific provider.
After the user is validated, the provider sends an identity token to Amazon Cognito Federated Identities. In turn, Amazon Cognito Federated Identities contacts the AWS Security Token Service (AWS STS) to retrieve temporary AWS credentials based on a configured, authenticated IAM role linked to the identity pool. The role has appropriate IAM policies attached to it and uses these policies to provide access to other AWS services.
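As a rough illustration of that exchange, here is a minimal boto3 sketch; the identity pool ID, user pool provider name, and ID token are placeholders, and the IAM role attached to the pool determines what the temporary credentials can actually do.

# Minimal boto3 sketch of the exchange described above; pool ID, provider name,
# and ID token are placeholders. The IAM role linked to the identity pool
# determines what these temporary credentials may actually do.
import boto3

id_token = "<ID token returned by your identity provider after sign-in>"

identity = boto3.client("cognito-identity", region_name="us-east-1")

logins = {
    # Provider name for an Amazon Cognito user pool acting as the IdP
    "cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE": id_token,
}

identity_id = identity.get_id(
    IdentityPoolId="us-east-1:00000000-0000-0000-0000-000000000000",
    Logins=logins,
)["IdentityId"]

credentials = identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins=logins,
)["Credentials"]

# The temporary keys can now sign calls to other AWS services, e.g. Amazon S3
s3 = boto3.client(
    "s3",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretKey"],
    aws_session_token=credentials["SessionToken"],
)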
Amazon Cognito Federated Identities currently supports the IdPs listed in the following graphic.
In another symptom of climate change, Chile’s largest squid producer “plans to diversify its offering in the future, selling sea urchin, cod and octopus, to compensate for the volatility of giant squid catches….”
As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.
Last year we launched new AWS Regions in Canada, India, Korea, the UK (London), and the United States (Ohio), and announced that new regions are coming to France (Paris) and China (Ningxia).
Today, I am happy to be able to tell you that we are planning to open up an AWS Region in Stockholm, Sweden in 2018. This region will give AWS partners and customers in Denmark, Finland, Iceland, Norway, and Sweden low-latency connectivity and the ability to run their workloads and store their data close to home.
The Nordics are well known for their vibrant startup community and highly innovative business climate. With successful global enterprises like ASSA ABLOY, IKEA, and Scania along with fast-growing startups like Bambora, Supercell, Tink, and Trustpilot, it comes as no surprise that Forbes ranks Sweden as the best country for business, with all the other Nordic countries in the top 10. Even better, the European Commission ranks Sweden as the most innovative country in the EU.
This will be the fifth AWS Region in Europe joining four other Regions there — EU (Ireland), EU (London), EU (Frankfurt) and an additional Region in France expected to launch in the coming months. Together, these Regions will provide our customers with a total of 13 Availability Zones (AZs) and allow them to architect highly fault tolerant applications while storing their data in the EU.
Today, our infrastructure comprises 42 Availability Zones across 16 geographic regions worldwide, with another three AWS Regions (and eight Availability Zones) in France, China and Sweden coming online throughout 2017 and 2018, (see the AWS Global Infrastructure page for more info).
We are looking forward to serving new and existing Nordic customers and working with partners across Europe. Of course, the new region will also be open to existing AWS customers who would like to process and store data in Sweden. Public sector organizations (government agencies, educational institutions, and nonprofits) in Sweden will be able to use this region to store sensitive data in-country (the AWS in the Public Sector page has plenty of success stories drawn from our worldwide customer base).
If you are a customer or a partner and have specific questions about this Region, you can contact our Nordic team.
Help Wanted As part of our launch, we are hiring individual contributors and managers for IT support, electrical, logistics, and physical security positions. If you are interested in learning more, please contact [email protected].
Think about all of the websites you visit every day. Now imagine if the likes of Time Warner, AT&T, and Verizon collected all of your browsing history and sold it on to the highest bidder. That’s what will probably happen if Congress has its way.
This week, lawmakers voted to allow Internet service providers to violate your privacy for their own profit. Not only have they voted to repeal a rule that protects your privacy, they are also trying to make it illegal for the Federal Communications Commission to enact other rules to protect your privacy online.
That this is not provoking greater outcry illustrates how much we’ve ceded any willingness to shape our technological future to for-profit companies and are allowing them to do it for us.
There are a lot of reasons to be worried about this. Because your Internet service provider controls your connection to the Internet, it is in a position to see everything you do on the Internet. Unlike a search engine or social networking platform or news site, you can’t easily switch to a competitor. And there’s not a lot of competition in the market, either. If you have a choice between two high-speed providers in the US, consider yourself lucky.
What can telecom companies do with this newly granted power to spy on everything you’re doing? Of course they can sell your data to marketers — and the inevitable criminals and foreign governments who also line up to buy it. But they can do more creepy things as well.
They can snoop through your traffic and insert their own ads. They can deploy systems that remove encryption so they can better eavesdrop. They can redirect your searches to other sites. They can install surveillance software on your computers and phones. None of these are hypothetical.
They’re all things Internet service providers have done before, and they are some of the reasons the FCC tried to protect your privacy in the first place. And now they’ll be able to do all of these things in secret, without your knowledge or consent. And, of course, governments worldwide will have access to these powers. And all of that data will be at risk of hacking, by criminals and other governments alike.
Telecom companies have argued that other Internet players already have these creepy powers — although they didn’t use the word “creepy” — so why should they not have them as well? It’s a valid point.
Surveillance is already the business model of the Internet, and literally hundreds of companies spy on your Internet activity against your interests and for their own profit.
Your e-mail provider already knows everything you write to your family, friends, and colleagues. Google already knows our hopes, fears, and interests, because that’s what we search for.
Your cellular provider already tracks your physical location at all times: it knows where you live, where you work, when you go to sleep at night, when you wake up in the morning, and — because everyone has a smartphone — who you spend time with and who you sleep with.
And some of the things these companies do with that power is no less creepy. Facebook has run experiments in manipulating your mood by changing what you see on your news feed. Uber used its ride data to identify one-night stands. Even Sony once installed spyware on customers’ computers to try and detect if they copied music files.
Aside from spying for profit, companies can spy for other purposes. Uber has already considered using data it collects to intimidate a journalist. Imagine what an Internet service provider can do with the data it collects: against politicians, against the media, against rivals.
Of course the telecom companies want a piece of the surveillance capitalism pie. Despite dwindling revenues, increasing use of ad blockers, and increases in click fraud, violating our privacy is still a profitable business — especially if it’s done in secret.
The bigger question is: why do we allow for-profit corporations to create our technological future in ways that are optimized for their profits and anathema to our own interests?
When markets work well, different companies compete on price and features, and society collectively rewards better products by purchasing them. This mechanism fails if there is no competition, or if rival companies choose not to compete on a particular feature. It fails when customers are unable to switch to competitors. And it fails when what companies do remains secret.
Unlike service providers like Google and Facebook, telecom companies are infrastructure that requires government involvement and regulation. The practical impossibility of consumers learning the extent of surveillance by their Internet service providers, combined with the difficulty of switching them, means that the decision about whether to be spied on should be with the consumer and not a telecom giant. That this new bill reverses that is both wrong and harmful.
Today, technology is changing the fabric of our society faster than at any other time in history. We have big questions that we need to tackle: not just privacy, but questions of freedom, fairness, and liberty. Algorithms are making decisions about policing, healthcare.
Driverless vehicles are making decisions about traffic and safety. Warfare is increasingly being fought remotely and autonomously. Censorship is on the rise globally. Propaganda is being promulgated more efficiently than ever. These problems won’t go away. If anything, the Internet of things and the computerization of every aspect of our lives will make it worse.
In today’s political climate, it seems impossible that Congress would legislate these things to our benefit. Right now, regulatory agencies such as the FTC and FCC are our best hope to protect our privacy and security against rampant corporate power. That Congress has decided to reduce that power leaves us at enormous risk.
It’s too late to do anything about this bill — Trump will certainly sign it — but we need to be alert to future bills that reduce our privacy and security.
EDITED TO ADD: Former FCC Commissioner Tom Wheeler wrote a good op-ed on the subject. And here’s an essay laying out what this all means to the average Internet user.