Shining a HOT light on optomechanics

Post Syndicated from Paul Seidler

We all use micromechanical devices in our daily lives, although we may be less aware of them than electronic and optical technologies. Micromechanical sensors, for example, detect the motion of our smartphones and cars, allowing us to play the latest games and drive safely, and mechanical resonators serve as filters to extract the relevant cellular network signal from the broadband radio waves caught by a mobile phone’s antenna. For applications such as these, micromechanical devices are integrated with their control and readout electronics in what are known as micro-electro-mechanical systems (MEMS). Decades of research and development have turned MEMS into a mature technology, with a global market approaching one hundred billion US dollars.

In recent years, a community of researchers from various universities and institutes across Europe and the United States set out to explore the physics of micro- and nano-mechanical devices coupled to light. The initial focus of these investigations was on demonstrating and exploiting uniquely quantum effects in the interaction of light and mechanical motion, such as quantum superposition, where a mechanical oscillator occupies two places simultaneously. The scope of this work quickly broadened as it became clear that these so-called optomechanical devices would open the door to a broad range of new applications.

Hybrid Optomechanical Technologies (HOT) is a research and innovation action funded by the European Commission’s FET Proactive program that supports future and emerging technologies at an early stage. HOT is laying the foundation for a new generation of devices that bring together several nanoscale platforms in a single hybrid system. It unites researchers from thirteen leading academic groups and four major industrial companies across Europe working to bring technologies to market that exploit the combination of light and motion.

One key set of advances made in the HOT consortium involves a family of non-reciprocal optomechanical devices, including optomechanical circulators. Imagine a device that acts like a roundabout for light or microwaves, where a signal input at one port emerges from a second port, a signal input at that second port emerges from a third one, and so on. Such a device is critical to signal-processing chains in radiofrequency or optical systems, as it allows efficient distribution of information among sources and receivers and protects fragile light sources from unwanted back-reflections. It has, however, proven very tricky to implement a circulator at small scales without involving strong magnetic fields to facilitate the required unidirectional flow of signals.

Introducing a mechanical component makes it possible to overcome this limitation. Motion induced by optical forces causes light to flow in one direction through the roundabout. The resulting devices are more compact, do not require strong permanent magnets, and are therefore more amenable to large-scale device integration.
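The routing rule of such a circulator can be sketched in a few lines of code. The port numbering and the `Circulator` class below are purely illustrative, not taken from the HOT devices themselves; they only capture the cyclic, one-way "roundabout" behavior described above.

```python
# Minimal sketch of the port routing of an ideal circulator.
# An ideal N-port circulator sends a signal entering port n out of
# port n+1, cyclically, and never in the reverse direction.

class Circulator:
    """Ideal circulator: a signal entering port n exits port n + 1 (cyclically)."""

    def __init__(self, num_ports=3):
        self.num_ports = num_ports

    def output_port(self, input_port):
        # Signals flow one way around the "roundabout":
        # 1 -> 2, 2 -> 3, ..., N -> 1.
        return input_port % self.num_ports + 1

c = Circulator()
print(c.output_port(1))  # 2: source -> device under test
print(c.output_port(2))  # 3: a back-reflection is routed away from the source
print(c.output_port(3))  # 1
```

This one-way routing is exactly what protects a fragile source: light reflected back into port 2 exits at port 3 instead of returning to the source on port 1.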

HOT researchers have also created mechanical systems that are simultaneously coupled to an electric and an optical resonator. These quintessentially hybrid devices interconvert electronic and optical signals via a mechanical intermediary, and they do so with very low added noise, high quantum efficiency, and a compact footprint. This makes them interesting for applications that benefit from the advantages of analog signal transmission over optical fibers instead of copper cables, such as those requiring high bandwidth, low loss, low crosstalk, and immunity to harsh environmental conditions.

An example of such a device is a receiver for a magnetic resonance imaging (MRI) scanner, as used in hospitals for three-dimensional imaging inside the human body. In MRI, tiny electronic signals are collected from several sensors on a patient inside the scanner. The signals need to be extracted from the scanner in the presence of large magnetic fields of several tesla, with the lowest possible distortion, to form high-resolution images. Conversion to the optical domain provides a means of protecting the signal. A prototype of an MRI sensor that uses optical readout has been developed by HOT researchers.

Another application of simultaneous optical and electronic control over mechanical resonators is the realization of very stable oscillators. These can function as on-chip clocks and microwave sources with ultrahigh purity. HOT researchers filed a patent application that shows how to stabilize nanoscale mechanical resonators that naturally oscillate at gigahertz frequencies driven by optical and electric fields. Combining all components on a single chip makes such devices extremely compact.

A somewhat more exotic application of hybrid transducers, but one with potentially far-reaching implications, is the interconnection of quantum computers. Quantum computers hold the promise of tackling computational problems that our current classical computers will never be able to solve. The leading contender as the platform for future quantum computers encodes information in microwave photons confined in superconducting circuits to form qubits. Unlike the bits used in conventional computers that take on values of either 0 or 1, qubits can exist in states representing both 0 and 1 simultaneously. The qubits however are bound to the ultracold environment of a dilution refrigerator to prevent thermal noise from destroying their fragile quantum states. Transferring quantum information to and from computing nodes, even within a quantum data center, will require conversion of the stationary superconducting qubits to so-called flying qubits that can be transmitted between separate locations. Optical photons represent a particularly attractive option for flying qubits, as they are robust at room temperature and thus provide one of the few practical means of transmitting quantum states over distances greater than a few meters. In fact, the transfer of quantum information encoded in optical photons is now routinely achieved over distances of hundreds of kilometers.

A key prerequisite for quantum networking is therefore quantum-coherent bidirectional conversion between microwave and optical frequencies. To date, no experimental demonstration exists of efficient transduction at the level of individual quantum states. However, many research groups around the world are diligently pursuing various possible solutions. The approaches that have come the closest so far utilize a mechanical system as an intermediary, and this is where the technologies pursued by the HOT consortium come into play.

HOT researchers have created compact chip-scale devices on commercially available silicon wafers that are fully compatible with both silicon photonics and superconducting qubit technology. The unique optomechanical designs developed by the HOT consortium exploit strong optical field confinement, producing large optomechanical coupling. As a result, electrical signals at the gigahertz frequencies typical of superconducting qubits can be coherently converted to optical frequencies commonly used for telecommunication. Such integrated photonic devices employing optomechanical coupling are often plagued by the deleterious effects of heating due to absorption of high-intensity light. The thermal problems can be circumvented by optimizing the device design and using alternative dielectric materials, and internal efficiencies exceeding unity have been achieved for ultra-low optical pump powers.

With the capabilities provided by such transducers, the power of quantum information processing could be brought to a whole new class of tasks, such as secure data sharing, in addition to creating networks of quantum devices.

As hybrid optomechanical systems enter the quantum regime, new challenges emerge. One particularly important consideration is that the very act of measuring the state of a system must be rethought. Contrary to our everyday experience, quantum mechanics requires that any measurement exerts some inevitable backaction onto the system being measured. This often has adverse effects; the response of an optomechanical sensor to a signal of interest, for example, can be washed out by the backaction caused by reading out the sensor. Luckily, these effects are well understood today, and can be corrected for using advanced quantum measurement and control techniques.

HOT researchers have pioneered the application of such techniques to mechanical sensors. They have shown how quantum state estimation and feedback can help overcome the measurement challenges. The classical counterparts of these approaches are widely used in many areas of engineering and are familiar to consumers in such products as noise-cancelling headphones. 

In the setting of optomechanics, they have been used to measure and control the quantum state of motion of a mechanical sensor. For example, HOT researchers have managed to limit the random thermal fluctuations of a vibrating drum to the minimal level allowed by quantum mechanics. This provides an excellent starting point to detect even the smallest forces exerted by other quantum systems like a single electron or photon.

With its focus on real-world technologies, the HOT consortium also considers such practical matters as device packaging and large-scale fabrication. Optomechanical devices require electronic and optical connectivity in a package that also keeps the mechanical element under vacuum. Whereas such demands have been met separately before, consortium member and industry giant STMicroelectronics is addressing their combination in a single device package as well as the potential for mass production.

This project is financed by the European Commission through its Horizon 2020 research and innovation programme under grant agreement no. 732894 (FET-Proactive HOT).

For more information about Hybrid Optomechanical Technologies, visit our website, like our Facebook page, and follow our Twitter stream. You can watch our videos on YouTube, including an explanation of the optical circulator built by the network and how to silence a quantum drum.

Bright X-Rays, AI, and Robotic Labs—A Roadmap for Better Batteries

Post Syndicated from Steven Cherry

Steven Cherry Hi, this is Steven Cherry for Radio Spectrum.

Batteries have come a long way. What used to power flashlights and toys, Timex watches and Sony Walkmans, now powers everything from phones and laptops to cars and planes.

Batteries all work the same way: Chemical energy is converted to electrical energy by creating a flow of electrons from one material to another; that flow generates an electrical current.

Yet batteries are also wildly different, both because the light bulb in a flashlight and the engine in a Tesla have different needs, and because battery technology keeps improving as researchers fiddle with every part of the system: the two chemistries that make up the anode and the cathode, and the electrolyte and how the ions pass through it from one to the other.

A Chinese proverb says, “Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.” The Christian Bible says, “follow me and I will make you fishers of men.”

In other words, a more engineering-oriented proverb would say, “let’s create a lab and develop techniques for measuring the efficacy of different fishing rods, which will help us develop different rods for different bodies of water and different species of fish.”

The Argonne National Laboratory is one such lab. There, under the leadership of Venkat Srinivasan, director of its Collaborative Center for Energy Storage Science, a team of scientists has developed a quiver of techniques for precisely measuring the velocity and behavior of ions and comparing it to mathematical models of battery designs.

Venkat Srinivasan [Ven-kat Sri-ni-va-san] is also deputy director of Argonne’s Joint Center for Energy Storage Research, a national program that looks beyond the current generation of lithium–ion batteries. He was previously a staff scientist at Lawrence Berkeley National Laboratory, wrote a popular blog, “This Week in Batteries,” and is my guest today via Teams.

Venkat, welcome to the podcast.

Venkat Srinivasan Thank you so much. I appreciate the time. I always love talking about batteries, so it’d be great to have this conversation.

Steven Cherry I think I gave about as simplistic a description of batteries as one could give. Maybe we could start with: What are the main battery types today, and why is one better than another for a given application?

Venkat Srinivasan So, Steve, there are two kinds of batteries that I think all of us use in our daily lives. One of them is a primary battery. The ones that you don’t recharge. So a common one is something that you might be putting in your children’s toys or something like that.

The second, which I think is the one that is sort of powering everything that we think of, things like electric cars and grid storage, is rechargeable batteries. These are the ones where we have to go back and charge them again. So let’s talk a little bit more about rechargeable batteries; there are a number of them sitting somewhere in the world. You have lead–acid batteries sitting in your car today. They’ve been there for the last 30, 40 years, where they’re used to start the car and for lighting the car up when the engine is not on. This is something that will continue to be in our cars for quite some time.

You’re also seeing lithium–ion batteries that are now powering the car itself. Instead of having an internal combustion engine and gasoline, you’re seeing more pure electric vehicles coming out that have lithium–ion batteries. And then the third class, which we sort of don’t see, but we have some in different places, are nickel–cadmium and nickel–metal-hydride batteries. These are kind of going away slowly. But the Toyota Prius is a great example of a hybrid with a nickel–metal-hydride battery. Many people still drive Priuses—I have one—that still have nickel–metal-hydride batteries in them. These are some of the classes of materials that are more common. But there are others, like flow batteries, that people probably haven’t really thought about and haven’t seen, which are being researched quite a bit; there are companies that are trying to install flow batteries for grid storage, which are also rechargeable batteries of a different type.

The most prevalent of these is lithium–ion; that’s the chemistry that has completely changed electric vehicle transportation. It’s changed the way we speak on our phones. The iPhone would not be possible if not for the lithium–ion battery. It’s the battery that has pretty much revolutionized all of transportation. And it’s the reason why the Nobel Prize two years ago went to the developers of the lithium–ion battery, for the discovery and ultimately the commercialization of the technology—it’s because it had such a wide impact.

Steven Cherry I gather that remarkably, we’ve designed all these different batteries and can power a cell phone for a full day and power a car from New York to Boston without fully understanding the chemistry involved. I’m going to offer a comparison and I’d like you to say whether it’s accurate or not.

We developed vaccines for smallpox beginning in 1798; we ended smallpox as a threat to humanity—all without understanding the actual mechanisms at the genetic level or even the cellular level by which the vaccine confers immunity. But the coronavirus vaccines we’re now deploying were developed in record time because we were able to study the virus and how it interacts with human organs at those deeper levels. And the comparison here is that with these new techniques developed at Argonne and elsewhere, we can finally understand battery chemistry at the most fundamental level.

Venkat Srinivasan That is absolutely correct. If you go back in time and ask yourself about batteries like the lead–acid batteries and the nickel–cadmium batteries—did we invent them in some systematic fashion? Well, I guess not, right?

Certainly, once the materials were discovered, there was a lot of innovation that went into them, using what were state-of-the-art techniques at the time to make them better and better. But to a large extent, the story you just told about the smallpox vaccine is probably very similar to the kinds of things that happened with the older battery chemistries.

The world has changed now. If you look at the kinds of things we are doing today, like you said, there are a variety of techniques, both experimental and mathematical, meaning computer simulations have come to our aid, and now we’re able to gain a deeper understanding of how batteries behave and then use that to discover new materials—first, maybe, on a computer, but certainly in the lab at some point. So this is something that is happening in the battery world. The kinds of innovations you are seeing now with COVID vaccines are the kinds of things we are seeing happen in the battery world in terms of discovering the next big breakthrough.

Steven Cherry So I gather the main technology you’re using now is ultrabright X-rays, and you’re using them to measure, for the first time, the fraction of the electrical current carried by the ions, something known as the transport number. Let’s start with the X-rays.

Venkat Srinivasan We used to cycle the battery, and things would happen to it. We then had to open up the battery and see what happened on the inside. And as you can imagine, when you open up a battery, you hope that nothing changes by the time you take it to your experimental technique of choice to look at what’s happening on the inside. But oftentimes things change. So what you have inside the battery during its operation may not be the same as what you’re probing when you open up the cell. So a trend that’s been going on for some time now is to say, well, maybe we should be thinking about in situ or operando methods, meaning, inside the battery’s environment during operation, trying to find more information in the cell.

Typically, all battery people will do is send a current into the battery and then measure the potential, or vice versa. That’s a common thing that’s done. So what we are trying to do now is one more thing on top of that: Can we probe something on the inside without opening up the cell? X-rays come into play because they are extremely bright light; they can go through the battery casing, go into the cell, and you can actually start seeing things inside the battery itself during operation, meaning you can pass current, keep the battery in the environment you want it to be in, and send in the X-ray beam and see what’s happening on the inside.

So this is a trend that we’ve been slowly exploring, going back a decade. And a decade ago, we probably did not have the resolution to be able to see things at a very minute scale. So we were seeing maybe a few tens of microns of what was happening in these batteries. Maybe we were measuring things once every minute or so, but we’re slowly getting better and better: we’re making the resolution tighter, meaning we can see smaller features, and we are trying to get the time resolution such that we can see things faster and faster. So that trend is something that is helping us, and will continue to help us, make batteries better.

Steven Cherry So if I could push my comparison a little further: we developed the COVID vaccines in record time and with stunning efficiency. I mean, 95 percent effective right out of the gate. Will this new ability to look inside the battery while it’s in operation create new generations of better batteries in record time?

Venkat Srinivasan That would be the hope. And I do want to bring in two aspects that I think work complementarily with each other. One is the experimental techniques—X-rays and related methods, though we should not forget that there are non-X-ray techniques also that give us information that can be crucially important. But along with that, there has been this revolution in computing that has really come to the forefront in the last five to 10 years. What this computing revolution means is that, basically, because computers are getting more and more powerful and computing resources are getting cheaper, we are now able to calculate all sorts of things on computers. For example, we can calculate how much lithium a material can hold—without actually having to go into the lab. And we can do this in a high-throughput fashion: screen a variety of materials and start to see which of these looks the most promising. Similarly, we can do the same thing to ask: Can we find ion conductors for, say, solid-state battery materials using the same techniques?

Now, once you have these kinds of materials in play and you do them very, very fast using computers, you can start to think about how do you combine them with these X-ray techniques. So you could imagine that you’re finding a material on the computer. You’re trying to synthesize them and during the synthesis you try to watch and see, are you making the material you were predicting or did something happen during synthesis where you were not able to make the particular material?

And using this complementary way of looking at things, I think in the next five to 10 years you’re going to see this amazing acceleration of material discovery between the computing and the X-ray sources and other experimental methods. You’re going to see this incredible acceleration in terms of finding new things. You know, the big trick in materials—and this is certainly true for battery materials—is that if you can find, say, a thousand materials, maybe one of them looks interesting. So the job here is to cycle through those thousand as quickly as possible to find that one nugget that can be exciting. And so what we’re seeing now with computing and with these X-rays is the ability to cycle through many materials very quickly, so that we can start to pin down which one among those thousand looks the most promising and spend a lot more resources and time on it.
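The thousand-to-one screening loop Srinivasan describes can be sketched in a few lines. Everything here is invented for illustration: the candidate list and the `predicted_capacity` figure of merit stand in for properties a real pipeline would compute from first-principles simulations before shortlisting materials for the lab.

```python
# Hypothetical sketch of high-throughput materials screening:
# rank many candidates by a computed property, keep the few worth lab time.

def predicted_capacity(material):
    # Stand-in for a first-principles calculation of how much lithium
    # a material can hold; here it is just a lookup of invented numbers.
    return material["capacity_mAh_g"]

candidates = [
    {"name": "material_A", "capacity_mAh_g": 150},
    {"name": "material_B", "capacity_mAh_g": 372},
    {"name": "material_C", "capacity_mAh_g": 90},
]

# Screen: sort by the computed figure of merit, best first,
# and shortlist only the top candidate for synthesis.
ranked = sorted(candidates, key=predicted_capacity, reverse=True)
shortlist = [m["name"] for m in ranked[:1]]
print(shortlist)  # ['material_B']
```

The point of the design is the funnel: the cheap computed score filters thousands of candidates so that only the rare promising one reaches expensive synthesis and operando X-ray characterization.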

Steven Cherry We’ve been relying on lithium–ion for quite a while. It was first developed in 1985 and first used commercially by Sony in 1991. These batteries are somewhat infamous for occasionally exploding in phones and laptops and living rooms and on airplanes and even in the airplanes themselves in the case of the Boeing 787. Do you think this research will lead to safer batteries?

Venkat Srinivasan Absolutely. The first thing I should clarify is that the lithium–ion from the 1990s is not the same lithium–ion we use today. There have been many generations of materials that have changed over time; they’ve gotten better; the energy density has actually gone up by a factor of three in those twenty-five years, and there’s a chance that it’s going to go up by another factor of two in the next decade or so. The reality is that when we use the word lithium–ion, we’re actually talking about a variety of material classes that go into the anodes, the cathodes, and the electrolytes that make up lithium–ion batteries. So the first thing to notice is that these materials are changing continuously. What the new techniques are bringing is a way for us to push the boundaries of lithium–ion, meaning there is still a lot of room left for lithium–ion to get better, and these new techniques are allowing us to invent the next generation of cathode materials, anode materials, and electrolytes that could be used in the system to continue to push on things like energy density, fast-charge capability, and cycle life. These are the kinds of big problems we’re worried about. So these techniques are certainly going to allow us to get there.

There is another important thing to think about for lithium–ion, which is recyclability. I think it’s been pretty clear that as the market for batteries starts to go up, they’re going to have a lot of batteries that are going to reach end-of-life at some stage and we do not want to throw them away. We want to take out the precious metals in them, the ones that we think are going to be useful for the next generation of batteries. And we want to make sure we dispose of them in a very sort of a safe and efficient manner for the environment. So I think that is also an area of R&D that’s going to be enabled by these kinds of techniques.

The last thing I’d say is that we’re thinking hard about systems that go beyond lithium–ion, things like solid-state batteries, things like magnesium-based batteries … And for those kinds of chemistries, we really feel like taking these modern techniques and putting them into play is going to accelerate the development time frame. So you mentioned 1985 and 1991; lithium–ion battery research started in the 1950s and 60s, and it took many decades before we could get to a stage where Sony could actually go and commercialize it. And we think we can accelerate the timeline pretty significantly for things like solid-state batteries or magnesium-based batteries because of all the modern techniques.

Steven Cherry Charging time is also a big area for potential improvement, especially in electric cars, which still only have a driving range that maybe gets to 400 kilometers, in practical terms. Will we be getting to the point where we can recharge in the time it takes to get a White Chocolate Gingerbread Frappuccino at Starbucks?

Venkat Srinivasan That’s the dream. So Argonne actually leads a project for the Department of Energy, working with multiple other national labs, on enabling 10-minute charging of batteries. I will say that in the last two or three years, there’s been tremendous progress in this area. Instead of a forty-five-minute or one-hour charge, which used to be considered a fast charge, we now feel like there is a possibility of getting under 30 minutes of charging. These approaches still have to be proven out. They have to be implemented at large scale. But more and more, we are learning from these same techniques. There is a lot of work happening at the Advanced Photon Source looking at fast charging of batteries, trying to understand the phenomena that are stopping us from charging very fast, and these same techniques are allowing us to think about how to solve the problem.

And I’ll take a bet that in the next five years, we’ll start to look at 10-minute charging as something that is going to be possible. Three or four years ago, I would not have said that. But in the next five years, I think we are going to start saying, hey, you know, I think there are ways in which you can start to get to this kind of charging time. Certainly it’s a big challenge. It’s not just a challenge on the battery side; it’s a challenge in how we are going to get the electricity to reach the electric car. I mean, there’s going to be a problem there. There’s a lot of heat generation that happens in these systems; we’ve got to find a way to pull it out. So there are a lot of challenges that we have to solve. But I think these techniques are slowly giving us answers to why it is a problem to begin with, and allowing us to start to test various hypotheses to find ways to solve the problem.

Steven Cherry The last area where I think people are looking for dramatic improvement is weight and bulk. It’s important in our cell phones and it’s also important in electric cars.

Venkat Srinivasan Yeah, absolutely. So frankly, it’s not just electric cars. At Argonne we’re starting to think about light-duty vehicles, which are our passenger cars, but also heavy-duty vehicles. Right. I mean, what happens when you start to think about trucking across the country carrying a heavy payload? We are trying to think hard about aviation, about marine, and about rail. As you start to get to these kinds of applications, the energy density requirement goes up dramatically.

I’ll give you some numbers. If you look at today’s lithium–ion batteries at the pack level, the energy density is approximately 180 watt-hours per kilogram, give or take. Depending on the company, that could be a little bit higher or lower, but approximately 180 Wh/kg. If we look at a 737 going across the country, or a significant distance, carrying a number of passengers, the kind of energy density you would need is upwards of 800 Wh/kg. So just to give you a sense for that: we said it’s 180 for today’s lithium–ion; we’re talking about four to five times the energy density of today’s lithium–ion before we can start to think about electric aviation. So energy density, both gravimetric and volumetric, is going to be extremely important in the future. Much of the R&D that we are doing is trying to discover materials that allow us to increase energy density. The hope is that you can increase energy density, make the battery charge very fast, and get it to last very long, all simultaneously. That tends to be a big deal, because it is all about compromising among these different competing metrics—cycle life, calendar life, cost, safety, performance, all of them tend to play against each other. But the big hope is that we are able to improve the energy density without compromising on these other metrics. That’s kind of the big focus of the R&D that’s going on worldwide, but certainly at Argonne.
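The aviation arithmetic above is easy to check; this trivial sketch just uses the two numbers quoted in the interview:

```python
# Today's pack-level lithium-ion energy density vs. the rough requirement
# for 737-class electric aviation, using the figures quoted above.
pack_today_wh_per_kg = 180     # approximate pack-level lithium-ion today
aviation_target_wh_per_kg = 800  # "upwards of 800 Wh/kg"

ratio = aviation_target_wh_per_kg / pack_today_wh_per_kg
print(f"{ratio:.1f}x")  # 4.4x, i.e., the "four to five times" in the interview
```
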

Steven Cherry I gather there’s also a new business model for conducting this research, a nonprofit organization that brings corporate, government, and academic research all under one aegis. Tell us about CalCharge.

Venkat Srinivasan Yeah. If you kind of think about the battery world, and this is true for many of these hard technologies, the cleantech or greentech as people have come to call them, there is a lot of innovation that is needed, which means the lab R&D, the kinds of techniques and models that we’re talking about, is crucially important. But it’s also important for us to find a way to get them to a market, meaning you have to be able to take that lab innovation, you’ve got to be able to manufacture it, and you’ve got to get it into the hands of, say, a car company that’s going to test it, ultimately qualify it, and then integrate it into the vehicle.

So this is a long road from lab to market. And the traditional way you’d think about this is that you want to throw it across the fence, right? So, say, we at Argonne National Lab invent something and then we throw it across the fence to industry, and then you hope that industry takes it from there, runs with it, and solves the problems. That tends to be an extremely inefficient process. That’s because oftentimes where a national lab might stop is not enough for industry to run with it—there are multiple gaps that show up. When you integrate these devices with a company’s existing components, there are problems that show up; when you get up to manufacturing, when you start to get up to a larger scale, there are problems that show up; and there are problems that show up when you make a pack with it. And oftentimes the solution to these problems goes back to the material. So the fundamental principle that I and many others have started thinking about is that you do not want to keep the R&D, the manufacturing, and the market separate. You have to find a way to connect them up.

And if you connect them up very closely, then the market starts to drive the R&D, the R&D innovation starts to get the people in the manufacturing world excited, and there is this close connection among all three of these things that makes everything go faster and faster. We’ve seen this in other industries, and it certainly will be true in the battery world. So we’ve been trying very, very hard to enable these kinds of what I would call public–private partnerships: ways in which we, the public, meaning the national lab system, can start to interact with private companies and find ways to move this along. So this is a concept that I and a few others have been thinking about for quite some time. Before I moved to Argonne, I was at Lawrence Berkeley. And the Bay Area has a very rich ecosystem of battery companies, especially startup companies.

So I created an entity called CalCharge, which was a way to connect the local ecosystem in the San Francisco Bay Area to the national labs in the area: Lawrence Berkeley, SLAC, and Sandia National Labs in Livermore. So those are the three that were connected. And the idea behind this is, how do we take the national lab facilities, the people, and the amazing brains that they have and use them to start to solve some of the problems that industry is facing? And how do we take the IP that is sitting in the labs and move it to market using these startups, so that we can continuously work with each other, make sure that we don't have these valleys of death, as we've come to call them, when we move from lab to market, and try to accelerate that? I've been doing very similar things at Argonne for the last four years, thinking hard about how you do this, but on a national scale.

So we've been working closely with the Department of Energy, and with various entities both in the Chicagoland area and in the wider U.S. community, to start thinking about enabling these kinds of ecosystems, where national labs like ours and others across the country (there are 17 Department of Energy national labs, and maybe a dozen of them have expertise that can be used for the battery world) are connected up. How do we connect them up? And the local universities in different parts of the country with amazing expertise, how do you connect them up to the startups, the big companies, the manufacturers, the car companies that are coming in, and also the material companies, companies that are providing lithium from a supply chain perspective? So my dream is that we would have this big ecosystem of everybody talking to each other, finding ways to leverage each other, and ultimately making this technology something that can reach the market as quickly as possible.

Steven Cherry And right now, who is waiting on whom? Is there enough new research that it's up to the corporations to do something with it? Or are they looking for specific improvements that they need to wait for you to make?

Venkat Srinivasan All of the above. There is probably quite a bit of R&D going on that industry is not aware of, and that tends to be a big problem; there's a visibility problem when it comes to the kinds of things that are going on in the national labs and the academic world. There are also things where we are not aware of the problems that industry is facing. And I think these kinds of disconnects, where a lack of awareness keeps things from happening fast, are what we need to solve. The more connections we have, the more interactions we have, the more conversations we have with each other, the more the exposure increases. And when the exposure increases, we have a better chance of solving these kinds of problems, where a lack of information stops us from getting the kinds of innovation that we could get.

Steven Cherry And at your end, at the research end, I gather one immediate improvement you’re looking to make is the brightness of the X-rays. Is there anything else that we should look forward to?

Venkat Srinivasan Yeah, there are a couple of things that I think are very important. The first one is the brightness of the X-rays. There's an upgrade coming up for the Advanced Photon Source that's going to change the time resolution with which we can see these batteries. So, for example, when you're charging the batteries very fast, you can get data very quickly. That's going to be super important. You can also start to see features that are even smaller than the kinds of features we see today. So that's the first big thing.

The second thing, connected to that, is artificial intelligence and machine learning, which is permeating all forms of research, including battery research. We use AI and ML for all sorts of things. But one thing we've been thinking about is how to connect AI and ML to the kinds of X-ray techniques we've been using. So, for example, instead of looking all over the battery to see if there is a problem, can we use signatures of where the problems could be occurring? Then these machine learning tools can quickly go in and identify the spot where things could be going wrong, so that you can spend all your time and energy taking data at that particular spot. That way, again, we're being very efficient with the time that we have and ensuring that we're catching the problems we have to catch. So I think the next big thing is this whole artificial intelligence and machine learning effort that is going to be integral for us in the battery discovery world.

The last thing, which is an emerging trend, is what are called automated labs or self-driving labs. The idea behind this is that instead of a human being synthesizing a material starting in the morning and finishing in the evening, then characterizing it the next day, finding out what happened to it, and then going back and trying the next material, could we start to do this using robotics? This has been a trend for a while now, but where things are heading is that more and more, robots can do the things a human being could do. So you could imagine robots synthesizing electrolyte molecules, mixing them up, testing their conductivity, and seeing if the conductivity is higher than what you had before; if it's not, going back and iterating on a new molecule based on the previous results, so that you can efficiently find an electrolyte more conductive than your baseline. Robots work 24/7, which makes this a very useful way for us to think about innovating. Robots also generate a lot of data, which we now know how to handle because of all the machine learning tools we've been developing over the last three, four, five years. So all of a sudden, the synergy, the intersection between machine learning, the ability to analyze a lot of data, and robotics, is starting to come into play. And I think we're going to see that open up new ways to discover materials in a rapid fashion.

Steven Cherry Well, Venkat, if you will forgive a rather obvious pun, the future of battery technology seems bright. And I wish you and your colleagues at Argonne and CalCharge every success. Thank you for your role in this research and for being here today.

Venkat Srinivasan Thank you so much. I appreciate the time you've taken to ask me these questions.

We've been speaking with Venkat Srinivasan of Argonne National Lab about a newfound ability to study batteries at the molecular level and about improvements that might result from it.

Radio Spectrum is brought to you by IEEE Spectrum, the member magazine of the Institute of Electrical and Electronics Engineers, a professional organization dedicated to advancing technology for the benefit of humanity.

This interview was recorded January 6, 2021, using Adobe Audition and edited in Audacity. Our theme music is by Chad Crouch.

You can subscribe to Radio Spectrum on the Spectrum website, where you can also sign up for alerts, or on Spotify, Apple, Google—wherever you get your podcasts. We welcome your feedback on the web or in social media.

For Radio Spectrum, I’m Steven Cherry.

Note: Transcripts are created for the convenience of our readers and listeners. The authoritative record of IEEE Spectrum’s audio programming is the audio version.

We welcome your comments on Twitter (@RadioSpectrum1 and @IEEESpectrum) and Facebook.

See Also:

Battery of tests: Scientists figure out how to track what happens inside batteries

Concentration and velocity profiles in a polymeric lithium-ion battery electrolyte

Building a cost efficient, petabyte-scale lake house with Amazon S3 lifecycle rules and Amazon Redshift Spectrum: Part 1

Post Syndicated from Cristian Gavazzeni original

The continuous growth of data volumes combined with requirements to implement long-term retention (typically due to specific industry regulations) puts pressure on the storage costs of data warehouse solutions, even for cloud native data warehouse services such as Amazon Redshift. The introduction of the new Amazon Redshift RA3 node types helped in decoupling compute from storage growth. Integration points provided by Amazon Redshift Spectrum, Amazon Simple Storage Service (Amazon S3) storage classes, and other Amazon S3 features allow for compliance with retention policies while keeping costs under control.

An enterprise customer in Italy asked the AWS team to recommend best practices on how to implement a data journey solution for sales data; the objective of part 1 of this series is to provide step-by-step instructions and best practices on how to build an end-to-end data lifecycle management system integrated with a data lake house implemented on Amazon S3 with Amazon Redshift. In part 2, we show some additional best practices to operate the solution: implementing a sustainable monthly ageing process, using Amazon Redshift local tables to troubleshoot common issues, and using Amazon S3 access logs to analyze data access patterns.

Amazon Redshift and Redshift Spectrum

At re:Invent 2019, AWS announced new Amazon Redshift RA3 nodes. Even though this introduced new levels of cost efficiency in the cloud data warehouse, we faced customer cases where the data volume to be kept is an order of magnitude higher due to specific regulations that require historical data to be kept for up to 10–12 years or more. In addition, this historical cold data must be accessed by other services and applications external to Amazon Redshift (such as Amazon SageMaker for AI and machine learning (ML) training jobs), and occasionally it needs to be queried jointly with Amazon Redshift hot data. In these situations, Redshift Spectrum is a great fit because, among other factors, you can use it in conjunction with Amazon S3 storage classes to further improve TCO.

Redshift Spectrum allows you to query data that resides in S3 buckets using the application code and logic already in place for data warehouse tables, potentially performing joins and unions of Amazon Redshift local tables and data on Amazon S3.

Redshift Spectrum uses a fleet of compute nodes managed by AWS that increases system scalability. To use it, we need to define at least an external schema and an external table (unless an external schema and external database are already defined in the AWS Glue Data Catalog). Data definition language (DDL) statements used to define an external table include a location attribute to address the S3 buckets and prefixes containing the dataset, which could be in common file formats like ORC, Parquet, Avro, CSV, JSON, or plain text. Compressed and columnar file formats like Apache Parquet are preferred because they use less storage and offer better performance.

For a data catalog, we could use AWS Glue or an external hive metastore. For this post, we use AWS Glue.

S3 Lifecycle rules

Amazon S3 storage classes include S3 Standard, S3-IA, S3 One-Zone, S3 Intelligent-Tiering, S3 Glacier, and S3 Glacier Deep Archive. For our use case, we need to keep data accessible for queries for 5 years and with high durability, so we consider only S3 Standard and S3-IA for this time frame, and S3 Glacier only for the long term (5–12 years). Data access to S3 Glacier requires data retrieval in the range of minutes (if using expedited retrieval), and this can't be matched with the ability to query data. We can adopt Glacier for very cold data if we implement a manual process to first restore the Glacier archive to a temporary S3 bucket, and then query this data defined via an external table.

S3 Glacier Select allows you to query data directly in S3 Glacier, but it only supports uncompressed CSV files. Because the objective of this post is to propose a cost-efficient solution, we didn't consider it. If for any reason you have constraints that require storing in CSV file format (instead of compressed formats like Parquet), Glacier Select might also be a good fit.

Excluding retrieval costs, the cost for storage for S3-IA is typically around 45% cheaper than S3 Standard, and S3 Glacier is 68% cheaper than S3-IA. For updated pricing information, see Amazon S3 pricing.
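To make the arithmetic behind these ratios concrete, the following Python sketch normalizes the prices using the approximate percentages quoted above (these are illustrative ratios, not current list prices; check Amazon S3 pricing for real numbers, and note that retrieval and request costs are ignored):

```python
# Normalized per-GB-month storage costs, derived from the ratios quoted
# above: S3-IA ~45% cheaper than S3 Standard, Glacier ~68% cheaper than
# S3-IA. Illustrative only; not actual AWS prices.
STANDARD = 1.00
IA = STANDARD * (1 - 0.45)   # ~0.55
GLACIER = IA * (1 - 0.68)    # ~0.176

def monthly_cost(gb_standard, gb_ia, gb_glacier):
    """Relative monthly cost of a dataset spread across the three classes."""
    return gb_standard * STANDARD + gb_ia * IA + gb_glacier * GLACIER

# A hypothetical 60 GB dataset: all in S3 Standard vs. tiered by age
all_standard = monthly_cost(60, 0, 0)
tiered = monthly_cost(30, 15, 15)           # hot / mid / cold split
print(round(1 - tiered / all_standard, 2))  # -> 0.32 (about a third saved)
```

The exact savings depend on how the data is split across classes, but the sketch shows why pushing cold data down the tiers pays off.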

We don't use S3 Intelligent-Tiering because it bases storage transitions on the last access time, and this resets every time we need to query the data. Instead, we use S3 Lifecycle rules, which are based on creation time or on prefix or tag matching and therefore behave consistently regardless of data access patterns.

Simulated use case and retention policy

For our use case, we need to implement the data retention strategy for trip records outlined in the following table.

Corporate Rule | Dataset Start | Dataset End | Data Storage | Engine
Last 6 months in Redshift Spectrum | December 2019 | May 2020 | Amazon Redshift local tables | Amazon Redshift
Months 6–11 in Amazon S3 | June 2019 | November 2019 | S3 Standard | Redshift Spectrum
Months 12–14 in S3-IA | March 2019 | May 2019 | S3-IA | Redshift Spectrum
After month 15 | January 2019 | February 2019 | Glacier | N/A

For this post, we create a new table in a new Amazon Redshift cluster and load a public dataset. We use the New York City Taxi and Limousine Commission (TLC) Trip Record Data because it provides the required historical depth.

We use the Green Taxi Trip Records, based on monthly CSV files containing 20 columns with fields like vendor ID, pickup time, drop-off time, fare, and other information. 

Preparing the dataset and setting up the Amazon Redshift environment

As a first step, we create an AWS Identity and Access Management (IAM) role for Redshift Spectrum. This is required to allow Amazon Redshift to access Amazon S3 for querying and loading data, and also to allow access to the AWS Glue Data Catalog whenever we create, modify, or delete an external table.

  1. Create a role named BlogSpectrumRole.
  2. Edit the following two JSON files, which contain the IAM policies for the bucket and prefix used in this post, as needed, and attach the policies to the role you created. The first policy grants access to Amazon S3 (adjust the bucket name to your own):

    	{
    	    "Version": "2012-10-17",
    	    "Statement": [
    	        {
    	            "Sid": "VisualEditor0",
    	            "Effect": "Allow",
    	            "Action": "s3:*",
    	            "Resource": [
    	                "arn:aws:s3:::rs-lakehouse-blog-post",
    	                "arn:aws:s3:::rs-lakehouse-blog-post/*"
    	            ]
    	        }
    	    ]
    	}

    The second policy grants access to the AWS Glue Data Catalog (the actions shown here are broad for simplicity; restrict them for production use):

    	{
    	    "Version": "2012-10-17",
    	    "Statement": [
    	        {
    	            "Sid": "VisualEditor0",
    	            "Effect": "Allow",
    	            "Action": [
    	                "glue:*"
    	            ],
    	            "Resource": [
    	                "*"
    	            ]
    	        }
    	    ]
    	}

Now you create a single-node Amazon Redshift cluster based on a DC2.large instance type, attaching the newly created IAM role BlogSpectrumRole.

  1. On the Amazon Redshift console, choose Create cluster.
  2. Keep the default cluster identifier redshift-cluster-1.
  3. Choose the node type DC2.large.
  4. Set the configuration to single node.
  5. Keep the default database name and port.
  6. Set a primary user password.
  7. Choose Create cluster.
  8. After the cluster is configured, check the attached IAM role on the Properties tab for the cluster.
  9. Take note of the IAM role ARN, because you use it to create external tables.

Copying data into Amazon Redshift

To connect to Amazon Redshift, you can use a free client like SQL Workbench/J or use the AWS console embedded query editor with the previously created credentials.

  1. Create a table according to the dataset schema using the following DDL statement:
    	create table greentaxi(
    	vendor_id integer,
    	pickup timestamp,
    	dropoff timestamp,
    	storeandfwd char(2),
    	ratecodeid integer,
    	pulocid integer,
    	dolocid integer,
    	passenger_count integer,
    	trip_dist real,
    	fare_amount real,
    	extra real,
    	mta_tax real,
    	tip_amount real,
    	toll_amount real,
    	ehail_fee real,
    	improve_surch real,
    	total_amount real,
    	pay_type integer,
    	trip_type integer,
    	congest_surch real);

The most efficient method to load data into Amazon Redshift is using the COPY command, because it uses the distributed architecture (each slice can ingest one file at the same time).
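The parallelism rule of thumb behind COPY can be sketched as follows (the helper below is purely illustrative, not part of any AWS tooling): since each slice ingests one file at a time, splitting the input into a multiple of the slice count keeps every slice busy.

```python
# Illustrative sketch of the COPY parallelism rule of thumb: the number
# of input files should ideally be a multiple of the cluster's slice
# count so that no slice sits idle during the load.
import math

def recommended_file_count(files_needed, slices):
    """Round the desired file count up to the nearest multiple of slices."""
    return math.ceil(files_needed / slices) * slices

# A single-node DC2.large cluster has 2 slices
print(recommended_file_count(18, 2))  # -> 18 (already a multiple of 2)
print(recommended_file_count(5, 2))   # -> 6
```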

  1. Load the trip data from January 2019 to June 2020 into the Amazon Redshift table with the following command, run once per year prefix (the example shows 2020; replace the IAM role value with the one you created earlier):
    copy greentaxi from 's3://nyc-tlc/trip data/green_tripdata_2020' 
    	iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole' 
    	delimiter ',' 
    	dateformat 'auto'
    	region 'us-east-1' 
    ignoreheader as 1;

Now the greentaxi table includes all records starting from January 2019 to June 2020. You’re now ready to leverage Redshift Spectrum and S3 storage classes to save costs.

Extracting data from Amazon Redshift

You perform the next steps using the AWS Command Line Interface (AWS CLI). For download and installation instructions, see Installing, updating, and uninstalling the AWS CLI version 2.

Use the aws configure command to set the access key and secret access key of your IAM user and the AWS Region of your Amazon Redshift cluster (the same Region as your S3 buckets).

In this section, we evaluate two different use cases:

  • New customer – As a new customer, you don’t have much Amazon Redshift old data, and want to extract only the oldest monthly data and apply a lifecycle policy based on creation date. Storage tiering only affects future data and is fully automated.
  • Old customer – In this use case, you have multiple years of data growth and need to move existing Amazon Redshift data to different storage classes. In addition, you want a fully automated solution, but with the ability to override it and decide what and when to transition data between S3 storage classes. This requirement is due to many factors, like the GDPR rule “right to be forgotten.” You may need to edit historical data to remove specific customer records, which changes the file creation date. For this reason, you need S3 Lifecycle rules based on tagging instead of creation date.

New customer use case

The UNLOAD command uses the result of an embedded SQL query to extract data from Amazon Redshift to Amazon S3, producing different file formats such as CSV, text, and Parquet. To extract data from January 2019, complete the following steps:

  1. Create a destination bucket like the following:
    aws s3 mb s3://rs-lakehouse-blog-post

  2. Create a folder named archive in the destination bucket rs-lakehouse-blog-post:
    aws s3api put-object --bucket rs-lakehouse-blog-post --key archive/

  3. Use the following SQL code to implement the UNLOAD statement. Date values in the SELECT require doubled single quotes, because the SELECT statement is embedded in the UNLOAD command:
    unload ('select * from greentaxi where pickup between ''2019-01-01 00:00:00'' and ''2019-01-31 23:59:59''')
    to 's3://rs-lakehouse-blog-post/archive/5to12years_taxi'
    iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole'; 

    1. You can perform a check with the AWS CLI:
      aws s3 ls s3://rs-lakehouse-blog-post/archive/
      2020-10-12 14:49:51          0 
      2020-11-04 09:51:00   34792620 5to12years_taxi0000_part_00
      2020-11-04 09:51:00   34792738 5to12years_taxi0001_part_00

  • The output shows that the UNLOAD statement generated two files of 33 MB each. By default, UNLOAD generates at least one file for each slice in the Amazon Redshift cluster. My cluster is a single node with DC2 type instances with two slices. The default file format is text, which is not storage optimized.

    To simplify the process, you create a single file for each month so that you can later apply lifecycle rules to each file. In real-world scenarios, extracting data with a single file isn’t the best practice in terms of performance optimization. This is just to simplify the process for the purpose of this post.

    1. Create your files with the following code:
      unload ('select * from greentaxi where pickup between ''2019-01-01 00:00:00'' and ''2019-01-31 23:59:59''')
      to 's3://rs-lakehouse-blog-post/archive/green_tripdata_2019-01'
      iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole' parquet parallel off;

    The output of the UNLOAD command is a single file (per month) in Parquet format, which takes 80% less space than the previous unload. This is important for saving costs related to both Amazon S3 and Glacier, and also for costs associated with Redshift Spectrum queries, which are billed by the amount of data scanned.

    1. You can check how efficient Parquet is compared to text format:
      aws s3 ls s3://rs-lakehouse-blog-post/archive/

      2020-08-12 17:17:42   14523090 green_tripdata_2019-01000.parquet 

    1. Clean up the previous text files:
      aws s3 rm s3://rs-lakehouse-blog-post/ --recursive \
      --exclude "*" --include "archive/5to12*"
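Using the file sizes listed above, a quick Python sketch (illustrative only) shows how close the measured reduction is to the 80% figure, which translates directly into Redshift Spectrum scan savings because Spectrum is billed by bytes scanned:

```python
# Comparing the text and Parquet unload sizes shown earlier in this post.
# Because Redshift Spectrum is billed by data scanned, the size reduction
# is also roughly the scan-cost reduction for a full-file query.
text_bytes = 34_792_620 + 34_792_738   # the two text UNLOAD output files
parquet_bytes = 14_523_090             # the single Parquet file
reduction = 1 - parquet_bytes / text_bytes
print(round(reduction, 2))             # -> 0.79, close to the 80% quoted
```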

      The next step is creating a lifecycle rule based on creation date to automate the migration to S3-IA after 12 months and to Glacier after 15 months. The proposed policy name is 12IA-15Glacier and it’s filtered on the prefix archive/.

    1. Create a JSON file containing the lifecycle policy definition, named lifecycle.json:
      	{
      	    "Rules": [
      	        {
      	            "ID": "12IA-15Glacier",
      	            "Filter": {
      	                "Prefix": "archive"
      	            },
      	            "Status": "Enabled",
      	            "Transitions": [
      	                {
      	                    "Days": 365,
      	                    "StorageClass": "STANDARD_IA"
      	                },
      	                {
      	                    "Days": 548,
      	                    "StorageClass": "GLACIER"
      	                }
      	            ]
      	        }
      	    ]
      	}

    1. Run the following command to send the JSON file to Amazon S3:
      aws s3api put-bucket-lifecycle-configuration \ 
      --bucket rs-lakehouse-blog-post \ 
      --lifecycle-configuration file://lifecycle.json

    This lifecycle policy migrates all keys in the archive prefix from Amazon S3 to S3-IA after 12 months and from S3-IA to Glacier after 15 months. For example, if today were 2020-09-12, and you unload the 2020-03 data to Amazon S3, by 2021-09-12, this 2020-03 data is automatically migrated to S3-IA.
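The timeline in this example can be sketched in a few lines of Python (illustrative only, using the 365-day and 548-day thresholds from the policy):

```python
# Transition timeline implied by the 12IA-15Glacier rule: an object moves
# to S3-IA 365 days after creation and to Glacier 548 days after creation.
from datetime import date, timedelta

def transition_dates(created):
    """Return the (S3-IA, Glacier) transition dates for an object."""
    return created + timedelta(days=365), created + timedelta(days=548)

to_ia, to_glacier = transition_dates(date(2020, 9, 12))
print(to_ia)       # 2021-09-12, matching the example above
print(to_glacier)  # 2022-03-14
```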

    If using this basic use case, you can skip the partition steps in the section Defining the external schema and external tables.

    Old customer use case

    In this use case, you extract data of different ages in the same time frame. You extract all data from January 2019 to February 2019 and, because we assume that you aren’t using this data, archive it to S3 Glacier.

    Data from March 2019 to May 2019 is migrated as an external table on S3-IA, and data from June 2019 to November 2019 is migrated as an external table to S3 Standard. With this approach, you comply with customer long-term retention policies and regulations, and reduce TCO.

    You implement the retention strategy described in the Simulated use case and retention policy section.

    1. Create a destination bucket (if you also walked through the first use case, use a different bucket):
      aws s3 mb s3://rs-lakehouse-blog-post

    2. Create three folders named extract_longterm, extract_midterm, and extract_shortterm in the destination bucket rs-lakehouse-blog-post. The following code is the syntax for creating the extract_longterm folder:
      aws s3api put-object --bucket rs-lakehouse-blog-post --key extract_longterm/

    3. Extract the data:
      unload ('select * from greentaxi where pickup between ''2019-01-01 00:00:00'' and ''2019-01-31 23:59:59''')
      	to 's3://rs-lakehouse-blog-post/extract_longterm/green_tripdata_2019-01'
      iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole' parquet parallel off;

    4. Repeat these steps for the February 2019 time frame.

    Managing data ageing with Amazon S3 storage classes and lifecycle policies

    In this section, you manage your data with storage classes and lifecycle policies.

    1. Migrate your keys in Parquet format to Amazon Glacier:
      aws s3api copy-object \
      --copy-source rs-lakehouse-blog-post/extract_longterm/green_tripdata_2019-01000.parquet \
      --storage-class GLACIER \
      --bucket rs-lakehouse-blog-post \
      --key extract_longterm/green_tripdata_2019-01000.parquet
      aws s3api copy-object \
      --copy-source rs-lakehouse-blog-post/extract_longterm/green_tripdata_2019-02000.parquet \
      --storage-class GLACIER \
      --bucket rs-lakehouse-blog-post \
      --key extract_longterm/green_tripdata_2019-02000.parquet

    2. Extract the data from March 2019 to May 2019 (months 12–15) and migrate them to S3-IA. The following code is for March:
      unload ('select * from greentaxi where pickup between ''2019-03-01 00:00:00'' and ''2019-03-31 23:59:59''')
      to 's3://rs-lakehouse-blog-post/extract_midterm/03/green_tripdata_2019-03'
      iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole' parquet parallel off;

    3. Repeat the previous step for April and May.
    4. Migrate all three months to S3-IA using the same process as before. The following code is for March:
      aws s3api copy-object \
      --copy-source rs-lakehouse-blog-post/extract_midterm/03/green_tripdata_2019-03000.parquet \
      --storage-class STANDARD_IA \
      --bucket rs-lakehouse-blog-post \
      --key extract_midterm/03/green_tripdata_2019-03000.parquet 

    5. Do the same for the other two months.
    6. Check the newly applied storage class with the following AWS CLI command:
      aws s3api head-object \
      --bucket rs-lakehouse-blog-post \
      --key extract_midterm/03/green_tripdata_2019-03000.parquet
      {
          "AcceptRanges": "bytes",
          "LastModified": "2020-10-12T13:47:32+00:00",
          "ContentLength": 14087514,
          "ETag": "\"15bf39e6b3f32b10ef589d75e0988ce6\"",
          "ContentType": "application/x-www-form-urlencoded; charset=utf-8",
          "Metadata": {},
          "StorageClass": "STANDARD_IA"
      }

    In the next step, you tag every monthly file with a key value named ageing set to the number of months elapsed from the origin date.

    1. Set March to 14, April to 13, and May to 12:
      aws s3api put-object-tagging \
      --bucket rs-lakehouse-blog-post \
      --key extract_midterm/03/green_tripdata_2019-03000.parquet \
      --tagging '{"TagSet": [{ "Key": "ageing", "Value": "14"} ]}'
      aws s3api put-object-tagging \
      --bucket rs-lakehouse-blog-post \
      --key extract_midterm/04/green_tripdata_2019-04000.parquet \
      --tagging '{"TagSet": [{ "Key": "ageing", "Value": "13"} ]}'
      aws s3api put-object-tagging \
      --bucket rs-lakehouse-blog-post \
      --key extract_midterm/05/green_tripdata_2019-05000.parquet \
      --tagging '{"TagSet": [{ "Key": "ageing", "Value": "12"} ]}'

    In this set of three objects, the oldest file has the tag ageing set to value 14, and the newest is set to 12. In the second post in this series, you discover how to manage the ageing tag as it increases month by month.
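The ageing values assigned above can be derived programmatically; the following sketch (the helper name is illustrative, not from this post's tooling) counts whole months between a file's data month and the newest month in the simulated use case (May 2020):

```python
# Illustrative derivation of the ageing tag value: the number of whole
# months between a file's data month and a reference month.
def ageing_months(data_year, data_month, ref_year, ref_month):
    return (ref_year - data_year) * 12 + (ref_month - data_month)

# Relative to May 2020, the newest data in the simulated use case:
print(ageing_months(2019, 3, 2020, 5))  # March 2019 -> 14
print(ageing_months(2019, 4, 2020, 5))  # April 2019 -> 13
print(ageing_months(2019, 5, 2020, 5))  # May 2019   -> 12
```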

    The next step is to create a lifecycle rule based on this specific tag in order to automate the migration to Glacier at month 15. The proposed policy name is 15IAtoGlacier, and the definition limits the scope to only objects with the tag ageing set to 15 in the specific bucket.

    1. Create a JSON file containing the lifecycle policy definition, named lifecycle.json:
      {
          "Rules": [
              {
                  "ID": "15IAtoGlacier",
                  "Filter": {
                      "Tag": {
                          "Key": "ageing",
                          "Value": "15"
                      }
                  },
                  "Status": "Enabled",
                  "Transitions": [
                      {
                          "Days": 1,
                          "StorageClass": "GLACIER"
                      }
                  ]
              }
          ]
      }

    2. Run the following command to send the JSON file to Amazon S3:
      aws s3api put-bucket-lifecycle-configuration \
      --bucket rs-lakehouse-blog-post \
      --lifecycle-configuration file://lifecycle.json 

    This lifecycle policy migrates all objects with the tag ageing set to 15 from S3-IA to Glacier.

    Though I described this process as automating the migration, I actually want to control the process from the application level using the self-managed tag mechanism. I use this approach because otherwise, the transition is based on file creation date, and the objective is to be able to delete, update, or create a new file whenever needed (for example, to delete parts of records in order to comply to the GDPR “right to be forgotten” rule).

    Now you extract all data from June 2019 to November 2019 (6–11 months old) and keep it in Amazon S3 with a lifecycle policy to automatically migrate it to S3-IA at an ageing of 12 months, using the same process as described. These six new objects also inherit the rule created previously to migrate to Glacier after 15 months. Finally, you set the ageing tag as described before.

    Use the extract_shortterm prefix for these unload operations.

    1. Unload June 2019 with the following code:
      unload ('select * from greentaxi where pickup between ''2019-06-01 00:00:00'' and ''2019-06-30 23:59:59''')
      to 's3://rs-lakehouse-blog-post/extract_shortterm/06/green_tripdata_2019-06'
      iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole' parquet parallel off;

    2. Use the same logic for the remaining months up to October.
    3. For November, see the following code:
      	unload ('select * from greentaxi where pickup between ''2019-11-01 00:00:00'' and ''2019-11-30 23:59:59''')
      	to 's3://rs-lakehouse-blog-post/extract_shortterm/11/green_tripdata_2019-11'
      	iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole' parquet parallel off;
      aws s3 ls --recursive s3://rs-lakehouse-blog-post/extract_shortterm/
      2020-10-12 14:52:11          0 extract_shortterm/
      2020-10-12 18:45:42          0 extract_shortterm/06/
      2020-10-12 18:46:49   10889436 extract_shortterm/06/green_tripdata_2019-06000.parquet
      2020-10-12 18:45:53          0 extract_shortterm/07/
      2020-10-12 18:47:03   10759747 extract_shortterm/07/green_tripdata_2019-07000.parquet
      2020-10-12 18:45:58          0 extract_shortterm/08/
      2020-10-12 18:47:24    9947793 extract_shortterm/08/green_tripdata_2019-08000.parquet
      2020-10-12 18:46:03          0 extract_shortterm/09/
      2020-10-12 18:47:45   10302432 extract_shortterm/09/green_tripdata_2019-09000.parquet
      2020-10-12 18:46:08          0 extract_shortterm/10/
      2020-10-12 18:48:00   10659857 extract_shortterm/10/green_tripdata_2019-10000.parquet
      2020-10-12 18:46:11          0 extract_shortterm/11/
      2020-10-12 18:48:14   10247201 extract_shortterm/11/green_tripdata_2019-11000.parquet

      aws s3 ls --recursive rs-lakehouse-blog-post/extract_longterm/

      2020-10-12 14:49:51          0 extract_longterm/
      2020-10-12 14:56:38   14403710 extract_longterm/green_tripdata_2019-01000.parquet
      2020-10-12 15:30:14   13454341 extract_longterm/green_tripdata_2019-02000.parquet

    4. Apply the tag ageing with range 11 to 6 (June 2019 to November 2019), using either the AWS CLI or console if you prefer.
    5. Create a new lifecycle rule named 12S3toS3IA, which transitions from Amazon S3 to S3-IA.
    6. With the AWS CLI, create a JSON file that includes both the previously defined rule 15IAtoGlacier and the new 12S3toS3IA, because the s3api command overwrites the current configuration (there is no incremental approach) with the new policy definition file. The following code is the new lifecycle.json:
      {
          "Rules": [
              {
                  "ID": "12S3toS3IA",
                  "Filter": {
                      "Tag": {
                          "Key": "ageing",
                          "Value": "12"
                      }
                  },
                  "Status": "Enabled",
                  "Transitions": [
                      {
                          "Days": 30,
                          "StorageClass": "STANDARD_IA"
                      }
                  ]
              },
              {
                  "ID": "15IAtoGlacier",
                  "Filter": {
                      "Tag": {
                          "Key": "ageing",
                          "Value": "15"
                      }
                  },
                  "Status": "Enabled",
                  "Transitions": [
                      {
                          "Days": 1,
                          "StorageClass": "GLACIER"
                      }
                  ]
              }
          ]
      }

      aws s3api put-bucket-lifecycle-configuration \
      --bucket rs-lakehouse-blog-post \
      --lifecycle-configuration file://lifecycle.json

    7. Check the applied policies with the following command:
      aws s3api get-bucket-lifecycle-configuration \
      --bucket rs-lakehouse-blog-post

  • The output is a single JSON document containing both the 15IAtoGlacier and 12S3toS3IA rules.
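    Because put-bucket-lifecycle-configuration replaces the entire configuration, it can help to merge the new rule into the currently applied rule list (fetched, for example, with get-bucket-lifecycle-configuration) before writing the policy file. A minimal sketch in plain Python, with rule dictionaries mirroring lifecycle.json; no AWS calls are made:

```python
import json

def merge_lifecycle_rules(existing_rules, new_rule):
    """Return a rule list with new_rule added, replacing any rule with the same ID.

    put-bucket-lifecycle-configuration overwrites the whole configuration,
    so every rule that should remain active must appear in the uploaded file.
    """
    merged = [r for r in existing_rules if r["ID"] != new_rule["ID"]]
    merged.append(new_rule)
    return merged

existing = [{"ID": "15IAtoGlacier",
             "Filter": {"Tag": {"Key": "ageing", "Value": "15"}},
             "Status": "Enabled",
             "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}]}]

new = {"ID": "12S3toS3IA",
       "Filter": {"Tag": {"Key": "ageing", "Value": "12"}},
       "Status": "Enabled",
       "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}]}

rules = merge_lifecycle_rules(existing, new)
print(json.dumps({"Rules": rules}, indent=4))  # body of lifecycle.json
```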

    Defining the external schema and external tables

    Before deleting the records extracted from Amazon Redshift with the UNLOAD command, we define the external schema and external tables to enable Redshift Spectrum queries against these Parquet files.

    1. Enter the following code to create your schema:
      create external schema taxispectrum
      	from data catalog
      	database 'blogdb'
      	iam_role 'arn:aws:iam::123456789012:role/BlogSpectrumRole'
      create external database if not exists;

    2. Create the external table taxi_archive in the taxispectrum external schema. If you’re walking through the new customer use case, replace the prefix extract_midterm with archive:
      create external table taxispectrum.taxi_archive(
      	vendor_id integer,
      	pickup timestamp,
      	dropoff timestamp,
      	storeandfwd char(2),
      	ratecodeid integer,
      	pulocid integer,
      	dolocid integer,
      	passenger_count integer,
      	trip_dist real,
      	fare_amount real,
      	extra real,
      	mta_tax real,
      	tip_amount real,
      	toll_amount real,
      	ehail_fee real,
      	improve_surch real,
      	total_amount real,
      	pay_type integer,
      	trip_type integer,
      	congest_surch real)
      	partitioned by (yearmonth char(7))
      	stored as parquet
      location 's3://rs-lakehouse-blog-post/extract_midterm/';

    3. Add the six files stored in Amazon S3 and three files stored in S3-IA as partitions (if you’re walking through the new customer use case, you can skip the following partitioning steps). The following code shows March and April:
      ALTER TABLE taxispectrum.taxi_archive
      ADD PARTITION (yearmonth='2019-03') 
      LOCATION 's3://rs-lakehouse-blog-post/extract_midterm/03/';
      ALTER TABLE taxispectrum.taxi_archive
      ADD PARTITION (yearmonth='2019-04') 
      LOCATION 's3://rs-lakehouse-blog-post/extract_midterm/04/';

    4. Continue this process up to November 2019, using extract_shortterm instead of extract_midterm for the later months.
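    Rather than typing nine nearly identical statements, the ADD PARTITION DDL can be generated. A sketch in Python; the month-to-prefix mapping (extract_midterm for March through July, extract_shortterm for August through November) is our assumption based on the listings above:

```python
def partition_ddl(bucket="rs-lakehouse-blog-post"):
    """Generate ADD PARTITION statements for the 2019-03 .. 2019-11 months."""
    statements = []
    for month in range(3, 12):
        # Assumed layout: midterm holds March-July, shortterm holds August-November.
        prefix = "extract_midterm" if month <= 7 else "extract_shortterm"
        yearmonth = f"2019-{month:02d}"
        statements.append(
            "ALTER TABLE taxispectrum.taxi_archive "
            f"ADD PARTITION (yearmonth='{yearmonth}') "
            f"LOCATION 's3://{bucket}/{prefix}/{month:02d}/';"
        )
    return statements

for ddl in partition_ddl():
    print(ddl)
```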
    5. Check the table isn’t empty with the following SQL statement:
      select count(*) from taxispectrum.taxi_archive;

    You get the number of entries in this external table.

    6. Optionally, you can check the partitions mapped to this table with a query to the Amazon Redshift internal table:
      select * from svv_external_partitions;

    7. Run a SELECT command that uses partitioning in order to optimize costs related to Redshift Spectrum scanning:
      select * from taxispectrum.taxi_archive where yearmonth='2019-11' and fare_amount > 20;

    Redshift Spectrum scans only the partitions matching yearmonth, so only the November 2019 data is read.

    The final step is cleaning all the records extracted from the Amazon Redshift local tables:

    delete from public.greentaxi where pickup between '2019-01-01 00:00:00' and '2019-11-30 23:59:59'


    We demonstrated how to extract historical data from Amazon Redshift and implement an archive strategy with Redshift Spectrum and Amazon S3 storage classes. In addition, we showed how to optimize Redshift Spectrum scans with partitioning.

    In the next post in this series, we show how to operate this solution day by day, especially for the old customer use case, and share some best practices.

    About the Authors

    Cristian Gavazzeni is a senior solution architect at Amazon Web Services. He has more than 20 years of experience as a pre-sales consultant focusing on Data Management, Infrastructure and Security. During his spare time he likes eating Japanese food and travelling abroad with only fly and drive bookings.



    Francesco Marelli is a senior solutions architect at Amazon Web Services. He lived and worked in London for 10 years, and has since worked in Italy, Switzerland, and other countries in EMEA. He specializes in the design and implementation of analytics, data management, and big data systems, mainly for enterprise and FSI customers. Francesco also has strong experience in systems integration and in the design and implementation of web applications. He loves sharing his professional knowledge, collecting vinyl records, and playing bass.

    To Close the Digital Divide, the FCC Must Redefine Broadband Speeds

    Post Syndicated from Stacey Higginbotham original

    The coronavirus pandemic has brought the broadband gap in the United States into stark relief—5.6 percent of the population has no access to broadband infrastructure. But for an even larger percentage of the population, the issue is that they can’t afford access, or they get by on mobile phone plans. Recent estimates, for example, suggest that 15 million to 16 million students—roughly 30 percent of the grade-school population in the United States—lack broadband access for one of these reasons.

    The Federal Communications Commission (FCC) has punted on broadband access for at least a decade. With the recent change in the regulatory regime, it’s time for the country that created the ARPANET to fix its broadband access problem. The lack of access is driven largely by broadband’s high cost, and that cost persists in part because the FCC’s current definition of broadband is stuck in the early 2000s.

    The FCC defines broadband as a download speed of 25 megabits per second (Mb/s) and an upload speed of 3 Mb/s. The agency set this definition in 2015, when it was already outdated. At that time, I was already stressing a 50 Mb/s connection with just a couple of Netflix streams and working from home. Before 2015, the defined broadband speeds in the United States were an anemic 4 Mb/s down and 1 Mb/s up, set in 2010.

    If the FCC wants to address the broadband gap rather than placate the telephone companies it’s supposed to regulate, it should again redefine broadband. The FCC could easily establish broadband as 100 Mb/s down and at least 10 Mb/s up. This isn’t a radical proposal: As of 2018, 90.5 percent of the U.S. population already had access to 100 Mb/s speeds, but only 45.7 percent were tapping into it, according to the FCC’s 2020 Broadband Deployment Report.

    Redefining broadband will force upgrades where necessary and also reveal locations where competition is low and prices are high. As things stand, most people in need of speeds above 100 Mb/s have only one option: cable providers. Fiber is an alternative, but most U.S. fiber deployments are in wealthy suburban and dense urban areas, leaving rural students and those living on reservations behind. A lack of competition leaves cable providers able to impose data caps and raise fees.

    What seems like a lack of demand is more likely a rejection of a high-cost service, even as more people require 100 Mb/s for their broadband needs. In the United States, 100 Mb/s plans cost $81.19 per month on average, according to data from consumer interest group New America. The group gathered broadband prices across 760 plans in 28 cities around the world, including 14 cities in the United States. When compared with other countries, prices in the United States are much higher. In Europe, the average cost of a 100/10 Mb/s plan is $48.48, and in Asia, a similar plan would cost $69.76.

    Closing the broadband gap will still require more infrastructure and fewer monopolies, but redefining broadband is a start. With a new understanding of what constitutes reasonable broadband, the United States can proactively create new policies that promote the rollout of plans that will meet the needs of today and the future.

    This article appears in the February 2021 print issue as “Redefining Broadband.”

    The Journalist Is Under Attack for His Journalistic Work: The Council of Europe Reacts Against Police and Prosecutorial Repression of Биволъ Journalist Stoyan Tonchev

    Post Syndicated from Биволъ original

    Tuesday, 19 January 2021

    Investigative journalist Stoyan Tonchev, author of and publisher of, is subjected to systematic harassment by Bulgaria’s prosecution service and police because of his journalistic work. The Council of Europe,…

    Run Apache Spark 3.0 workloads 1.7 times faster with Amazon EMR runtime for Apache Spark

    Post Syndicated from Al MS original

    With Amazon EMR release 6.1.0, Amazon EMR runtime for Apache Spark is now available for Spark 3.0.0. EMR runtime for Apache Spark is a performance-optimized runtime for Apache Spark that is 100% API compatible with open-source Apache Spark.

    In our benchmark performance tests using TPC-DS benchmark queries at 3 TB scale, we found EMR runtime for Apache Spark 3.0 provides a 1.7 times performance improvement on average, and up to 8 times improved performance for individual queries over open-source Apache Spark 3.0.0. With Amazon EMR 6.1.0, you can now run your Apache Spark 3.0 applications faster and cheaper without requiring any changes to your applications.

    Results observed using TPC-DS benchmarks

    To evaluate the performance improvements, we used TPC-DS benchmark queries with 3 TB scale and ran them on a 6-node c4.8xlarge EMR cluster with data in Amazon Simple Storage Service (Amazon S3). We ran the tests with and without the EMR runtime for Apache Spark. The following two graphs compare the total aggregate runtime and geometric mean for all queries in the TPC-DS 3 TB query dataset between the Amazon EMR releases.

    The following table shows the total runtime in seconds.

    The following table shows the geometric mean of the runtime in seconds.


    In our tests, all queries ran successfully on EMR clusters that used the EMR runtime for Apache Spark. However, when using Spark 3.0 without the EMR runtime, 34 of the 104 benchmark queries failed due to SPARK-32663. To work around these failures, we disabled the spark.shuffle.readHostLocalDisk configuration. However, even after this change, queries 14a and 14b continued to fail, so we chose to exclude these queries from our benchmark comparison.

    The per-query speedup on Amazon EMR 6.1 with and without EMR runtime is illustrated in the following chart. The horizontal axis shows each query in the TPC-DS 3 TB benchmark. The vertical axis shows the speedup of each query due to the EMR runtime. We found a 1.7 times performance improvement as measured by the geometric mean of the per-query speedups, with all queries showing a performance improvement with the EMR Runtime.
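    The geometric mean summary used above can be computed directly: it is the n-th root of the product of the per-query speedups, which damps the influence of outliers such as the 8-times query. A small illustration with invented speedup values (not the actual benchmark numbers):

```python
import math

def geometric_mean(values):
    """n-th root of the product, computed in log space for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

speedups = [1.2, 1.5, 2.0, 8.0]            # hypothetical per-query speedups
print(round(geometric_mean(speedups), 2))  # about 2.32, versus an arithmetic mean of 3.18
```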



    You can run your Apache Spark 3.0 workloads faster and cheaper without making any changes to your applications by using Amazon EMR 6.1. To keep up to date, subscribe to the Big Data blog’s RSS feed to learn about more great Apache Spark optimizations, configuration best practices, and tuning advice.

    About the Authors

    Al MS is a product manager for Amazon EMR at Amazon Web Services.





    Peter Gvozdjak is a senior engineering manager for EMR at Amazon Web Services.

    InsightIDR: 2020 Highlights and What’s Ahead in 2021

    Post Syndicated from Margaret Zonay original


    As we kick off 2021 here at Rapid7, we wanted to take a minute to reflect on 2020, highlight some key InsightIDR product investments we don’t want you to miss, and take a look ahead at where our team sees detection and response going this year.

    Rapid7 detection and response 2020 highlights

    Whenever we engage with customers or industry professionals, one theme that we hear on repeat is complexity. It can often feel like the cards are stacked against security teams as environments sprawl and security needs outpace the number of experienced professionals we have to address them. This dynamic was further amplified by the pandemic over the past year. Our focus over the past 12 months has been on enabling teams to work smarter, get the most out of our software and services, and accelerate their security maturity as efficiently as possible. Here are some highlights from our journey over 2020:

    In 2020, we made continuous enhancements to our Log Search feature to make it more efficient and customizable to customers’ needs. Now, you can:

    LEQL Multi-groupby in InsightIDR

    For a look at the most up-to-date list of Log Search capabilities, check out our help documentation here.

    Greater visibility across the attack surface with Network Traffic Analysis

    With Rapid7’s lightweight Insight Network Sensor, customers can monitor, capture, and assess end-to-end network traffic across their physical and virtual environments (including AWS environments) with curated IDS alerts, plus DNS and DHCP data. For maximum visibility, customers can add on the network flow data module to further investigations, deepen forensic activities, and enable custom rule creation.

    The real-time visibility provided by InsightIDR’s Network Traffic Analysis has been especially helpful for organizations working remotely over the past year. Many customers are building custom InsightIDR dashboards to improve real-time monitoring of activity within their networks and at the edge to maintain optimal security as teams work from home.


    Learn about how to leverage NTA and more by checking out our top Network Traffic blogs of 2020:

    Complete endpoint visibility with Enhanced Endpoint Telemetry

    InsightIDR’s latest add-on module, enhanced endpoint telemetry (EET), brings the enhanced endpoint data that’s currently used by Rapid7’s Managed Detection and Response (MDR) Services team in almost all of their investigations into InsightIDR.

    Get a full picture of endpoint activity, create custom detections, and see the full scope of an attack with EET’s process start activity data in Log Search. These logs give visibility into all endpoint activity to tell a story around what triggered a particular detection and to help inform remediation efforts. As remote working has increased for many organizations, so has the number of remote endpoints security teams have to monitor—the level of detail provided by EET helps teams detect and proactively hunt for custom threats across their expanding environments.

    Enhanced Endpoint Telemetry dashboard card in InsightIDR

    Learn more about the benefits of EET in our blog post and how to get started in our help documentation.

    SOC automation with InsightIDR and InsightConnect

    Automation is critical for accelerating and streamlining incident response, especially as the threat landscape continues to evolve in 2021 and beyond. This is why we have built-in automation powered by InsightConnect, Rapid7’s security orchestration, automation, and response (SOAR) tool, at the heart of InsightIDR. SOC automation with InsightIDR and InsightConnect allows customers to auto-enrich alerts, customize alerting and escalation pathways, and auto-contain threats.

    Comparing SecOps practices before and after automation is implemented using Rapid7’s SOAR solution, InsightConnect

    In 2020, we furthered the integration between InsightIDR and InsightConnect—in addition to kicking off workflows from User Behavior Analytics (UBA) alerts, joint customers can now trigger custom workflows to automatically initiate predefined actions each time a Custom Alert is triggered in InsightIDR.

    Learn more about the benefits of leveraging SIEM and SOAR by checking out the blogs below:

    MDR Elite “Active Response” for end-to-end detection and response

    Only Rapid7 MDR with Active Response can reduce attacker dwell time and save your team time and money with unrivaled response capabilities on both endpoint and user threats. Whether it’s a suspicious authentication while you’re buried in other security initiatives or an attacker executing malicious documents at 3 a.m., you can be confident that Rapid7 MDR is watching and responding to attacks in your environment.

    With MDR Elite with Active Response, our team of SOC experts provide 24×7 end-to-end detection and response to immediately limit an attacker’s ability to execute, giving you and your team peace of mind that Rapid7 will take action to protect your business and return the time normally spent investigating and responding to threats back to your analysts.

    2020 Rapid7 detection and response achievements

    At Rapid7, we’re grateful to have received multiple recognitions from analysts and customers alike for our Detection and Response portfolio throughout 2020, including:

    We’re so thankful to our customers for your continued partnership and feedback throughout the years. As we move into 2021, we’re excited to continue to invest in driving effective and efficient detection and response for teams.

    What’s ahead in 2021

    As we move forward in 2021, it’s clear that things aren’t going to jump back to “normal” anytime soon. Many companies continue to work remotely, increasing the already present need for security tools that can keep teams safe and secure.

    In 2020, a big theme for InsightIDR was giving teams advanced visibility into their environments. What’s ahead in 2021? More capabilities that help security teams do their jobs faster and more effectively.

    Sam Adams, VP of Engineering for Detection and Response at Rapid7, reflected, “In 2020, InsightIDR added a breadth of new ways to detect attacks in your environment, from endpoint to network to cloud. In 2021, we want to add depth to all of these capabilities, by allowing our customers fine-grained tuning and customization of our analytics engine and an even more robust set of tools to investigate alerts faster than ever before.”

    When speaking about the detection and response landscape overall, Jeffrey Gardner, a former healthcare company Information Security Officer and recently appointed Practice Advisor for Detection and Response at Rapid7, said, “I think the broader detection industry is at this place where there’s an overabundance of data—security professionals have this feeling of ‘I need these log sources and I want this telemetry collected,’ but most solutions don’t make it easy to pull actionable intelligence from this data. I call out ‘actionable’ because most of the products provide a lot of intel but really leave the ‘what should I do next?’ completely up to the end user without guidance.”

    InsightIDR targets this specific issue by providing teams with visibility across their entire environment while simultaneously enabling action from within the solution with curated built-in expertise through out-of-the-box detections, pre-built automation, and high-context investigation and response tools.

    When speaking about projected 2021 cybersecurity trends, Bob Rudis, Chief Data Scientist at Rapid7, noted, “We can be fairly certain ransomware tactics and techniques will continue to be commoditized and industrialized, and criminals will continue to exploit organizations that are strapped for resources and distracted by attempting to survive in these chaotic times.”

    To stay ahead of these new attacker tactics and techniques, visibility into logs, network traffic, and endpoint data will be crucial. These data sources contain the strongest and earliest indicators of potential compromise, and they form the three pillars of Gartner’s SOC Visibility Triad. Having all of this critical data in a single solution like InsightIDR helps teams work more efficiently and effectively, and stay on top of new threats and tactics.

    Stay tuned for more in 2021

    See more of Rapid7’s 2021 cybersecurity predictions in our recent blog post here, and keep an eye on our blog and release notes as we continue to highlight the latest in detection and response at Rapid7 throughout the year.

    Not an InsightIDR customer? Start a free trial today!

    Get Started

    Certification Program Aims to Close Skills Gap in Renewable Energy

    Post Syndicated from The IEEE Standards Association original

    THE INSTITUTE Heavier reliance on renewables and other distributed energy resources (DERs) is central to the world’s ongoing evolution toward a more environmentally friendly and reliable energy landscape. Achieving consistent, high-quality assessments of the interconnections of DERs with electricity grids is an important step. But there’s a hurdle to making it a reality: not enough skilled workers.

    That’s why the IEEE Standards Association’s Conformity Assessment Program is working with several utilities on the IEEE 1547 Distributed Energy Resources (DER) Interconnection: Education and Credentialing Program. Partners are Baltimore Gas and Electric, Commonwealth Edison, Dominion Energy, Duke Energy, and Orange and Rockland Utilities.

    “We are excited to work with IEEE and the other partners in creating a program that will satisfy the industry need for a qualified workforce…to support ongoing growth of renewables and other DERs around the world,” Joseph Woomer, Dominion’s vice president of grid and technical solutions, said in a news release about the program.


    Both commercial and residential installations of battery, combined heat and power, solar, wind, and other DERs are on the rise globally. But various complexities around commissioning the interconnections with the electricity grid have blunted progress.

    One challenge is the severe shortage of qualified, credentialed workers to perform the commissioning tasks, especially in developing economies. Also, utilities, DER developers, and owners frequently do not share a common understanding of the requirements and needs—which can lead to missteps, delays, and additional costs.

    At the same time, DER vendors are being pressured to roll out new features and capabilities that meet the needs of different implementations in various regions. Utilities are now being pushed more than ever to evaluate and process higher volumes of DER-interconnection applications. And regulators are struggling to keep their jurisdictions’ interconnection rules in line with technology innovations and other industry developments.


    The credentialing program will be based on the widely adopted IEEE 1547-2018 Standard for Interconnection and Interoperability of Distributed Energy Resources With Associated Electric Power Systems Interfaces, as well as the IEEE 1547.1-2020 Standard Conformance Test Procedures for Equipment Interconnecting Distributed Energy Resources With Electric Power Systems and Associated Interfaces.

    IEEE 1547 defines technical specifications for interconnection and interoperability between electric power systems and DERs of every type. Since its development in 2003, the standard has been cited in energy legislation, regulatory deliberations, and utility engineering and business practices in markets around the world. One such example is in the U.S. Energy Policy Act of 2005, which references IEEE 1547 explicitly.

    As DER deployment has grown exponentially in the years since IEEE 1547’s initial publication, the standard has been refined to address emerging implementation challenges. Its 2018 update addresses numerous changes related to the increased levels of solar arrays and other DERs on the grid. The U.S. National Association of Regulatory Utility Commissioners passed a resolution last year recommending state public utility commissions and other member regulatory agencies engage stakeholders to adopt IEEE 1547-2018.

    The new education and credentialing program addresses DER interconnection by enabling training and certification in the standards-based commissioning process for installed DER interconnections.

    “Working with the IEEE Standards Association and the other utilities signed on to this effort is an important step to standardizing a safe and reliable approach to integrating more distributed resources on the system,” Wesley O. Davis said in the news release. He is Duke’s director of DER technical standards, enterprise strategy, and planning.

    Get Involved

    To join in the collaborative effort, email [email protected].

    IEEE membership offers a wide range of benefits and opportunities for those who share a common interest in technology. If you are not already a member, consider joining IEEE and becoming part of a worldwide network of more than 400,000 students and professionals.

    [$] An introduction to SciPy

    Post Syndicated from jake original

    SciPy is a collection of Python libraries for scientific and numerical
    computing. Nearly every serious user of Python for scientific research
    uses SciPy. Since Python is popular across all fields of science, and
    continues to be a prominent language in some areas of research, such as
    data science, SciPy has a large user base. On New Year's Eve, version 1.6
    of the scipy library, which is the central component in the SciPy stack,
    was released. That release gives us a good opportunity to delve into this
    software and give some examples of its use.
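    As a taste of what the library offers, the sketch below uses two long-standing SciPy APIs: scipy.integrate.quad for numerical integration and scipy.optimize.brentq for root finding. (These particular examples are ours, not from the article.)

```python
import math
from scipy import integrate, optimize

# Numerically integrate x^2 from 0 to 1 (exact answer: 1/3).
area, abserr = integrate.quad(lambda x: x**2, 0, 1)
print(area)

# Find the root of cos(x) - x on [0, 1] via Brent's method (~0.739).
root = optimize.brentq(lambda x: math.cos(x) - x, 0, 1)
print(root)
```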

    SANS and AWS Marketplace Webinar: Learn to improve your Cloud Threat Intelligence program through cloud-specific data sources

    Post Syndicated from IEEE Spectrum Recent Content full text original

    SOC efficiency

    You’re Invited!

    SANS and AWS Marketplace will discuss CTI detection and prevention metrics, finding effective intelligence data feeds and sources, and determining how best to integrate them into security operations functions.

    Attendees of this webinar will learn how to:

    • Understand cloud-specific data sources for threat intelligence, such as static indicators and TTPs.
    • Efficiently search for compromised assets based on indicators provided, events generated on workloads and within the cloud infrastructure, or communications with known malicious IP addresses and domains.
    • Place intelligence and automation at the core of security workflows and decision making to create a comprehensive security program.
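    The second bullet point, searching for compromised assets based on provided indicators, reduces at its simplest to matching event data against an indicator feed. A minimal sketch in plain Python; all IPs, domains, and asset names here are invented for illustration:

```python
# Flag events whose destination appears in a threat-intelligence feed
# of known-bad IPs and domains (hypothetical feed, using TEST-NET ranges).
bad_ips = {"203.0.113.7", "198.51.100.23"}
bad_domains = {"malicious.example"}

events = [
    {"asset": "web-01", "dst_ip": "203.0.113.7", "dst_domain": None},
    {"asset": "db-02",  "dst_ip": "192.0.2.10",  "dst_domain": "malicious.example"},
    {"asset": "app-03", "dst_ip": "192.0.2.11",  "dst_domain": "updates.example"},
]

compromised = [
    e["asset"] for e in events
    if e["dst_ip"] in bad_ips or e["dst_domain"] in bad_domains
]
print(compromised)  # assets that communicated with known-bad infrastructure
```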


    Dave Shackleford, SANS Analyst, Senior Instructor

    Dave Shackleford, a SANS analyst, senior instructor, course author, GIAC technical director and member of the board of directors for the SANS Technology Institute, is the founder and principal consultant with Voodoo Security. He has consulted with hundreds of organizations in the areas of security, regulatory compliance, and network architecture and engineering. A VMware vExpert, Dave has extensive experience designing and configuring secure virtualized infrastructures. He previously worked as chief security officer for Configuresoft and CTO for the Center for Internet Security. Dave currently helps lead the Atlanta chapter of the Cloud Security Alliance.

    Nam Le, Specialist Solutions Architect, AWS

    Nam Le is a Specialist Solutions Architect at AWS covering AWS Marketplace, Service Catalog, Migration Services, and Control Tower. He helps customers implement security and governance best practices using native AWS Services and Partner products. He is an AWS Certified Solutions Architect, and his skills include security, compliance, cloud computing, enterprise architecture, and software development. Nam has also worked as a consulting services manager, cloud architect, and as a technical marketing manager.

    It’s Too Easy to Hide Bias in Deep-Learning Systems

    Post Syndicated from Matthew Hutson original

    If you’re on Facebook, click on “Why am I seeing this ad?” The answer will look something like “[Advertiser] wants to reach people who may be similar to their customers” or “[Advertiser] is trying to reach people ages 18 and older” or “[Advertiser] is trying to reach people whose primary location is the United States.” Oh, you’ll also see “There could also be more factors not listed here.” Such explanations started appearing on Facebook in response to complaints about the platform’s ad-placing artificial intelligence (AI) system. For many people, it was their first encounter with the growing trend of explainable AI, or XAI. 

    But something about those explanations didn’t sit right with Oana Goga, a researcher at the Grenoble Informatics Laboratory, in France. So she and her colleagues coded up AdAnalyst, a browser extension that automatically collects Facebook’s ad explanations. Goga’s team also became advertisers themselves. That allowed them to target ads to the volunteers they had running AdAnalyst. The result: “The explanations were often incomplete and sometimes misleading,” says Alan Mislove, one of Goga’s collaborators at Northeastern University, in Boston.

    When advertisers create a Facebook ad, they target the people they want to view it by selecting from an expansive list of interests. “You can select people who are interested in football, and they live in Cote d’Azur, and they were at this college, and they also like drinking,” Goga says. But the explanations Facebook provides typically mention only one interest, and the most general one at that. Mislove assumes that’s because Facebook doesn’t want to appear creepy; the company declined to comment for this article, so it’s hard to be sure.

    Google and Twitter ads include similar explanations. With this gesture toward transparency, all three platforms are probably hoping to allay users’ suspicions about the mysterious advertising algorithms they use, while keeping any unsettling practices obscured. Or maybe they genuinely want to give users a modicum of control over the ads they see—the explanation pop-ups offer a chance for users to alter their list of interests. In any case, these features are probably the most widely deployed example of algorithms being used to explain other algorithms: what’s being revealed is why the algorithm chose a particular ad to show you.

    The world around us is increasingly choreographed by such algorithms. They decide what advertisements, news, and movie recommendations you see. They also help to make far more weighty decisions, determining who gets loans, jobs, or parole. And in the not-too-distant future, they may decide what medical treatment you’ll receive or how your car will navigate the streets. People want explanations for those decisions. Transparency allows developers to debug their software, end users to trust it, and regulators to make sure it’s safe and fair.

    The problem is that these automated systems are becoming so frighteningly complex that it’s often very difficult to figure out why they make certain decisions. So researchers have developed algorithms for understanding these decision-making automatons, forming the new subfield of explainable AI.

    In 2017, the Defense Advanced Research Projects Agency launched a US $75 million XAI project. Since then, new laws have sprung up requiring such transparency, most notably Europe’s General Data Protection Regulation, which stipulates that when organizations use personal data for “automated decision-making, including profiling,” they must disclose “meaningful information about the logic involved.” One motivation for such rules is a concern that black-box systems may be hiding evidence of illegal, or perhaps just unsavory, discriminatory practices.

    As a result, XAI systems are much in demand. And better policing of decision-making algorithms would certainly be a good thing. But even if explanations are widely required, some researchers worry that systems for automated decision-making may appear to be fair when they really aren’t fair at all.

    For example, a system that judges loan applications might tell you that it based its decision on your income and age, when in fact it was your race that mattered most. Such bias might arise because it reflects correlations in the data that was used to train the AI, but it must be excluded from decision-making algorithms lest they act to perpetuate unfair practices of the past.

    The challenge is how to root out such unfair forms of discrimination. While it’s easy to exclude information about an applicant’s race, gender, or religion, that’s often not enough. Research has shown, for example, that job applicants with names that are common among African Americans receive fewer callbacks than applicants with other names who have the same qualifications.

    A computerized résumé-screening tool might well exhibit the same kind of racial bias, even if applicants were never presented with checkboxes for race. The system may still be racially biased; it just won’t “admit” to how it really works, and will instead provide an explanation that’s more palatable.

    Regardless of whether the algorithm explicitly uses protected characteristics such as race, explanations can be specifically engineered to hide problematic forms of discrimination. Some AI researchers describe this kind of duplicity as a form of “fairwashing”: presenting a possibly unfair algorithm as being fair.

     Whether deceptive systems of this kind are common or rare is unclear. They could be out there already but well hidden, or maybe the incentive for using them just isn’t great enough. No one really knows. What’s apparent, though, is that the application of more and more sophisticated forms of AI is going to make it increasingly hard to identify such threats.

    No company would want to be perceived as perpetuating antiquated thinking or deep-rooted societal injustices. So a company might hesitate to share exactly how its decision-making algorithm works to avoid being accused of unjust discrimination. Companies might also hesitate to provide explanations for decisions rendered because that information would make it easier for outsiders to reverse engineer their proprietary systems. Cynthia Rudin, a computer scientist at Duke University, in Durham, N.C., who studies interpretable machine learning, says that the “explanations for credit scores are ridiculously unsatisfactory.” She believes that credit-rating agencies obscure their rationales intentionally. “They’re not going to tell you exactly how they compute that thing. That’s their secret sauce, right?”

    And there’s another reason to be cagey. Once people have reverse engineered your decision-making system, they can more easily game it. Indeed, a huge industry called “search engine optimization” has been built around doing just that: altering Web pages superficially so that they rise to the top of search rankings.

    Why, then, are some companies that use decision-making AI so keen to provide explanations? Umang Bhatt, a computer scientist at the University of Cambridge, and his collaborators interviewed 50 scientists, engineers, and executives at 30 organizations to find out. They learned that some executives had asked their data scientists to incorporate explainability tools just so the company could claim to be using transparent AI. The data scientists weren’t told whom this was for, what kind of explanations were needed, or why the company was intent on being open. “Essentially, higher-ups enjoyed the rhetoric of explainability,” Bhatt says, “while data scientists scrambled to figure out how to implement it.”

    The explanations such data scientists produce come in all shapes and sizes, but most fall into one of two categories: explanations for how an AI-based system operates in general and explanations for particular decisions. These are called, respectively, global and local explanations. Both can be manipulated.

    Ulrich Aïvodji at the Université du Québec, in Montreal, and his colleagues showed how global explanations can be doctored to look better. They used an algorithm they called (appropriately enough for such fairwashing) LaundryML to examine a machine-learning system whose inner workings were too intricate for a person to readily discern. The researchers applied LaundryML to two challenges often used to study XAI. The first task was to predict whether someone’s income is greater than $50,000 (perhaps making the person a good loan candidate), based on 14 personal attributes. The second task was to predict whether a criminal will re-offend within two years of being released from prison, based on 12 attributes.

    Unlike the algorithms typically applied to generate explanations, LaundryML includes certain tests of fairness, to make sure the explanation—a simplified version of the original system—doesn’t prioritize such factors as gender or race to predict income and recidivism. Using LaundryML, these researchers were able to come up with simple rule lists that appeared much fairer than the original biased system but gave largely the same results. The worry is that companies could proffer such rule lists as explanations to argue that their decision-making systems are fair.
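    The core idea—a simple surrogate rule list judged by how faithfully it reproduces the black box—can be sketched in a few lines of JavaScript. This is only an illustration of the concept, not LaundryML itself; the attributes, thresholds, and function names are invented:

    ```javascript
    // A surrogate explanation as an ordered if-then rule list. A fairwashed
    // list would mention only palatable attributes (income, age here) while
    // still reproducing most decisions of a model that secretly keys on
    // something else.
    const ruleList = [
      { when: (a) => a.income > 50000, decide: "approve" },
      { when: (a) => a.age > 30, decide: "approve" },
      { when: () => true, decide: "deny" }, // default rule
    ];

    // Apply the first matching rule to an applicant.
    function applyRules(rules, applicant) {
      return rules.find((r) => r.when(applicant)).decide;
    }

    // Fidelity: the fraction of cases where the surrogate's decision matches
    // the black box's decision on a dataset.
    function fidelity(rules, blackBox, dataset) {
      const matches = dataset.filter((a) => applyRules(rules, a) === blackBox(a));
      return matches.length / dataset.length;
    }
    ```

    A rule list with high fidelity can be offered as an "explanation" of the black box even when the two arrive at their decisions for very different reasons, which is exactly the loophole fairwashing exploits.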

    Another way to explain the overall operations of a machine-learning system is to present a sampling of its decisions. Last February, Kazuto Fukuchi, a researcher at the Riken Center for Advanced Intelligence Project, in Japan, and two colleagues described a way to select a subset of previous decisions such that the sample would look representative to an auditor who was trying to judge whether the system was unjust. But the craftily selected sample met certain fairness criteria that the overall set of decisions did not.

    Organizations need to come up with explanations for individual decisions more often than they need to explain how their systems work in general. One technique relies on something XAI researchers call attention, which reflects the relationship between parts of the input to a decision-making system (say, single words in a résumé) and the output (whether the applicant appears qualified). As the name implies, attention values are thought to indicate how much the final judgment depends on certain attributes. But Zachary Lipton of Carnegie Mellon and his colleagues have cast doubt on the whole concept of attention.

    These researchers trained various neural networks to read short biographies of physicians and predict which of these people specialized in surgery. The investigators made sure the networks would not allocate attention to words signifying the person’s gender. An explanation that considers only attention would then make it seem that these networks were not discriminating based on gender. But oddly, if words like “Ms.” were removed from the biographies, accuracy suffered, revealing that the networks were, in fact, still using gender to predict the person’s specialty.

    “What did the attention tell us in the first place?” Lipton asks. The lack of clarity about what the attention metric actually means opens space for deception, he argues.
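    The mismatch can be made concrete with a toy model. Here the reported attention assigns zero weight to a gender token, yet removing that token still changes the prediction; all scoring and weights are invented for illustration and bear no relation to the networks in Lipton's study:

    ```javascript
    // A toy "model" whose output secretly depends on the gender token "Ms.",
    // even though its reported attention assigns that token zero weight.
    function predictSpecialty(words) {
      const score = words.reduce(
        (s, w) => s + (w === "surgery" ? 2 : 0) - (w === "Ms." ? 1 : 0),
        0
      );
      return score > 1 ? "surgeon" : "not-surgeon";
    }

    // The attention-based explanation: zero weight on gender words, uniform
    // weight everywhere else.
    function reportedAttention(words) {
      return words.map((w) => (w === "Ms." ? 0 : 1 / words.length));
    }
    ```

    An auditor reading only the attention values would conclude gender plays no role, yet deleting "Ms." from the input flips the model's answer.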

    Johannes Schneider at the University of Liechtenstein and others recently described a system that examines a decision it made, then finds a plausible justification for an altered (incorrect) decision. Classifying Internet Movie Database (IMDb) film reviews as positive or negative, a faithful model categorized one review as positive, explaining itself by highlighting words like “enjoyable” and “appreciated.” But Schneider’s system could label the same review as negative and point to words that seem scolding when taken out of context.

    Another way of explaining an automated decision is to use a technique that researchers call input perturbation. If you want to understand which inputs caused a system to approve or deny a loan, you can create several copies of the loan application with the inputs modified in various ways. Maybe one version ascribes a different gender to the applicant, while another indicates slightly different income. If you submit all of these applications and record the judgments, you can figure out which inputs have influence.
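    In code, the probing procedure might look like the following sketch, with an invented toy model standing in for the real decision-making system:

    ```javascript
    // Input perturbation: probe a black-box decision function by varying one
    // input field at a time and recording which perturbations flip the
    // decision. Fields and values are illustrative.
    function probe(model, application, perturbations) {
      const baseline = model(application);
      const influential = [];
      for (const [field, altValue] of perturbations) {
        const copy = { ...application, [field]: altValue };
        if (model(copy) !== baseline) influential.push(field);
      }
      return { baseline, influential };
    }
    ```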

    That could provide a reasonable explanation of how some otherwise mysterious decision-making systems work. But a group of researchers at Harvard University led by Himabindu Lakkaraju has developed a decision-making system that detects such probing and adjusts its output accordingly. When it is being tested, the system remains on its best behavior, ignoring off-limits factors like race or gender. At other times, it reverts to its inherently biased approach. Sophie Hilgard, one of the authors on that study, likens the use of such a scheme, which is so far just a theoretical concern, to what Volkswagen actually did to detect when a car was undergoing emissions tests, temporarily adjusting the engine parameters to make the exhaust cleaner than it would normally be.

    Another way of explaining a judgment is to output a simple decision tree: a list of if-then rules. The tree doesn’t summarize the whole algorithm, though; instead it includes only the factors used to make the one decision in question. In 2019, Erwan Le Merrer and Gilles Trédan at the French National Center for Scientific Research described a method that constructs these trees in a deceptive way, so that they could explain a credit rating in seemingly objective terms, while hiding the system’s reliance on the applicant’s gender, age, and immigration status.

    Whether any of these deceptions have or ever will be deployed is an open question. Perhaps some degree of deception is already common, as in the case for the algorithms that explain how advertisements are targeted. Schneider of the University of Liechtenstein says that the deceptions in place now might not be so flagrant, “just a little bit misguiding.” What’s more, he points out, current laws requiring explanations aren’t hard to satisfy. “If you need to provide an explanation, no one tells you what it should look like.”

    Despite the possibility of trickery in XAI, Duke’s Rudin takes a hard line on what to do about the potential problem: She argues that we shouldn’t depend on any decision-making system that requires an explanation. Instead of explainable AI, she advocates for interpretable AI—algorithms that are inherently transparent. “People really like their black boxes,” she says. “For every data set I’ve ever seen, you could get an interpretable [system] that was as accurate as the black box.” Explanations, meanwhile, she says, can induce more trust than is warranted: “You’re saying, ‘Oh, I can use this black box because I can explain it. Therefore, it’s okay. It’s safe to use.’ ”

    What about the notion that transparency makes these systems easier to game? Rudin doesn’t buy it. If you can game them, they’re just poor systems, she asserts. With product ratings, for example, you want transparency. When ratings algorithms are left opaque, because of their complexity or a need for secrecy, everyone suffers: “Manufacturers try to design a good car, but they don’t know what good quality means,” she says. And the ability to keep intellectual property private isn’t required for AI to advance, at least for high-stakes applications, she adds. A few companies might lose interest if forced to be transparent with their algorithms, but there’d be no shortage of others to fill the void.

    Lipton, of Carnegie Mellon, disagrees with Rudin. He says that deep neural networks—the blackest of black boxes—are still required for optimal performance on many tasks, especially those used for image and voice recognition. So the need for XAI is here to stay. But he says that the possibility of deceptive XAI points to a larger problem: Explanations can be misleading even when they are not manipulated.

    Ultimately, human beings have to evaluate the tools they use. If an algorithm highlights factors that we would ourselves consider during decision-making, we might judge its criteria to be acceptable, even if we didn’t gain additional insight and even if the explanation doesn’t tell the whole story. There’s no single theoretical or practical way to measure the quality of an explanation. “That sort of conceptual murkiness provides a real opportunity to mislead,” Lipton says, even if we humans are just misleading ourselves.

    In some cases, any attempt at interpretation may be futile. The hope that we’ll understand what some complex AI system is doing reflects anthropomorphism, Lipton argues, whereas these systems should really be considered alien intelligences—or at least abstruse mathematical functions—whose inner workings are inherently beyond our grasp. Ask how a system thinks, and “there are only wrong answers,” he says.

    And yet explanations are valuable for debugging and enforcing fairness, even if they’re incomplete or misleading. To borrow an aphorism sometimes used to describe statistical models: All explanations are wrong—including simple ones explaining how AI black boxes work—but some are useful.

    This article appears in the February 2021 print issue as “Lyin’ AIs.”

    Why Aren’t COVID Tracing Apps More Widely Used?

    Post Syndicated from Michelle Hampson original

    As the COVID-19 pandemic began to sweep around the globe in early 2020, many governments quickly mobilized to launch contact tracing apps to track the spread of the virus. If enough people downloaded and used the apps, it would be much easier to identify people who have potentially been exposed. In theory, contact tracing apps could play a critical role in stemming the pandemic.

    In reality, adoption of contact tracing apps by citizens was largely sporadic and unenthusiastic. A trio of researchers in Australia decided to explore why contact tracing apps weren’t more widely adopted. Their results, published on 23 December in IEEE Software, emphasize the importance of social factors such as trust and transparency.

    Muneera Bano is a senior lecturer of software engineering at Deakin University, in Melbourne. Bano and her co-authors study human aspects of technology adoption. “Coming from a socio-technical research background, we were intrigued initially to study the contact tracing apps when the Australian Government launched the CovidSafe app in April 2020,” explains Bano. “There was a clear resistance from many citizens in Australia in downloading the app, citing concerns regarding trust and privacy.”

    To better understand the satisfaction—or dissatisfaction—of app users, the researchers analyzed data from the Apple and Google app stores. They began by looking at average star ratings and download counts, and by conducting a sentiment analysis of app reviews.

    However, just because a person downloads an app doesn’t guarantee that they will use it. What’s more, Bano’s team found that sentiment scores—which are often indicative of an app’s popularity, success, and adoption—were not an effective means of capturing the success of COVID-19 contact tracing apps.

    “We started to dig deeper into the reviews to analyze the voices of users for particular requirements of these apps. More or less all the apps had issues related to the Bluetooth functionality, battery consumption, reliability and usefulness during pandemic.”

    For example, apps that relied on Bluetooth for tracing had issues related to range, proximity, signal strength, and connectivity. A significant number of users also expressed frustration over battery drainage. Some efforts have been made to address this issue; for example, Singapore launched an updated version of its TraceTogether app that allows it to operate with Bluetooth while running in the background, with the goal of improving battery life.

    But, technical issues were just one reason for lack of adoption. Bano emphasizes that, “The major issues around the apps were social in nature, [related to] trust, transparency, security, and privacy.”

    In particular, the researchers found that resistance to downloading and using the apps was high in countries with a voluntary adoption model and low levels of trust in government, such as Australia, the United Kingdom, and Germany.

    “We observed slight improvement only in the case of Germany because the government made sincere efforts to increase trust. This was achieved by increasing transparency during ‘Corona-Warn-App’ development by making it open source from the outset and by involving a number of reputable organizations,” says Bano. “However, even as the German officials were referring to their contact tracing app as the ‘best app’ in the world, Germany was struggling to avoid the second wave of COVID-19 at the time we were analyzing the data, in October 2020.”

    In some cases, people were hesitant to adopt the apps even when governments and app developers took measures to improve trust and address privacy concerns. For example, a Canadian contact tracing app called COVID Alert is open source, requires no identifiable information from users, and deletes all data after 14 days. Nevertheless, a survey of Canadians found that two thirds would not download any contact tracing app, considering such apps still “too invasive.” (The survey covered tracing apps in general and was not specific to the COVID Alert app.)

    Bano plans to continue studying how politics and culture influence the adoption of these apps in different countries around the world. She and her colleagues are interested in exploring how contact tracing apps can be made more inclusive for diverse groups of users in multi-cultural countries.

    Field Notes: Improving Call Center Experiences with Iterative Bot Training Using Amazon Connect and Amazon Lex

    Post Syndicated from Marius Cealera original

    This post was co-written by Abdullah Sahin, senior technology architect at Accenture, and Muhammad Qasim, software engineer at Accenture. 

    Organizations deploying call-center chat bots are interested in evolving their solutions continuously in response to changing customer demands. Some requests to a smart chat bot can be predicted (for example, those following a new product launch or a new marketing campaign). There are, however, instances where this is not possible (following market shifts, natural disasters, etc.).

    While voice and chat bots are becoming increasingly ubiquitous, keeping them up to date with ever-changing demands remains a challenge. It is clear that a build>deploy>forget approach quickly leads to outdated AI that lacks the ability to adapt to dynamic customer requirements.

    Call-center solutions that create ongoing feedback mechanisms between incoming calls or chat messages and the chatbot’s AI allow for a programmatic approach to predicting and catering to customer intent.

    This is achieved by doing the following:

    • applying natural language processing (NLP) to conversation data
    • extracting relevant missed intents
    • automating the bot update process
    • inserting human feedback at key stages in the process
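    As an illustration of the first two steps, a simple heuristic can surface candidate missed utterances from a transcript: collect the customer turns that the bot answered with its fallback intent. This is a sketch under an assumed transcript shape, not part of the ACE+ implementation:

    ```javascript
    // Extract candidate "missed utterances": customer turns that the bot
    // immediately answered with its fallback intent. The transcript shape
    // ({ speaker, text, intent }) is an assumption for this sketch.
    function extractMissedUtterances(transcript) {
      return transcript
        .filter((turn, i) => {
          const next = transcript[i + 1];
          return turn.speaker === "customer" && next && next.intent === "FallbackIntent";
        })
        .map((turn) => turn.text);
    }
    ```

    The extracted phrases would then be surfaced to the agent for triage rather than fed to the bot directly, which is where the human feedback step comes in.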

    This post provides a technical overview of one of Accenture’s Advanced Customer Engagement (ACE+) solutions, explaining how it integrates multiple AWS services to continuously and quickly improve chatbots and stay ahead of customer demands.

    Call center solution architects and administrators can use this architecture as a starting point for an iterative bot improvement solution, with the goal of increasing call deflection and driving better customer experiences.

    Overview of Solution

    The goal of the solution is to extract missed intents and utterances from a conversation and present them to the call center agent at the end of the conversation, as part of the after-call work flow. A simple UI was designed for the agent to select the most relevant missed phrases and forward them to an Analytics/Operations Team for final approval.

    Figure 1 – Architecture Diagram

    Amazon Connect serves as the contact center platform and handles incoming calls, manages the IVR flows and the escalations to the human agent. Amazon Connect is also used to gather call metadata, call analytics and handle call center user management. It is the platform from which other AWS services are called: Amazon Lex, Amazon DynamoDB and AWS Lambda.

    Lex is the AI service used to build the bot. Lambda serves as the main integration tool and is used to push bot transcripts to DynamoDB, deploy updates to Lex and to populate the agent dashboard which is used to flag relevant intents missed by the bot. A generic CRM app is used to integrate the agent client and provide a single, integrated, dashboard. For example, this addition to the agent’s UI, used to review intents, could be implemented as a custom page in Salesforce (Figure 2).

    Figure 2 – Agent feedback dashboard in Salesforce. The section allows the agent to select parts of the conversation that should be captured as intents by the bot.

    A separate, stand-alone, dashboard is used by an Analytics and Operations Team to approve the new intents, which triggers the bot update process.


    The typical use case for this solution (Figure 3) shows how missing intents in the bot configuration are captured from customer conversations. These intents are then validated and used to automatically build and deploy an updated version of a chatbot. During the process, the following steps are performed:

    1. Customer intents that were missed by the chatbot are automatically highlighted in the conversation.
    2. The agent reviews the transcript and selects the missed intents that are relevant.
    3. The selected intents are sent to the Analytics/Ops Team for final approval.
    4. The operations team validates the new intents and starts the chatbot rebuild process.

    Figure 3 – Use case: the bot is unable to resolve the first call (bottom flow). Post-call analysis results in a new version of the bot being built and deployed. The new bot is able to handle the issue in subsequent calls (top flow)

    During the first call (bottom flow), the bot fails to fulfill the request and the customer is escalated to a live agent. The agent resolves the query and, post-call, analyzes the transcript between the chatbot and the customer, identifies conversation parts that the chatbot should have understood, and sends a ‘missed intent/utterance’ report to the Analytics/Ops Team. The team approves the report and triggers the process that updates the bot.

    For the second call, the customer asks the same question. This time, the (trained) bot is able to answer the query and end the conversation.

    Ideally, the post-call analysis should be performed, at least in part, by the agent handling the call. Involving the agent in the process is critical for delivering quality results. Any given conversation can have multiple missed intents, some of them irrelevant when looking to generalize a customer’s question.

    A call center agent is in the best position to judge what is or is not useful and mark the missed intents to be used for bot training. This is the important logical triage step. Of course, this will result in the occasional increase in the average handling time (AHT). This should be seen as a time investment with the potential to reduce future call times and increase deflection rates.

    One alternative to this setup would be to have a dedicated analytics team review the conversations, offloading this work from the agent. This approach avoids the increase in AHT, but also introduces delay and, possibly, inaccuracies in the feedback loop.

    The approval from the Analytics/Ops Team is a sign-off on the agent’s work and the trigger for the bot-building process.


    The following section focuses on the sequence required to programmatically update intents in existing Lex bots. It assumes a Connect instance is configured and a Lex bot is already integrated with it. Refer to the Amazon Connect documentation for more information on adding Lex to your Connect flows.

    It also does not cover the CRM application, where the conversation transcript is displayed and presented to the agent for intent selection. The implementation details can vary significantly depending on the CRM solution used. Conceptually, most solutions will follow the architecture presented in Figure 1: store the conversation data in a database (DynamoDB here) and expose it through an API (API Gateway here) to be consumed by the CRM application.

    Lex bot update sequence

    The core logic for updating the bot is contained in a Lambda function that triggers the Lex update. This adds new utterances to an existing bot, builds it and then publishes a new version of the bot. The Lambda function is associated with an API Gateway endpoint which is called with the following body:

    	{
    	    "intent": "INTENT_NAME",
    	    "utterances": ["UTTERANCE_TO_ADD_1", "UTTERANCE_TO_ADD_2", ...]
    	}

    Steps to follow:

    1. The intent information is fetched from Lex using the getIntent API.
    2. The existing utterances are combined with the new utterances and deduplicated.
    3. The intent information is updated with the new utterances.
    4. The updated intent information is passed to the putIntent API to update the Lex intent.
    5. The bot information is fetched from Lex using the getBot API.
    6. The intent version present within the bot information is updated with the new intent.

    Figure 4 – Representation of Lex Update Sequence


    7. The updated bot information is passed to the putBot API to update Lex, and the processBehavior is set to “BUILD” to trigger a build. The following code snippet shows how this would be done in JavaScript:

    const updatedBot = await lexModel
            .putBot({
                ...botParams, // bot definition prepared from the getBot response
                processBehavior: "BUILD",
            })
            .promise();

    8. The last step is to publish the bot. For this, we fetch the bot alias information and then call the putBotAlias API.

    const oldBotAlias = await lexModel
            .getBotAlias({
                name: config.botAlias,
                botName: config.botName, // bot name assumed to be stored in config
            })
            .promise();

    return lexModel
            .putBotAlias({
                name: config.botAlias,
                botName: config.botName,
                botVersion: updatedBot.version,
                checksum: oldBotAlias.checksum,
            })
            .promise();
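    Steps 1 through 4 of the sequence (fetching the intent, merging and deduplicating utterances, and pushing the update via putIntent) can be sketched as follows. This is our minimal illustration using the AWS SDK for JavaScript v2, not the ACE+ code; `lexModel` is assumed to be an initialized LexModelBuildingService client, and the helper names are ours:

    ```javascript
    // Step 2 as a pure helper: combine existing sample utterances with the
    // new ones and remove duplicates.
    function mergeUtterances(existing, added) {
      return [...new Set([...existing, ...added])];
    }

    // Steps 1-4: fetch the intent, merge utterances, and push the update.
    // `lexModel` is assumed to be an AWS.LexModelBuildingService instance.
    async function addUtterancesToIntent(lexModel, intentName, newUtterances) {
      const intent = await lexModel
        .getIntent({ name: intentName, version: "$LATEST" })
        .promise();

      return lexModel
        .putIntent({
          name: intent.name,
          description: intent.description,
          slots: intent.slots,
          sampleUtterances: mergeUtterances(intent.sampleUtterances || [], newUtterances),
          fulfillmentActivity: intent.fulfillmentActivity,
          checksum: intent.checksum, // required when updating an existing intent
        })
        .promise();
    }
    ```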


    In this post, we showed how a programmatic bot improvement process can be implemented around Amazon Lex and Amazon Connect. Continuously improving call center bots is a fundamental requirement for increased customer satisfaction. The feedback loop, agent validation, and automated bot deployment pipeline should be considered integral parts of any chatbot implementation.

    Finally, the concept of a feedback-loop is not specific to call-center chatbots. The idea of adding an iterative improvement process in the bot lifecycle can also be applied in other areas where chatbots are used.

    Accelerating Innovation with the Accenture AWS Business Group (AABG)

    By working with the Accenture AWS Business Group (AABG), you can learn from the resources, technical expertise, and industry knowledge of two leading innovators, helping you accelerate the pace of innovation to deliver disruptive products and services. The AABG helps customers ideate and innovate cloud solutions through rapid prototype development.

    Connect with our team at [email protected] to learn how to use machine learning in your products and services.

    Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.


    Abdullah Sahin

    Abdullah Sahin is a senior technology architect at Accenture. He is leading a rapid prototyping team bringing the power of innovation on AWS to Accenture customers. He is a fan of CI/CD, containerization technologies and IoT.

    Muhammad Qasim

    Muhammad Qasim is a software engineer at Accenture and excels in development of voice bots using services such as Amazon Connect. In his spare time, he plays badminton and loves to go for a run.

    Operating Lambda: Design principles in event-driven architectures – Part 2

    Post Syndicated from James Beswick original

    In the Operating Lambda series, I cover important topics for developers, architects, and systems administrators who are managing AWS Lambda-based applications. This three-part section discusses event-driven architectures and how these relate to Lambda-based applications.

    Part 1 covers the benefits of the event-driven paradigm and how it can improve throughput, scale and extensibility. This post explains some of the design principles and best practices that can help developers gain the benefits of building Lambda-based applications.


    Many of the best practices that apply to software development and distributed systems also apply to serverless application development. The broad principles are consistent with the Well-Architected Framework. The overall goal is to develop workloads that are:

    • Reliable: offering your end users a high level of availability. AWS serverless services are reliable because they are also designed for failure.
    • Durable: providing storage options that meet the durability needs of your workload.
    • Secure: following best practices and using the tools provided to secure access to workloads and limit the blast radius, if any issues occur.
    • Performant: using computing resources efficiently and meeting the performance needs of your end users.
    • Cost-efficient: designing architectures that avoid unnecessary cost that can scale without overspending, and also be decommissioned, if necessary, without significant overhead.

    When you develop Lambda-based applications, there are several important design principles that can help you build workloads that meet these goals. You may not apply every principle to every architecture and you have considerable flexibility in how you approach building with Lambda. However, they should guide you in general architecture decisions.

    Use services instead of custom code

    Serverless applications usually comprise several AWS services, integrated with custom code run in Lambda functions. While Lambda can be integrated with most AWS services, the services most commonly used in serverless applications are:

    Category                        AWS service
    Compute                         AWS Lambda
    Data storage                    Amazon S3, Amazon DynamoDB, Amazon RDS
    API                             Amazon API Gateway
    Application integration         Amazon EventBridge, Amazon SNS, Amazon SQS
    Orchestration                   AWS Step Functions
    Streaming data and analytics    Amazon Kinesis Data Firehose

    There are many well-established, common patterns in distributed architectures that you can build yourself or implement using AWS services. For most customers, there is little commercial value in investing time to develop these patterns from scratch. When your application needs one of these patterns, use the corresponding AWS service:

    Pattern                        AWS service
    Queue                          Amazon SQS
    Event bus                      Amazon EventBridge
    Publish/subscribe (fan-out)    Amazon SNS
    Orchestration                  AWS Step Functions
    API                            Amazon API Gateway
    Event streams                  Amazon Kinesis

    These services are designed to integrate with Lambda and you can use infrastructure as code (IaC) to create and discard resources in the services. You can use any of these services via the AWS SDK without needing to install applications or configure servers. Becoming proficient with using these services via code in your Lambda functions is an important step to producing well-designed serverless applications.

    Understanding the level of abstraction

    The Lambda service limits your access to the underlying operating systems, hypervisors, and hardware running your Lambda functions. The service continuously improves and changes infrastructure to add features, reduce cost and make the service more performant. Your code should assume no knowledge of how Lambda is architected and assume no hardware affinity.

    Similarly, the integration of other services with Lambda is managed by AWS with only a small number of configuration options exposed. For example, when API Gateway and Lambda interact, there is no concept of load balancing available since it is entirely managed by the services. You also have no direct control over which Availability Zones the services use when invoking functions at any point in time, or how and when Lambda execution environments are scaled up or destroyed.

    This abstraction allows you to focus on the integration aspects of your application, the flow of data, and the business logic where your workload provides value to your end users. Allowing the services to manage the underlying mechanics helps you develop applications more quickly with less custom code to maintain.

    Implementing statelessness in functions

    When building Lambda functions, you should assume that the environment exists only for a single invocation. The function should initialize any required state when it is first started – for example, fetching a shopping cart from a DynamoDB table. It should commit any permanent data changes to a durable store such as S3, DynamoDB, or SQS before exiting. It should not rely on any existing data structures or temporary files, or on any internal state built up across multiple invocations (such as counters or other calculated, aggregate values).

    Lambda provides an initializer before the handler where you can initialize database connections, libraries, and other resources. Since execution environments are reused where possible to improve performance, you can amortize the time taken to initialize these resources over multiple invocations. However, you should not store any variables or data used in the function within this global scope.
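    The two rules above – reuse the initialization scope, but never rely on it for per-request data – can be sketched as follows. This is a minimal illustration, not code from the post: the dictionary `DURABLE_STORE` is a hypothetical in-memory stand-in for an external durable store such as DynamoDB, so the sketch runs without AWS.

    ```python
    import json

    # Module scope: runs once per execution environment and is reused across
    # invocations. Suitable for clients, connections, and configuration.
    # DURABLE_STORE is a hypothetical stand-in for an external durable store.
    DURABLE_STORE = {}

    def handler(event, context):
        # 1. Initialize required state at the start of the invocation.
        cart_id = event["cart_id"]
        cart = DURABLE_STORE.get(cart_id, {"items": []})

        # 2. Work only with local data; assume nothing survives this invocation.
        cart["items"].append(event["item"])

        # 3. Commit permanent changes to the durable store before returning.
        DURABLE_STORE[cart_id] = cart
        return {"statusCode": 200, "body": json.dumps(cart)}
    ```

    In a real function, step 3 would be a write to DynamoDB or S3 rather than a dictionary assignment; the structure of the handler stays the same.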

    Lambda function design

    Most architectures should prefer many, shorter functions over fewer, larger ones. Making Lambda functions highly specialized for your workload means that they are concise and generally result in shorter executions. The purpose of each function should be to handle the event passed into the function, with no knowledge or expectations of the overall workflow or volume of transactions. This makes the function agnostic to the source of the event with minimal coupling to other services.

    Any global-scope constants that change infrequently should be implemented as environment variables to allow updates without deployments. Any secrets or sensitive information should be stored in AWS Systems Manager Parameter Store or AWS Secrets Manager and loaded by the function. Since these resources are account-specific, this allows you to create build pipelines across multiple accounts. The pipelines load the appropriate secrets per environment, without exposing these to developers or requiring any code changes.
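    Reading configuration from environment variables can be sketched as below. The variable names `TABLE_NAME` and `LOG_LEVEL` are illustrative, not a documented convention; values are read once at initialization, so updating the function's environment changes behavior without a deployment.

    ```python
    import os

    # Read once per execution environment; defaults are illustrative.
    TABLE_NAME = os.environ.get("TABLE_NAME", "orders-dev")
    LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")

    def handler(event, context):
        # Configuration is available to every invocation without being
        # hard-coded or redeployed.
        return {"table": TABLE_NAME, "log_level": LOG_LEVEL}
    ```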

    Building for on-demand data instead of batches

    Many traditional systems are designed to run periodically and process batches of transactions that have built up over time. For example, a banking application may run every hour to process ATM transactions into central ledgers. In Lambda-based applications, the custom processing should be triggered by every event, allowing the service to scale up concurrency as needed, to provide near-real time processing of transactions.

    While you can run cron tasks in serverless applications by using scheduled expressions for rules in Amazon EventBridge, these should be used sparingly or as a last resort. In any scheduled task that processes a batch, there is the potential for the volume of transactions to grow beyond what can be processed within the 15-minute Lambda timeout. If the limitations of external systems force you to use a scheduler, you should generally schedule for the shortest reasonable recurring time period.

    For example, it’s not best practice to use a batch process that triggers a Lambda function to fetch a list of new S3 objects. This is because the service may receive more new objects in between batches than can be processed within a 15-minute Lambda function.

    S3 fetch anti-pattern

    Instead, the Lambda function should be invoked by the S3 service each time a new object is put into the S3 bucket. This approach is significantly more scalable and also invokes processing in near-real time.

    S3 to Lambda events
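    The event-driven approach can be sketched as a handler that processes each object notification as it arrives. The event shape below follows the standard S3 event notification structure; the processing itself is left as a placeholder.

    ```python
    import urllib.parse

    def handler(event, context):
        """Invoked by S3 once per object-created notification, rather than
        polling for a batch of new objects on a schedule."""
        processed = []
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # S3 delivers object keys URL-encoded in event notifications.
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            processed.append((bucket, key))  # real code would process the object here
        return processed
    ```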

    Orchestrating workflows

    Workflows that involve branching logic, different failure modes, and retry logic typically use an orchestrator to keep track of the state of the overall execution. Avoid using Lambda functions for this purpose, since it results in tightly coupled groups of functions and services, and in complex code for handling routing and exceptions.

    With AWS Step Functions, you use state machines to manage orchestration. This extracts the error handling, routing, and branching logic from your code, replacing it with state machines declared using JSON. Apart from making workflows more robust and observable, it allows you to add versioning to workflows and make the state machine a codified resource that you can add to a code repository.
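    As a rough illustration of declaring retries, error catching, and branching in a state machine rather than in code, here is a minimal Amazon States Language sketch. The state names and the function ARN are hypothetical, not taken from the post.

    ```json
    {
      "Comment": "Illustrative sketch only; names and ARN are hypothetical.",
      "StartAt": "ProcessPayment",
      "States": {
        "ProcessPayment": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-payment",
          "Retry": [
            {
              "ErrorEquals": ["States.TaskFailed"],
              "IntervalSeconds": 2,
              "MaxAttempts": 2,
              "BackoffRate": 2.0
            }
          ],
          "Catch": [
            {
              "ErrorEquals": ["States.ALL"],
              "Next": "NotifyFailure"
            }
          ],
          "Next": "Done"
        },
        "NotifyFailure": {
          "Type": "Fail",
          "Error": "PaymentFailed",
          "Cause": "Payment task failed after retries"
        },
        "Done": {
          "Type": "Succeed"
        }
      }
    }
    ```

    Because the retry policy and failure branch live in the state machine definition, the Lambda function contains only business logic and can stay small.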

    It’s common for simpler workflows in Lambda functions to become more complex over time, and for developers to use a Lambda function to orchestrate the flow. When operating a production serverless application, it’s important to identify when this is happening, so you can migrate this logic to a state machine.

    Developing for retries and failures

    AWS serverless services, including Lambda, are fault-tolerant and designed to handle failures. In the case of Lambda, if a service invokes a Lambda function and there is a service disruption, Lambda invokes your function in a different Availability Zone. If your function throws an error, the Lambda service retries your function.

    Since the same event may be received more than once, functions should be designed to be idempotent. This means that receiving the same event multiple times does not change the result beyond the first time the event was received.

    For example, if a credit card transaction is attempted twice due to a retry, the Lambda function should process the payment on the first receipt. On the second retry, either the Lambda function should discard the event or the downstream service it uses should be idempotent.

    A Lambda function typically implements idempotency by using a DynamoDB table to track recently processed identifiers, checking whether a transaction has been handled previously. The DynamoDB table usually sets a Time To Live (TTL) value to expire items and limit the storage space used.

    Idempotent microservice
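    The pattern can be sketched as below. This is a hypothetical, simplified illustration: an in-memory dictionary stands in for the DynamoDB table so the sketch runs without AWS, and `TTL_SECONDS` mimics the table's TTL attribute.

    ```python
    import time

    TTL_SECONDS = 3600
    _processed = {}  # transaction id -> expiry timestamp (stand-in for DynamoDB)

    def already_processed(tx_id, now=None):
        """Return True if tx_id was seen within the TTL window; otherwise
        record it and return False."""
        now = time.time() if now is None else now
        expiry = _processed.get(tx_id)
        if expiry is not None and expiry > now:
            return True                      # duplicate within the TTL window
        _processed[tx_id] = now + TTL_SECONDS
        return False

    def handler(event, context):
        tx_id = event["transaction_id"]
        if already_processed(tx_id):
            # Same event delivered again: discard without repeating the side effect.
            return {"status": "duplicate", "transaction_id": tx_id}
        # charge_card(event)  # hypothetical side effect; runs at most once per window
        return {"status": "processed", "transaction_id": tx_id}
    ```

    In production, the check-and-record step would be a conditional write to DynamoDB so that concurrent invocations cannot both pass the check.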

    For failures within the custom code of a Lambda function, the service offers a number of features to help preserve and retry the event, and provide monitoring to capture that the failure has occurred. Using these approaches can help you develop workloads that are resilient to failure and improve the durability of events as they are processed by Lambda functions.


    This post discusses the design principles that can help you develop well-architected serverless applications. I explain why using services instead of code can help improve your application’s agility and scalability. I also show how statelessness and function design contribute to good application architecture. I cover how using events instead of batches helps serverless development, and how to plan for retries and failures in your Lambda-based applications.

    Part 3 of this series will look at common anti-patterns in event-driven architectures and how to avoid building these into your microservices.

    For more serverless learning resources, visit Serverless Land.

    SAF Products Integration into Zabbix

    Post Syndicated from Tatjana Dunce original

    Top-of-the-line point-to-point microwave equipment manufacturer SAF Tehnika has partnered with Zabbix to provide NMS capabilities to its end customers. SAF Tehnika appreciates Zabbix’s customizability, scalability, ease of template design, and straightforward integration with SAF products.


    I. SAF Tehnika (1:14)
    II. SAF point-to-point microwave systems (3:20)
    III. SAF product lines (5:37)

    √ Integra (5:54)
    √ PhoeniX-G2 (6:41)

    IV. SAF services (7:21)
    V. SAF partnership with Zabbix (8:48)

    √  Zabbix templates for SAF equipment (10:50)
    √ Zabbix Maps view for Phoenix G2 (15:00)

    VI. Zabbix services provided by SAF (17:56)
    VII. Questions & Answers (20:00)

    SAF Tehnika

    Like Zabbix, SAF Tehnika comes from a really small country: Latvia.

    SAF Tehnika:

    ✓ has been around for over 20 years,
    ✓ is profitable and carries no debt on its balance sheet,
    ✓ is present in 130+ countries,
    ✓ has manufacturing facilities in the European Union,
    ✓ is ISO 9001 certified,
    ✓ has been a Zabbix Certified Partner since August 2020,
    ✓ is publicly traded on the NASDAQ Riga Stock Exchange,
    ✓ has flexible R&D and is able to provide custom solutions based on customer requirements.

    SAF Tehnika is primarily manufacturing:

    • point-to-point systems,
    • hand-held MW spectrum analyzers,
    • Aranet wireless sensors and solutions.

    SAF Tehnika main product groups

    SAF point-to-point microwave systems

    Point-to-point microwave systems are an alternative to a fiber line: instead of fiber, two radio systems with antennas are installed on two towers. The distance between those towers can be anywhere from a few kilometers up to 50 or even 100 km. The data is transmitted from one point to the other wirelessly.

    SAF Tehnika point-to-point MW system technology provides:

    • long-distance wireless links;
    • free and excellent technical support;
    • fast & easy deployment;
    • a 5-year standard warranty for SAF products, which SAF Tehnika can offer thanks to high-quality materials, reliable chipsets, and chamber testing of every product;
    • solutions for:

    √ WISPs,
    √ TV and broadcasting (No.1 in the USA),
    √ public safety,
    √ utilities & mining,
    √ enterprise networks,
    √ local government & military,
    √ low-latency/HFT (No.1 globally).

    SAF product lines

    The primary PTP microwave product series manufactured by SAF Tehnika are Integra and Phoenix G2.

    SAF Tehnika main radio products


    Integra is a full-outdoor radio, which can be attached directly to the antenna, so there is nothing indoors besides the power supply.

    Integra-E — wireless fiber solution specifically tailored for dense urban deployment:

    • operates in E-band range,
    • can achieve throughput of up to 10 Gbps,
    • operates with a 2 GHz channel bandwidth.

    Integra-X — a powerful dual-core system for network backbone deployment. It incorporates two radios in a single enclosure and two modem chains, allowing the system to operate with built-in 2+0 XPIC and reach a maximum data transmission capacity of up to 2.2 Gbps.


    The Phoenix G2 product line can be deployed either as a split-mount system, with the modem installed indoors and the radio outdoors, or as a full-indoor solution. For instance, the broadcast market mostly uses full-indoor solutions, preferring to keep all the equipment indoors and run a long elliptical waveguide up to the antenna. The Phoenix G2 product line features native ASI transmission in parallel with IP traffic – a crucial requirement of our broadcast customers.

    SAF services

    SAF has also been offering a range of services:

    • product training;
    • link planning;
    • technical support;
    • staging and configuration: all equipment is labeled and configured before it reaches the customer;
    • FCC coordination: recently added to the SAF portfolio and offered only to customers in the USA; it lets customers save time by getting link planning, FCC coordination, pre-configuration, and hardware purchase from a single one-stop shop, SAF Tehnika;
    • Zabbix deployment and support.

    SAF partnership with Zabbix

    Before partnering with Zabbix, SAF Tehnika had developed its own network management system and used it for many years. Unfortunately, this software was limited to SAF products. Adding other vendors’ products was a difficult and complicated process.

    More and more SAF customers were inquiring about the possibility of adding other vendors’ products to the network management system. That is where Zabbix came in handy, as besides monitoring SAF products, Zabbix can also monitor other vendors’ products just by adding appropriate templates.

    Zabbix is an open-source, advanced and robust platform with high customizability and scalability – there are virtually no limits to the number of devices Zabbix can monitor. I am confident our customers will appreciate all of these benefits and enjoy the ability to add SAF and other vendor products to the list of monitored devices.

    Finally, SAF Tehnika and Zabbix are located in the same small town, so the partnership was easy and natural.

    Zabbix templates for SAF equipment

    Following training at Zabbix, SAF engineers obtained certified specialist status and developed SNMP-based Zabbix templates for the main product lines:

    • Integra-X, Integra-E, Integra-G, Integra-W, and Phoenix G2.

    SAF main product line templates are available free of charge to all SAF customers on the SAF webpage:
    (registration required).

    Users proficient in Linux and familiar with Zabbix can definitely install and deploy these templates themselves. Otherwise, SAF specialists are ready to assist in the deployment and integration of Zabbix templates and tuning of the required parameters.
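    For orientation, a minimal SNMP-based Zabbix template in YAML export format looks roughly like the sketch below. This is purely illustrative and not one of SAF's templates: the template name is hypothetical, and the OID shown is the standard MIB-2 sysUpTime, not a SAF-specific OID.

    ```yaml
    zabbix_export:
      version: '5.4'
      groups:
        - name: Templates
      templates:
        - template: 'Template SAF SNMP example'
          name: 'Template SAF SNMP example'
          groups:
            - name: Templates
          items:
            - name: 'Device uptime'
              type: SNMP_AGENT
              snmp_oid: 1.3.6.1.2.1.1.3.0
              key: system.uptime
              delay: 1m
    ```

    A vendor template like SAF's would add items for the radio-specific OIDs (received signal level, MSE, transmit power, and so on) plus triggers and graphs.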

    Zabbix dashboard for Integra X

    An Integra-X link Zabbix dashboard is shown as an example below. As the Integra-X is a dual-core radio, we provide monitoring parameters for both radios in the single enclosure.

    Zabbix dashboard for Integra-X link

    At the top, we display the main health parameters of the link: the current received signal level, MSE (the so-called noise level), the transmit power, and the IP address – a small summary of the link.

    On the left, we display the parameters of the radio and the graphs for the last couple of minutes — the live graphs of the received signal level and MSE, the noise level of the RF link.

    On the right, we have the same parameters for the remote side. In the middle, we have added a few parameters, which should be monitored, such as CPU load percentage, the current traffic over the link, and diagnostic parameters, such as the temperature of each of the modems.

    At the bottom, we have added the alarm widget. In this example, the alarm of too low received signal level is shown. These alarms are also colored by their severity: red alarms are for disaster-level issues, blue alarms — for information.

    From this dashboard, the customers are able to estimate the current status of the link and any issues that have appeared in the past. Note that Zabbix graphs can be easily customized to display the widgets or graphs of customer choice.

    Zabbix Maps view for Phoenix G2

    Zabbix Maps view for Phoenix G2 1+1 system

    In the map, our full-indoor Phoenix G2 system is displayed in duplicate, as this is a 1+1 protected link. Each of the IDU, the ASI module, and the radio module is protected by a second, respective module.

    Zabbix allows for naming each of these modules and for monitoring every module’s performance individually. In this example, the ASI module is colored in red as one of the ASI ports has lost the connection, while the radio unit’s red color shows that the received signal is lower than expected.

    Zabbix dashboard for Phoenix G2 1+1 system

    Besides the maps view, the dashboard for the Phoenix G2 1+1 system shows historical data such as the alarm log and graphs. Data in red indicates an issue that hasn’t been cleared yet; data in green indicates an issue that was resolved, for instance, a low signal level that was restored after going down for a short period of time.

    In the middle, we see a summary graph of all four radios’ performance — two on the local side and two on the remote side. Here, we are monitoring the most important parameters: the received signal level and MSE, i.e., the noise level.

    The graph at the bottom is important for broadcast customers as the majority of them transmit ASI traffic besides Ethernet and IP traffic. Here they’re able to monitor how much traffic was going through this link in the past.

    Zabbix services provided by SAF

    Since SAF Tehnika has experienced Zabbix-certified specialists, who have developed these templates, we are ready to provide Zabbix-related services to our end customers, such as:

    • Zabbix deployment on the customer’s machines, integration and configuration of all the parameters, and fine-tuning according to the customer’s requirements;
    • consulting services and technical support on an annual contract basis.

    NOTE. SAF Zabbix support services are limited to SAF products.

    SAF Tehnika is ready and eager to provide Zabbix-related services to our customers. If you already have a SAF network that you would like to integrate into Zabbix, or you plan to deploy a new SAF network integrated with Zabbix, you can contact our offices.

    SAF contact details

    Questions & Answers

    Question. You told us that you provide Zabbix for your customers and create templates to pass to them, and so on. But do you use Zabbix in your own environment, for instance, in your offices, to monitor your own infrastructure?

    Answer. SAF has been using Zabbix for almost 10 years, and we use it to monitor our internal infrastructure. Currently, SAF runs three separate Zabbix installations: one for SAF IT system monitoring, another for Kubernetes system monitoring (Aranet Cloud services), and a separate Zabbix server for testing, where we can test SAF equipment and experiment with Zabbix server deployments, templates, etc.

    Question. You have passed the specialist courses. Do you have any plans on becoming certified professionals?

    Answer. Our specialists are definitely interested in the Zabbix Certified Professional courses. We will make that decision based on the revenue Zabbix brings us and the interest of our customers.

    Question. You have already provided a couple of templates for Zabbix and for your customers. Do you have any interesting templates you are working on? Do you have plans to create or upgrade some existing templates?

    Answer. So far, we have released the templates for the main product lines — Integra and Phoenix G2. We have a few product lines that are more specialized, such as low-latency products, and some older products, such as CFIP Lumina. If any of our customers are interested in integrating these older or low-latency products, we might create more templates.

    Question. Do you plan to refine the templates to make them a part of the Zabbix out-of-the-box solution?

    Answer. If Zabbix is going to approve our templates to make them a part of the out-of-the-box solution, it will benefit our customers in using and monitoring our products. We’ll be delighted to provide the templates for this purpose.

