Tag Archives: serverless

The Network is the Computer: A Conversation with John Gage

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/john-gage/

The Network is the Computer: A Conversation with John Gage

The Network is the Computer: A Conversation with John Gage

To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below.


[00:00:13]

John Graham-Cumming: I’m talking to John Gage who was what, the 21st employee of Sun Microsystems, which is what Wikipedia claims and it also claims that you created this phrase “The Network is the Computer,” and that’s actually one of the things I want to talk about with you a little bit because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it or what Sun meant by it at the time?

[00:00:40]

John Gage: Well, in 2019, recalling what it meant in 1982 or 83’ will be colored by all our experience since then but at the time it seemed so obvious that when we introduced the first scientific workstations, they were not very powerful computers. The first Suns had a giant screen and they were on the Internet but they were designed as a complementary component to supercomputers. Bill Joy and I had a series of diagrams for talks we’d give, and Bill had the bi-modal, the two node picture. The serious computing occurred on the giant machines where you could fly into the heart of a black hole and the human interface was the workstation across the network. So each had to complement the other, each built on the strengths of the other, and each enhanced the other because to deal in those days with a supercomputer was very ugly. And to run all your very large computations, you could run them on a Sun because we had virtual memory and series of such advanced things but not fast. So the speed of scientific understanding is deeply affected by the tools the scientist has — is it a microscope, is it an optical telescope, is it a view into the heart of a star by running a simulation on a supercomputer? You need to have the loop with the human and the science constantly interacting and constantly modifying each other, and that’s what the network is for, to tie those different nodes together in as seamless a way as possible. Then, the instant anyone that’s ever created a programming language says, “so if I have to create a syntax of this where I’m trying to let you express, do this, how about the delay on the network, the latency”? Does your phrase “The Network is the Computer” really capture this hundreds, thousands, tens of thousands, millions perhaps at that time, now billions and billions and billions today, all these devices interacting and exchanging state with latency, with delay. It’s sort of an oversimplification, and that we would point out, but it’s just network is the computer. Four words, you know, what we tried to do is give a metaphor that allows you to explore it in your mind and think of new things to do and be inspired.

[00:03:35]

Graham-Cumming: And then by a sort of strange sequence of events, that was a trademark of Sun. It got abandoned. And now Cloudflare has swooped in and trademarked it again. So now it’s our trademark which sort of brings us full circle, I suppose.

[00:03:51]

Gage: Well, trademarks are dealing with the real world, but the inspiration of Cloudflare is to do exactly what Bill Joy and I were talking about in 1982. It’s to build an environment in which every participant globally can share with security, and we were not as strong. Bill wrote most of the code of TCP/IP implemented by every other computer vendor, and still these questions of latency, these questions of distributed denial of service which was, how do you block that? I was so happy to see that Cloudflare invests real money and real people in addressing those kinds of critical problems, which are at the core, what will destroy the Internet.

[00:14:48]

Graham-Cumming: Yes, I agree. I mean, it is a significant investment to actually deal with it and what I think people don’t appreciate about the DDoS attack situation is that they are going on all the time and it’s just a continuous, you know, just depends who the target is. It’s funny you mentioned TCP/IP because about 10 years after, so in about ‘92, my first real job, I had to write a TCP/IP stack for an obscure network card. And this was prior to the Internet really being available everywhere. And so I didn’t realize I could go and get the BSD implementation and recompile it. So I did it from scratch from the RFCs.

[00:05:23]

Gage: You did!

[00:05:25]

Graham-Cumming: And the thing I recommend here is that nobody ever does that because, you know, the real world, real code that really interacts is really hard when you’re trying to work it with other things, so.

[00:05:36]

Gage: Do you still, John, do you have that code?

[00:05:42]

Graham-Cumming: I wonder. I have the binary for it.

[00:05:46]

Gage: Do hunt for it, because our story was at the time DARPA, the Defense Advanced Research Projects Agency, that had funded networking initiatives around the world. I just had a discussion yesterday with Norway and they were one of the first entities to implement using essentially Bill Joy’s code, but to be placed on the ARPANET. And a challenge went out, and at that time the slightly older generation, the Bolt Beranek and Newman Group, Vint Cerf, Bob Con, those names, as Vint Cerf was a grad student at UCLA where he had built one of the four first Internet sites and the DARPA offices were in Arlington, Virginia, they had massive investments in detection of nuclear underground tests, so seismological data, and the moment we made the very first Suns, I shipped them to DARPA, we got the network up and began serving seismic data globally. Really lovely visualization of events. If you’re trying to detect something, those things go off and then there’s a distinctive signature, a collapse of the underground cavern after. So DARPA had tried to implement, as you did, from the spec, from the RFC, the components, and Vint had designed a lot of this, all the acknowledgement codes and so forth that you had to implement in TCP/IP. So Bill, as a graduate student at Berkeley, we had a meeting in Arlington at DARPA headquarters where BBN and AT&T Bell Labs and a number of other people were in the room. Their code didn’t work, this graduate student from Berkeley named Bill Joy, his code did work, and when Bob Kahn and Vint Cerf asked Bill, “Well, so how did you do it?” What he said was exactly what you just said, he said, “I just read the spec and wrote the code.”

[00:08:12]

Graham-Cumming: I do remember very distinctly because the company I was working at didn’t have a TCP/IP stack and we didn’t have any IP machines, right, we were doing actually stuff that was all IBM networking, SMA stuff. Somehow we bought what was at that point a HP machine, it was an Apollo workstation and a Sun workstation. I had them on Ethernet and talking to each other. And I do distinctly remember the first time a ping packet came back from that Sun box, saying, yes I managed to send you an IP packet, you managed to send me ICMP response and that was pretty magical. And then I got to TCP and that was hard.

[00:08:55]

Gage: That was hard. Yeah. When you get down to the details, the spec can be wrong. I mean, it will want you to do something that’s a stupid thing to do. So Bill has such good taste in these things. It would be interesting to do a kind of a diff across the various implementations of the stack. Years and years later we had maybe 50 companies all assemble in a room, only engineers, throw out all the marketing people and all the Ps and VPs and every company in this room—IBM, Hewlett-Packard—oh my God, Hewlett-Packard, fix your TCP—and we just kept going until everybody could work with everybody else in sort of a pact. We’re not going to reveal, Honeywell, that you guys were great with earlier absolute assembly code, determinate time control stuff but you have no clue about how packets work, we’ll help you, so that all of us can make every machine interoperate, which yielded the network show, Interop. Every year we would go put a bunch of fiber inside whatever, you know, Geneva, or pick some, Las Vegas, some big venue.

[00:10:30]

Graham-Cumming: I used to go to Vegas all the time and that was my great introduction to Vegas was going there for Interop, year after year.

[00:10:35]

Gage: Oh, you did! Oh, great.

[00:10:36]

Graham-Cumming: Yes, yes, yes.

[00:10:39]

Gage: You know in a way, what you’re doing with, for example, just last week with the Verizon problem, everybody implementing what you’re doing now that is not open about their mistakes and what they’ve learned and is not sharing this, it’s a problem. And your global presence to me is another absolutely critical thing. We had about, I forget, 600 engineers in Beijing at the East Gate of Tsinghua a lot of networking expertise and lots of those people are at Tencent and Huawei and those network providers throughout the rest of the world, politics comes and goes but the engineering has to be done in a way that protects us. And so these conversations globally are critical.

[00:11:33]

Graham-Cumming: Yes, that’s one of the things that’s fascinating actually about doing real things on the real Internet is there is a global community of people making computers talk to each other and you know, that it’s a tremendously complicated thing to actually make that work, and you do it across countries, across languages. But you end up actually making them work, and that’s the Internet we’re sitting on, that you and I are talking on right now that is based on those conversations around the world.

[00:12:01]

Gage: And only by doing it do you understand more deeply how to do it. It’s very difficult in the abstract to say what should happen as we begin to spread. As Sun grew, every major city in Africa had installations and for network access, you were totally dependent on an often very corrupt national telco or the complications dealing with these people just to make your packet smooth. And as it turned out, many of the intelligence and military entities in all of these countries had very little understanding of any of this. That’s changed to some degree. But the dangerous sides of the Internet. Total surveillance, IPv6, complete control of exact identity of origins of packets. We implemented, let’s see, you had an early Sun. We probably completed our IPv6 implementation, was it still fluid in the 90s, but I remember 10 years after we finished a complete implementation of IPv6, the U.S. was still IPv4, it’s still IPv4.

[00:13:25]

Graham-Cumming: It still is, it still is. Pretty much. Except for the mobile carriers right now. I think in general the mobile phone operators are the ones who’ve gone more into IPv6 than anybody else.

[00:13:37]

Gage: It was remarkable in China. We used to have a conference. We’d bring a thousand Chinese universities into a room. Professor Wu from Tsinghua who built the Chinese Education and Research Network, CERNET. And now a thousand universities have a building on campus doing Internet research. We would get up and show this map of China and he kept his head down politically, but he managed at the point when there was a big fight between the Minister of Telecom and the Minister of Railways. The Minister of Railways said, look, I have continuity throughout China because I have railines. I’ve just made a partnership with the People’s Liberation Army, and they are essentially slave labor, and they’re going to dig the ditches, and I’m going to run fiber alongside the railways and I don’t care what you, the Minister of Telecommunications, has to say about it, because I own the territory. And that created a separate pathway for the backbone IPv6 network in China. Cheap, cheap, cheap, get everybody doing things.

[00:14:45]

Graham-Cumming: Yes, now of course in China that’s resulted in an interesting situation where you have China Telecom and China Unicom, who sort of cooperate with each other but they’re almost rivals which makes IP packets quite difficult to route inside China.

[00:14:58]

Gage: Yes exactly. At one point I think we had four hunks of China. Everyone was geographically divided. You know there were meetings going on, I remember the moment they merged the telecom ministry with the electronics ministry and since we were working with both of them, I walk in a room and there’s a third group, people I didn’t know, it turns out that’s the People’s Liberation Army.

[00:15:32]

Graham-Cumming: Yes, they’re part of the team. So okay, going back to this “Network is the Computer” notion. So you were talking about the initial things that you were doing around that, why is it that it’s okay that Cloudflare has gone out and trademarked that phrase now, because you seem to think that we’ve got a leg to stand on, I guess.

[00:15:56]

Gage: Frankly, I’d only vaguely heard of Cloudflare. I’ve been working in areas, I’ve got a project in the middle of Nairobi in the slum where I’ve spent the last 15 years or so learning a lot about clean water and sewage treatment because we have almost 400,000 people in a very small area, biggest slum in East Africa. How can you introduce sanitary water and clean sewage treatment into a very, an often corrupt, a very difficult environment, and so that’s been a fascination of mine and I’ve been spending a lot of time. What’s a computer person know about fluid dynamics and pathogens? There’s a lot to learn. So as you guys grew so rapidly, I vaguely knew of you but until I started reading your blog about post-quantum crypto and how do we devise a network in these resilient denial of service attacks and all these areas where you’re a growing company, it’s very hard to take time to do serious advanced research-level work on distributed computing and distributed security, and yet you guys are doing it. When Bill created Java, the subsequent step from Java for billions and billions of devices to share resources and share computations was something we call Genie which is a framework for validation of who you are, movement of code from device to device in a secure way, total memory control so that someone is not capable of taking over memory in your device as we’ve seen with Spectre and the failures of these billions of Intel chips out there that all have a flaw on take all branches parallel compute implementations. So the very hardware you’re using can be insecure so your operating systems are insecure, the hardware is insecure, and yet you’re trying to build on top with fallible pieces in infallible systems. And you’re in the middle of this, John, which I’m so impressed by.

[00:18:13]

Graham-Cumming: And Jini sort of lives on as called Apache River now. It moved away from Sun and into an Apache project.

[00:18:21]

Gage: Yes, very few people seem to realize that the name Apache is a poetic phrasing of “a patchy system.” We patch everything because everything is broken. We moved a lot of it, Brian Behlendorf and the Apache group. Well, many of the innovations at Sun, Java is one, file systems that are far more secure and far more resilient than older file systems, the SPARC  implementation, I think the SPARC processor, even though you’re using the new ARM processors, but Fujitsu, I still think keeps the SPARC architecture as the world’s fastest microprocessor.  

[00:19:16]

Graham-Cumming: Right. Yes. Being British of course, ARM is a great British success. So I’m honor-bound to use that particular architecture. Clearly.

[00:19:25]

Gage: Oh, absolutely. And the power. That was the one always in a list of what our engineering goals are. We wanted to make, we were building supercomputers, we were building very large file servers for the telcos and the banks and the intelligence agencies and all these different people, but we always wanted to make a low power and it just fell off the list of what you could accomplish and the ARM chips, their ratios of wattage to packets treated are—you have a great metric on your website someplace about measuring these things at a very low level—that’s key.

[00:20:13]

Graham-Cumming: Yes, and we had Sophie Wilson, who of course is one of the founders of ARM and actually worked on the original chip, tell this wonderful story at our Internet Summit about how the first chip they hooked up was operating fine until they realized they hadn’t hooked the power up and they were asked to. It was so low power that it was able to use the power that was coming in over the logic lines to actually power the whole chip. And they said to me, wait a minute, we haven’t plugged the power in but the thing is running, which was really, I mean that was an amazing achievement to have done that.

[00:20:50]

Gage: That’s amazing. We open sourced SPARC, the instruction set, so that anybody doing crypto that also had Fab capabilities could implement detection of ones and zeroes, sheep and goats, or other kinds of algorithms that are necessary for very high speed crypto. And that’s another aspect that I’m so impressed by Cloudflare. Cloudflare is paying attention at a machine instruction level because you’re implementing with your own hardware packages in what, 180 cities? You’re moving logistically a package into Ulan Bator, or into Mombasa and you’re coming up live.

[00:21:38]

Graham-Cumming: And we need that to be inexpensive and fast because we’re promising people that we will make their Internet properties faster and secure at the same time and that’s one of the interesting challenges which is not trading those two things off. Which means your crypto better be fast, for example, and that requires a lot of fiddling around at the hardware level and understanding it. In our case because we’re using Intel, really what Intel chips are doing at the low level.

[00:22:10]

Gage: Intel did implement a couple of things in one or another of the more recent chips that were very useful for crypto. We had a group of the SPARC engineers, probably 30, at a dinner five or six months ago discussing, yes, we set the world standard for parallel execution branching optimizations for pipelines and chips, and when the overall design is not matched by an implementation that pays attention to protecting the memory, it’s a fundamental, exploitable flaw. So a lot of discussion about this. Selecting precisely which instructions are the most important, the risk analysis with the ability to make a chip specifically to implement a particular algorithm, there’s a lot more to go. We have multiples of performance ahead of us for specific algorithms based on a more fluid way to add instructions that are necessary into a specific piece of hardware. And then we jump to quantum. Oh my.

[00:23:32]

Graham-Cumming: Yes. To talk about that a little bit, the ever-increasing speed of processors and the things we can do; Do you think we actually need that given that we’re now living in this incredibly distributed world where we are actually now running very distributed algorithms and do we really need beefier machines?

[00:23:49]

Gage: At this moment, in a way, it’s you making fun of Bill Joy for only wanting a megabit in Aspen. When Steve Jobs started NeXT, sadly his hardware was just terrible, so we sent a group over to boost NeXT. In fact we sort of secretly slipped him $30 million to keep him afloat. And I’d say, “Jobs, if you really understood something about hardware, it would really be useful here.” So one of the main team members that we sent over to NeXT came to live in Aspen and ended up networking the entire valley. At a point, megabit for what you needed to do, seemed reasonable, so at this moment, as things become alive by the introduction of a little bit of intelligence in them, some little flickering chip that’s able to execute an algorithm, many tasks don’t require. If you really want to factor things fast, quantum, quantum. Which will destroy our existing crypto systems. But if you are just bringing the billions of places where a little bit of knowledge can alter locally a little bit of performance, we could do very well with the compute power that we have right now. But making it live on the network, securely, that’s the key part. The attacks that are going on, simple errors as you had yesterday, are simple errors. In a way, across Cloudflare’s network, you’re watching the challenges of the 21st century take place: attacks, obscure, unknown exploits of devices in the power and water control systems. And so, you are in exactly the right spot to not get much sleep and feel a heavy responsibility.

[00:26:20]

Graham-Cumming: Well it certainly felt like it yesterday when we were offline for 27 minutes, and that’s when we suddenly discovered, we sort of know how many customers we have, and then we really discover when they start phoning us. Our support line had his own DDoS basically where it didn’t work anymore because so many people signed in. But yes, I think that it’s interesting your point about a little bit extra on a device somewhere can do something quite magical and then you link it up to the network and you can do a lot. What we think is going on partly is some things around AI, where large amounts of machine learning are happening on big beefy machines, perhaps in the cloud, perhaps groups of machines, and then devices are doing their own little bits of inference or recognizing faces and stuff like that. And that seems to be an interesting future where we have these devices that are actually intelligent in our pockets.

[00:27:17]

Gage: Oh, I think that’s exactly right. There’s so much power in your pocket. I’m spending a lot of time trying to catch up that little bit of mathematics that you thought you understood so many years ago and it turns out, oh my, I need a little bit of work here. And I’ve been reading Michael Jordan’s papers and watching his talks and he’s the most cited computer scientist in machine learning and he will always say, “Be very careful about the use of the phrase, ‘Artificial Intelligence’.” Maybe it’s a metaphor like “The Network is the Computer.” But, we’re doing gradient descent optimization. Is the slope going up, or is the slope going down? That’s not smart. It’s useful and the real time language translation and a lot of incredible work can occur when you’re doing phrases. There’s a lot of great pattern work you can do, but he’s out in space essentially combining differentiation and integration in a form of integral. And off we go. Are your hessians rippling in the wind? And what’s the shape of this slope? And is this actually the fastest path from here to there to constantly go downhill. Maybe it’s sometimes going uphill and going over and then downhill that’s faster. So there’s just a huge amount of new mathematics coming in this territory and each time, as we move from 2G to 3G to 4G to 5G, many people don’t appreciate that the compression algorithms changed between 2G, 3G, 4G and 5G and as a result, so much more can move into your mobile device for the same amount of power. 10 or 20 times more for the same about of power. And mathematics leads to insights and applications of it. And you have a working group in that area, I think. I tried to probe around to see if you’re hiring.

[00:30:00]

Graham-Cumming: Well you could always just come around to just ask us because we’ll probably tell you because we tend to be fairly transparent. But yes, I mean compression is definitely an area where we are interested in doing things. One of the things I first worked on at Cloudflare was a thing that did differential compression based on the insight that web pages don’t actually change that much when you hit ‘refresh’. And so it turns out that if you if you compress based on the delta from the last thing you served to someone you can actually send many orders of magnitude less data and so there’s lots of interesting things you can do with that kind of insight to save a tremendous amount of bandwidth. And so yeah, definitely compression is interesting, crypto is interesting to us. We’ve actually open sourced some of our compression improvements in zlib which was very popular compression algorithm and now it’s been picked up. It turns out that in neuroscience, because there’s a tremendous amount of data which needs compression and there are pipelines used in neuroscience where actually having better compression algorithms makes you work a lot faster. So it’s fascinating to see the sort of overspill of things we’re doing into other areas where I know nothing about what goes on inside the brain.

[00:31:15]

Gage: Well isn’t that fascinating, John. I mean here you are, the CTO of Cloudflare working on a problem that deeply affects the Internet, enabling a lot more to move across the Internet in less time with less power, and suddenly it turns into a tool for brain modeling and neuroscientists. This is the benefit. There’s a terrific initiative. I’m at Berkeley. The Jupiter notebooks created by Fernando Perez, this environment in which you can write text and code and share things. That environment, taken up by machine learning. I think it’s a major change. And the implementation of diagrams that are causal. These forms of analysis of what caused what. These are useful across every discipline and for you to model traffic and see patterns emerge and find webpages and see the delta has changed and then intelligently change the pattern of traffic in response to it, it’s all pretty much the same thing here.

[00:32:53]

Graham-Cumming: Yes and then as a mathematician, when I see things that are the same thing, I can’t help wondering what the real deep structure is underneath. There must be another layer another layer down or something. So as you know it’s this thing. There’s some other deeper layer below all this stuff.  

[00:33:12]

Gage: I think this is just endlessly fascinating. So my only recommendations to Cloudflare: first, double what you’re doing. That’s so hard because as you go from 10 people to 100 people to 1,000 people to 10,000 people, it’s a different world. You are a prime example, you are global. Suddenly you’re able to deal with local authorities in 60-70 countries and deal with some of the world’s most interesting terrain and with network connectivity and moving data, surveillance, and some security of the foundation infrastructure of all countries. You couldn’t be engaged in more exciting things.

[00:34:10]

Graham-Cumming: It’s true. I mean one of the most interesting things to me is that I have grown up with the Internet when I you know I got an email address using actually the crazy JANET scheme in the UK where the DNS names were backwards. I was in Oxford and they gave me an email address and it was I think it was JGC at uk dot ac dot ox dot prg and that then at some point it flipped around and it went to DNS looked like it had won. For a long time my address was the wrong way around. I think that’s a typically British decision to be slightly different to everybody else.

[00:35:08]

Gage: Well, Oxford’s always had that style, that we’re going to do things differently. There’s an Oxford Center for the 21st century that was created by the money from a wonderful guy who had donated maybe $100 million. And they just branched out into every possible research area. But when you went to meetings, you would enter a building that was built at the time of the Raj. It was the India temple of colonialism.

[00:35:57]

Graham-Cumming: There’s quite a few of those in the UK. Are you thinking of the Martin School? James Martin. And he gave a lot of money to Oxford. Well the funny thing about that was the programming research group. The one thing they didn’t teach us really as an undergraduate was how to program which was one of the most fascinating things they have because that was a bit getting your hands dirty so you needed to let all the theory. So we learnt all the theory we did a little bit of functional programming and that was the extent of it which set me really up badly for a career in an industry. My first job I had to pretend I knew how to program and see and learn very quickly.

[00:36:42]

Gage: Oh my. Well now you’ve been writing code in Go.

[00:36:47]

Graham-Cumming: Yes. Well the thing about Go, the other Oxford thing of course is Tony Hoare, who is a professor of computer science there. He had come up with this thing called CSP (Communicating Sequential Processes) so that was a whole theory around how you do parallel execution. And so of course everybody used his formalism and I did in my doctoral thesis and so when Go came along and they said oh this how Go works, I said, well clearly that’s CSP and I know how to do this. So I can do it again.

[00:37:23]

Gage: Tony Hoare occasionally would issue a statement about something and it was always a moment. So few people seem to realize the birth of so much of what we took in the 60s, 70s, 80s, in Silicon Valley and Berkeley, derived from the Manchester Group, the virtual memory work, these innovations. Today, Whit Diffie. He used to love these Bletchley stories, they’re so far advanced. That generation has died off.

[00:38:37]

Graham-Cumming: There’s a very peculiar thing in computer science and the real application of computing which is that we both somehow sit on this great knowledge of the past of computing and at the same time we seem to willfully forget it and reinvent everything every few years. We go through these cycles where it’s like, let’s do centralized computing, now distributed computing. No, let’s have desktop PCs, now let’s have the cloud. We seem to have this collective amnesia and then on occasion people go, “Oh, Leslie Lamport wrote this thing in 1976 about this problem”. What other subject do we willfully forget the past and then have to go and doing archaeology to discover again?

[00:39:17]

Gage: As a sociological phenomenon it means that the older crowd in a company are depressing because they’ll say, “Oh we tried that and it didn’t work”. Over the years as Sun grew from 15 people or so and ended up being like 45,000 people before we were sold off to Oracle and then everybody dumped out because Oracle didn’t know too much about computing. So Ivan Sutherland, Whit Diffie. Ivan actually stayed on. He may actually still have an Oracle email. Almost all of the research groups, certainly the chip group went off to Intel, Fujitsu, Microsoft. It’s funny to think now that Microsoft’s run by a Sun person.

[00:40:19]

Graham-Cumming: Well that’s the same thing. Everyone’s forgotten that Microsoft was the evil empire not that long ago. And so now it’s not. Right now it’s cool again.

[00:40:28]

Gage: Well, all of the embedded stuff from Microsoft is still that legacy that Bill Gates who’s now doing wonderful things with the Gates Foundation. But the embedded insecurity of the global networks is due to, in large part, the insecurities, that horrible engineering of Microsoft embedded everywhere. You go anywhere in China to some old industrial facility and there is some old not updated junky PC running totally insecure software. And it’s controlling the grid. It’s discouraging. It’s like a lot of the SCADA systems.

[00:41:14]

Graham-Cumming: I’m completely terrified of SCADA systems.

[00:41:20]

Gage: The simplest exploits. I mean, it’s nothing even complicated. There are a series of emerging journalists today that are paying attention to cybersecurity and people have come out with books even very recently. Well, now because we’re in this China, US, Iran nightmare, a United States presidential directive taking the cybersecurity crowd and saying, oops, now you’re an offensive force. Which means we got some 20-year-old lieutenant somewhere who suddenly might just for fun turn off Tehran’s water supply or something. This is scary because the SCADA systems are embedded everywhere, and they’re, I don’t know, would you say totally insecure? Just the simple things, just simple exploits. One of the journalists described, I guess it was the Russians who took a bunch of small USB sticks and at a shopping center near a military base just gave them away. And people put them into their PCs inside SIPRNet, inside the secure U.S. Department of Defense network. Instantly the network was taken over just by inserting a USB device to something on the net. And there you are, John, protecting against this.

[00:43:00]

Graham-Cumming: Trying hard to protect against these things, yes absolutely. It’s very interesting because you mentioned before how rapidly Cloudflare had grown over the last few years. And of course Sun also really got going pretty rapidly, didn’t it?

[00:43:00]

Gage: Well, yes. The first year we were just some students from Berkeley, hardware from Stanford, Andy Bechtolsheim, software from Berkeley, Berkeley Unix BSD, Bill Joy. Combine the two, and 10 of us or so, and we were, I think the first year was 12 million booked, the second year was 50 or 60 million booked, and the third year was 150 or so million booked and then we hit 500 million and then we hit a billion. And now, it’s selling boxes, we were a manufacturing company so that’s different from software or services, but we also needed lots of people and so we instantly raided the immense benefit of variety of people in the San Francisco Bay Area, with Berkeley and Stanford. We had students in computer science, and mechanical engineering, and physics, and mathematics from every country in the world and we recruited from every country in the world. So a great part of Sun’s growth came, as you are, expanding internationally, and at one point I think we ran most of the telcos of the world, we ran China Mobile. 900 million subscribers on China Mobile, all Sun stuff in the back. Throughout Africa, every telco was running Sun and Cisco until Huawei knocked Cisco out. It was an amazing time.

[00:44:55]

Graham-Cumming: You ran the machine that ran Latek, that let me get my doctoral thesis done.

[00:45:01]

Gage: You know that’s how I got into it, actually. I was in econometrics and mathematics at Berkeley, and I walk down a hallway and outside a room was that funny smell from photographic paper from something, and there was perfectly typeset mathematics. Troff and nroff, all those old UNIX utilities for the Bell Technical Journal, and I open the door and I’ve got to get in there. There’s two hundred people sitting in front of these beehive-like little terminals all typing away on a UNIX system. And I want to get an account and I walk down the hall and there’s this skinny guy who types about 200 words a minute named Bill Joy. And I said, I need an account, I’ve got to type set integral signs, and he said, what’s your name. I tell him my name, John Gage, and he goes voop, and I’ve never seen anybody type as fast as him in my life. This is a new world, here.

[00:45:58]

Graham-Cumming: So he was rude then?

[00:46:01]

Gage: Yeah he was, he was. Well, it’s interesting since the arrival of a device at Berkeley to complement the arrival of an MIT professor who had implemented in LISP, mathematical, not typesetting of mathematics, but actual maxima. To get Professor fetman, maxima god from MIT, to come to Berkeley and live a UNIX environment, we had to put a LISP up outside on the PDP. So Bill took that machine which had virtual memory and implemented the environment for significant computational mathematics. And Steve Wolfram took that CalTech, and Princeton Institute for Advanced Studies, and now we have Mathematica. So in a way, all of Sun and the UNIX world derived from attempting to do executable mathematics.

[00:47:17]

Graham-Cumming: Which in some ways is what computers are doing. I think one of the things that people don’t really appreciate is the extent to which all numbers underneath.

[00:47:28]

Gage: Well that’s just this discrete versus continuous problem that Michael Jordan is attempting to address. To my current total puzzlement and complete ignorance, is what in the world is symplectic integration? And how do Lyapunov functions work? Oh, no clue.

[00:47:50]

Graham-Cumming: Are we going to do a second podcast on that? Are you going to come back and teach us?

[00:47:55]

Gage: Try it. We’re on, you’re on, you’re on. Absolutely. But you’ve got to run a company.

[00:48:00]

Graham-Cumming: Well I’ve got some things to do. Yeah. But you can go do that and come tell us about it.

[00:48:05]

Gage: All right, Great John. Well it was terrific to talk to you.

[00:48:08]

Graham-Cumming: So yes it was wonderful speaking to you as well. Thank you for helping me dig up memories of when I was first fooling around with Sun Systems and, you know, some of the early days and of course “The Network is the Computer,” I’m not sure I fully yet understand quite the metaphor or even if maybe I do somehow deeply in my soul get it, but we’re going to try and make it a reality, whatever it is.

[00:48:30]

Gage: Well, I count it as a complete success, because you count as one of our successes because you‘re doing what you’re doing, therefore the phrase, “The Network is the Computer,” resides in your brain and when you get up in the morning and decide what to do, a little bit nudges you toward making the network work.

[00:48:51]

Graham-Cumming: I think that’s probably true. And there’s the dog, the dog is saying you’ve been yakking for an hour and now we better stop. So listen, thank you so much for taking the time. It was wonderful talking to you. You have a good day. Thank you very much.


Interested in hearing more? Listen to my conversations with Ray Rothrock and Greg Papadopoulos of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Ray Rothrock

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/ray-rothrock/

The Network is the Computer: A Conversation with Ray Rothrock

The Network is the Computer: A Conversation with Ray Rothrock

Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below.

[00:00:07]

John Graham-Cumming: Here I am very lucky to get to talk with Ray Rothrock who was I think one of the first investors in Cloudflare, a Series A investor and got the company a little bit of money to get going, but if we dial back a few earlier years than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked, “The Network is the Computer”. And that was a Sun trademark, wasn’t it?

[00:00:43]

Ray Rothrock: It was, yes.

[00:00:46]

Graham-Cumming: I talked to John Gage and I asked him about this as well and I asked him to explain to me what it meant. And I’m going to ask you the same thing because I remember walking around the Valley thinking, that sounds cool; I’m not sure I totally understand it. So perhaps you can tell me, was I right that it was cool, and what does it mean?

[00:01:06]

Rothrock: Well it certainly was cool and it was extraordinarily unique at the time. Just some quick background. In those early days when I was there, the whole concept of networking computers was brand new. Our competitor Apollo had a proprietary network but Sun chose to go with TCP/IP which was a standard at the time but a brand new standard that very few people know about right. So when we started connecting computers and doing some intensive computing which is what I was responsible for—CAD/CAM in those days was extremely intensive whether it was electrical CAD/camera, or mechanical CAD/CAM, or even simulation solid design modeling and things—having a little extra power from other computers was a big deal. And so this concept of “The Network is the Computer” essentially said that you had one window into the network through your desktop computer in those days—there was no mobile computing at that time, this was like 84’, 85’, 86’ I think. And so if you had the appropriate software you could use other people’s computers (for CPU power) and so you could do very hard problems at that single computer could not do because you could offload some of that CPU to the other computers. Now that was very nerdy, very engineering intensive, and not many people did it. We’d go to the SIGGRAPH, which was a huge graphics show in those days and we would demonstrate ten Sun computers for example, doing some graphic rendering of a 3D wireframe that had been created in the CAD/CAM software of some sort. And it was, it was hard, and that was in the mechanical side. On the electrical side, Berkeley had some software that was called Magic—it’s still around and is a very popular EDA software that’s been incorporated in those concepts. But to imagine calculating the paths in a very complicated PCB or a very complicated chip, one computer couldn’t do it, but Sun had the fundamental technology. So from my seat at Sun at the time, I had access to what could be infinite computing power, even though I had a single application running, and that was a big selling point for me when I was trying to convince EDA and MDA companies to put their software on the Sun. That was my job.

[00:03:38]

Graham-Cumming: And hearing it now, it doesn’t sound very revolutionary, because of course we’re all doing that now. I mean I get my phone out of my pocket and connect to goodness knows what computing power which does image recognition and spots faces and I can do all sorts of things. But walk me through what it felt like at the time.

[00:03:56]

Rothrock: Just doing a Google search, I mean, how many data stores are being spun up for that? At the time it was incredible, because you could actually do side by side comparisons. We created some demonstrations, where one computer might take ten hours to do a calculation, two computers might take three hours, five computers might take 30 minutes. So with this demo, you could turn on computers and we would go out on the TCP/IP network to look for an available CPU that could give me some time. Let’s go back even further. Probably 15 years before that, we had time sharing. So you had a terminal into a big mainframe and did all this swapping in and out of stuff to give you a time slice computing. We were doing the exact same thing except we were CPU slicing, not just time slicing. That’s pretty nerdy, but that’s what we did. And I had to work with the engineering department, with all these great engineers in those days, to make this work for a demo. It was so unique, you know, their eyes would get big. You remember Novell…

[00:05:37]

Graham-Cumming: I was literally just thinking about Novell because I actually worked on IPX and SPX networking stuff at the time. I was going to ask you actually, to what extent do you think TCP/IP was a very important part of this revolution?

[00:05:55]

Rothrock: It was huge. It was fundamentally huge because it was a standard, so it was available and if you implemented it, you didn’t have to pay for it. When Bob Metcalfe did Ethernet, it was on top of the TCP stack. Sun, in my memory, and I could be wrong, was the first company to put a TCP/IP stack on the computer. And so you just plugged in the back, an RJ45 into this TCP/IP network with a switch or a router on it and you were golden. They made it so simple and so cheap that you just did it. And of course if you give an engineer that kind of freedom and it opens up. By the way, as the marketing guy at Sun, this was my first non-engineering job. I came from a very technical world of nuclear physics into Sun. And so it was stunning, just stunning.

[00:06:59]

Graham-Cumming: It’s interesting that you mentioned Novell and then you mentioned Apollo before that and obviously IBM had SNA networking and there were attempts to do all those networking things. It’s interesting that these open standards have really enabled the explosion of everything else we’ve seen and with everything that’s going on in the Internet.

[00:07:23]

Rothrock: Sun was open, so to speak, but this concept of open source now that just dominates the conversation. As a venture capitalist, every deal I ever invested in had open source of some sort in it. There was a while when it was very problematic in an M&A event, but the world’s gotten used to it. So open, is very powerful. It’s like freedom. It’s like liberty. Like today, July 4th, it’s a big deal.

[00:07:52]

Graham-Cumming: Yes, absolutely. It’s just interesting to see it explode today because I spent a lot of my career looking at so many different networking protocols. The thing that really surprises me, or perhaps shouldn’t surprise me when you’ve got these open things, is that you harness so many people’s intelligence that you just end up with something that’s just better. It seems simple.

[00:08:15]

Rothrock: It seems simple. I think part of the magic of Sun is that they made it easy. Easy is the most powerful thing you can do in computing. Computing can be so nerdy and so difficult. But if you just make it easy, and Cloudflare has done a great job with that at that; they did it with their DNS service, they did it with all the stuff we worked on back when I was on the board and actively involved in the company. You’ve got to make it easy. I mean, I remember when Matthew and Lee worked like 20 hours a day on how to switch your DNS from whoever your provider was to Cloudflare. That was supposed to be one click, done. A to B. And that DNA was part of the magic. And whether we agree that Sun did it that way, to me at least, Sun did it that way as well. So it’s huge, a huge lift.

[00:09:08]

Graham-Cumming: It’s funny you talk about that because at the time, how that actually worked is that we just asked people to give us their username and password. And we logged in and did it for them. Early on, Matthew asked me if I’d be interested in joining Cloudflare when it was brand new and because of other reasons I’d moved back to the UK and I wasn’t ready to change jobs and I’d just taken another job. And I remember thinking, this thing is crazy this Cloudflare thing. Who’s going to hand over their DNS and their traffic to these four or five people above a nail salon in Palo Alto? And Matthew’s response was, “They’re giving us their passwords, let alone their traffic.” Because they were so desperate for it.

[00:09:54]

Rothrock: It tells you a lot about Matthew and you know as an attorney, I mean he was very sensitive to that and believes that one of the one of the founding principles is trust. His view was that, if I ever lose the customer’s trust, Cloudflare is toast. And so everything focused around that key value. And he was right.

[00:10:18]

Graham-Cumming: And you must have, at Sun, been involved with some high performance computing things that involved sensitive customers doing cryptography and things like that. So again trust is another theme that runs through there as well.

[00:10:33]

Rothrock: Yeah, very true. As the marketing guy of CAD/CAM, I was in the field two-thirds of the time, showing customers what was possible with them. My job was to get third party software onto the Sun box and then to turn that into a presentation to a customer. So I visited many government customers, many aerospace, power, all these very high falutin sort of behind the firewall kinds of guys in those days. So yes, trust was huge. It would come up: “Okay, so I’m using your CPU, how is it that you can’t use mine. And how do you convince me that you’ve not violated something.” In those days it was a whole different conversation that it is today but it was nonetheless just as important. In fact I remember I spent quite a bit of time at NCSA at the University of Illinois Urbana-Champaign. Larry Smarr was the head of NCSA. We spent a lot of time with Larry. I think John was there with me. John Gage and Vinod and some others but it was a big deal taking about high performance computing because that’s what they were doing and doing it with Sun.

[00:11:50]

Graham-Cumming: So just to dial forward, so you’re at Venrock and you decide to invest in Cloudflare. What was it that made you think that this was worth investing in? Presumably you saw some things that were in some of Sun’s vision. Because Sun had a very wide-ranging visions about what was going to be possible with computing.

[00:12:11]

Rothrock: Yeah. Let me sort of touch on a few points probably. Certainly Sun was my first computer company I worked for after I got out of the nuclear business and the philosophy of the company was very powerful. Not only we had this cool 19 inch black and white giant Macintosh essentially although the Mac wasn’t even born yet, but it had this ease of use that was powerful and had this open, I mean it was we preached that all the time and we made that possible. And Cloudflare—the related philosophy of Matthew and Michelle’s genius—was they wanted to make security and distribution of data as free and easy as possible for the long tail. That was the first thinking because you didn’t have access if you were in the long tail you were a small company you or you’re just going to get whipped around by the big boys. And so there was a bit of, “We’re here to help you, we’re going to do it.” It’s a good thing that the long tail get mobilized if you will or emboldened to use the Internet like the big boys do. And that was part of the attractiveness. I didn’t say, “Boy, Matthew, this sounds like Sun,” but the concept of open and liberating which is what they were trying to do with this long tail DNS and CDN stuff was very compelling and seemed easy. But nothing ever is. But they made it look easy.

[00:13:52]

Graham-Cumming: Yeah, it never is. One of the parallels that I’ve noticed is that I think early on at Sun, a lot of Sun equipment went to companies that later became big companies. So some of these small firms that were using crazy work stations ended up becoming some of the big names in the Valley. To your point about the long tail, they were being ignored and couldn’t buy from IBM even if they wanted to.

[00:14:25]

Rothrock: They couldn’t afford SNA and they couldn’t do lots of things. So Sun was an enabler for these companies with cool ideas for products and software to use Sun as the underpinning. workstations were all the rage, because PCs were very limited in those days. Very very limited, they were all Intel based. Sun was 68000-based originally and then it was their own stuff, SPARC. You know in the beginning it was a cheap microprocessor from Motorola.

[00:15:04]

Graham-Cumming: What was the growth like at Sun? Because it was very fast, right?

[00:15:09]

Rothrock: Oh yes, it was extraordinarily fast. I think I was employee 130 or something like that. I left Sun in 1986 to go to business school and they gave me a leave of absence. Carol Bartz was my boss at that moment. The company was like at 2000 people just two and a half years later. So it was growing like a weed. I measured my success by how thick the catalyst—that was our catalog name and our program—how thick and how quickly I could add bonafide software developers to our catalog. We published on one sheet of paper front to back. When I first got there, our catalyst catalogue was a sheet of paper, and when I left, it was a book. It was about three-quarters of an inch thick. My group grew from me to 30 people in about a year and a half. It was extraordinary growth. We went public during that time, had a lot of capital and a lot of buzz. That openness, that our competition was all proprietary just like you were citing there, John. IBM and Apollo were all proprietary networks. You could buy a NIC card and stick it into your PC and talk to a Sun. And vise versa. And you couldn’t do that with IBM or Apollo. Do you remember those?

[00:16:48]

Graham-Cumming: I do because I was talking to John Gage. In my first job out of college, I wrote a TCP/IP stack from scratch, for a manufacturer of network cards. The test of this stack was I had an HP Apollo box and I had a Sun workstation and there was a sort of magical, can I talk to these devices? And can I ping them? And then that was already magical the first ping as it went across the network. And then, can I Telnet to one of these? So you know, getting the networking actually running was sort of the key thing. How important was networking for Sun in the early days? Was it always there?

[00:17:35]

Rothrock: Yeah, it was there from the beginning, the idea of having a network capability. When I got there it was network; the machine wasn’t standalone at all. We sort of mimicked the mainframe world where we had green screens hooked into a Sun in a department for example. And there was time sharing. But as soon as you got a Sun on your desk, which was rare because we were shipping as many as we could build, it was fantastic. I was sharing information with engineering and we were working back and forth on stuff. But I think it was fundamental: you have a microprocessor, you’ve got a big screen, you’ve got a graphic UI, and you have a network that hooks into the greater universe. In those days, to send an all-Sun email around the world, modems spun up everywhere. The network wasn’t what it is now.

[00:18:35]

Graham-Cumming: I remember in about 89’, I was at a conference and Whit Diffie was there. I asked him what he was doing. He was in a little computer room. I was trying to typeset something. And he said, “I’m telnetting into a machine which is in San Diego.” It was the first time I’d seen this and I stepped over and he was like, “look at this.” And he’s hitting the keyboard and the keys are getting echoed back. And I thought, oh my goodness, this is incredible. It’s right across the Atlantic and across the country as well.

[00:19:10]

Rothrock: I think, and this is just me talking having lived the last years and with all the investing and stuff I did, but you know it enabled the Internet to come about, the TCP/IP standard. You may recall that Microsoft tried to modify the TCP/IP stack slightly, and the world rejected it, because it was just too powerful, too pervasive. And then along comes HTTP and all the other protocols that followed. Telnetting, FTPing, all that file transfer stuff, we were doing that left, right, and center back in the 80s. I mean you know Cloudflare just took all this stuff and made it better, easier, and literally lower friction. That was the core investment thesis at the time and it just exploded. Much like when Sun adopted TCP/IP, it just exploded. You were there when it happened. My little company that I’m the CEO of now, we use Cloudflare services. First thing I did when I got there was switched to Cloudflare.

[00:20:18]

Graham-Cumming: And that was one of the things when I joined, we really wanted people get to a point where if you’re putting something on the web, you just say, well I’m going to put Cloudflare or a thing like Cloudflare just on it. Because it protects it, it makes it faster, etc. And of course now what we’ve done is we’ve given people compute facility. Right now you can write code and run it in our in our machines worldwide which is another whole thing.

[00:20:43]

Rothrock: And that is “The Network is the Computer”. The other thing that Sun was pitching then was a paperless office. I remember we had posters of paper flying out of a computer window on a Sun workstation and I don’t think we’ve gotten there yet. But certainly, the network is the computer.

[00:21:04]

Graham-Cumming: It was probably the case that the paperless office was one of those things that was about to happen for quite a long time.

[00:21:14]

Rothrock: It’s still about to happen if you ask me. I think e-commerce and the sort of the digital transformation has driven it harder than just networking. You know, the fact that we can now sign legal documents over the Internet without paper and things like that. People had to adopt. People have to trust. People have to adopt these standards and accept them. And lo and behold we are because we made it easy, we made it cheap, and we made it trustworthy.

[00:21:42]

Graham-Cumming: If you dial back through Sun, what was the hardest thing? I’m asking because I’m at a 1,000-person company and it feels hard some days, so I’m curious. What do I need to start worrying about?

[00:22:03]

Rothrock: Well yeah, at 1,000 people, I think that’s when John came into the company and sort of organized marketing. I would say, holding engineering to schedules; that was hard. That was hard because we were pushing the envelope our graphics was going from black and white to color. The networking stuff the performance of all the chips into the boards and just the performance was a big deal. And I remember, for me personally, I would go to a trade show. I’d go to Boston to the Association of Mechanical Engineers with the team there and would show up at these workstations and of course the engineers want to show off the latest. So I would be bringing with me tapes that we had of the latest operating system. But getting the engineers to be ready for a tradeshow was very hard because they were always experimenting. I don’t believe the word “code freeze” meant much to them, frankly, but we would we would be downloading the software and building a trade show thing that had to run for three days on the latest and greatest and we knew our competitor would be there right across the aisle from us sort of showing their hot stuff. And working with Eric Schmidt in those days, you know, Eric you just got to be done on this date. But trade shows were wonderful. They focused the company’s endpoints if you will. And marketing and sales drove Sun; Scott McNealy’s culture there was big on that. But we had to show. It’s different today than it was then, I don’t know about the Cloudflare competition, but back then, there were a dozen workstation companies and we were fighting for mindshare and market share every day. So you didn’t dare sort of leave your best jewels at home. You brought them with you. I will give John Gage high, high marks. He showed me how to dance through a reboot in case the code crashed and he’s marvelous and I learned how to work that stuff and to survive.

[00:24:25]

Rothrock: Can I tell you one sort of sales story?  

[00:24:28]

Graham-Cumming: Yes, I’m very interested in hearing the non-technical stories. As an engineer, I can hear engineering stories all the time, but I’m curious what it was like being in sales and marketing in such an engineering heavy company as Sun.

[00:24:48]

Rothrock: Yeah. Well it was challenging of course. One of the strategies that Sun had in those days was to get anyone who was building their own computer. This was Computer Vision and Data General and all those guys to adopt the Sun as their hardware platform and then they could put on whatever they wanted. So because I was one of the demo gods, my job was to go along with the sales guys when they wanted to try to convince somebody. So one of the companies we went after was Data General (DG) in Massachusetts. And so I worked for weeks on getting this whole demo suite running MDA, EDA, word processing, I had everything. And this was a big, big, big deal. And I mean like hundreds of millions of dollars of revenue. And so I went out a couple of days early and we were going to put up a bunch of Suns and I had a demo room at DG. So all the gear showed up and I got there at like 5:30 in the morning and started downloading everything, downloading software, making it dance. And at about 8:00 a.m. in the morning the CEO of Data General walks in. I didn’t know who he was but it turned out to be Ed de Castro. And he introduces himself and I didn’t know who he was and he said, “What are you doing?” And I explained, “I’m from Sun, I’m getting ready for a big demo. We’ve got a big executive presentation. Mr. McNealy will be here shortly, etc.” And he said, “Well, show me what you’ve got.” So I’m sort of still in the middle of downloading this software and I start making this thing dance. I’ve got these machines talking to each other and showing all kinds of cool stuff. And he left. And the meeting was about 10 or 11 in the morning. And so when the executive team from Sun showed up they said, “Well, how’s it going?” I said, “Well I gave a demo to a guy,” and they asked, “Who’s the guy,” and I said, “It was Ed de Castro.” And they went, “Oh my God, that was the CEO.” Well, we got the deal. I thought Ed had a little tactic there to come in early, see what he could see, maybe get the true skinny on this thing and see what’s real. I carried the day. But anyway, I got a nice little bonus for that. But Vinod and I would drop into Lockheed down in Southern California. They wanted to put Suns on P-3 airplanes and we’d go down there with an engineer and we’d figure out how to make it. Those were just incredible times. You may remember back in the 80s everyone dressed up except on Fridays. It was dress-down Fridays. And one day I dressed down and Carol Bartz, my boss, saw me wearing blue jeans and just an open collared shirt and she said, “Rothrock, you go home and put on a suit! You never know when a customer is going to walk in the front door.” She was quite right. Kodak shows up. Kodak made a big investment in Sun when it was still private. And I gave that demo and then AT&T, and then interestingly Vice President Bush back in the Reagan administration came to Sun to see the manufacturing and I gave the demo to the Vice President with Scott and Andy and Bill and Vinod standing there.

[00:28:15]

Graham-Cumming: Do you remember what he saw?

[00:28:18]

Rothrock: It was my standard two minute Sun demo that I can give in my sleep. We were on the manufacturing floor. We picked up a machine and I created a demo for it and my executive team was there. We have a picture of it somewhere, but it was fun. As John Gage would say, he’d say, “Ray, your job is to make the computer dance.” So I did.  

[00:28:44]

Graham-Cumming: And one of the other things I wanted to ask you about is at some point Sun was almost Amazon Web Services, wasn’t it. There was a rent-a-computer service, right?  

[00:28:53]

Rothrock: I don’t know. I don’t remember the rent-a-computer service. I remember we went after the PC business aggressively and went after the data centers which were brand new in those days pretty aggressively, but I don’t remember the rent-a-computer business that much. It wasn’t in my domain.

[00:29:14]

Graham-Cumming: So what are you up to these days?

[00:29:18]

Rothrock: I’m still investing. I do a lot of security investing. I did 15 deals while I was at Venrock. Cloudflare was the last one I did, which turned out really well of course. More to come, I hope. And I’m CEO of one of Venrock’s portfolio companies that had a little trouble a few years back but I fixed that and it’s moving up nicely now. But I’ve started thinking about more of a science base. I’m on the board of the Carnegie Institute of Science. I’m on the board of MIT and I just joined the board of the Nuclear Threat Initiative in Washington which is run by Secretary Ernie Moniz, former secretary of energy. So I’m doing stuff like that. John would be pleased with how well that played through. But I’ll tell you it is this these fundamental principles, just tying it all back to Sun and Cloudflare, and this sort of open, cheap, easy, enabling humans to do things without too much friction, that is exciting. I mean, look at your phone. Steve Jobs was the master of design to make this thing as sweet as it is.

[00:30:37]

Graham-Cumming: Yes, and as addictive.

[00:30:39]

Rothrock: Absolutely, right. I haven’t been to a presentation from Cloudflare in two years, but every time I see an announcement like the DNS service, I immediately switched all my DNS here at the house to 1.1.1.1. Stuff like that. Because I know it’s good and I know it’s trustworthy, and it’s got that philosophy built in the DNA.

[00:31:09]

Graham-Cumming: Yes definitely. Taking it back to what we talked about at the beginning, it’s definitely the trustworthiness is something that Cloudflare has cared about from the beginning and continues to care about. We’re sort of the guardians of the traffic that passes through it.

[00:31:25]

Rothrock: Back when the Internet started happening and when Sun was doing Java, I mean, all those things in the 90s, I was of course at Venrock, but I was still pretty connected to [Edward] Zander and [Scott] McNealy. We were hoping that it would be liberating, that it would create a world which was much more free and open to conversation and we’ve seen the dark side of some of that. But I continue to believe that transparency and openness is a good thing and we should never shut it down. I don’t mean to get it all waxing philosophical here but way more good comes from being open and transparent than bad.

[00:32:07]

Graham-Cumming: Listen it’s July 4th. It’s evening here in London. We can be waxing philosophical as much as we like. Well listen, thank you for taking the time to chat with me. Are there any other reminiscences of Sun that you think the public needs to know in this oral history of “The Network is the Computer.”

[00:32:28]

Rothrock: Well you know the only thing I’d say is having landed in the Silicon Valley in 1981 and getting on with Sun, I can say this given my age and longevity here, everything is built on somebody else’s great ideas. And starting with TCP/IP and then we went to this HTML protocol and browsers, it’s just layer on layer on layer on layer and so Cloudflare is just one of the latest to climb on the shoulders of those giants who put it all together. I mean, we don’t even think about the physical network anymore. But it is there and thank goodness companies like Cloudflare keep providing that fundamental service on which we can build interesting, cool, exciting, and mind-changing things. And without a Cloudflare, without Sun, without Apollo, without all those guys back in the day, it would be different. The world would just be so, so different. I did the New York Times crossword puzzle. I could not do it without Google because I have access to information I would not have unless I went to the library. It’s exponential and it just gets better. Thanks to Michelle and Matthew and Lee for starting Cloudflare and allowing Venrock to invest in it.

[00:34:01]

Graham-Cumming: Well thank you for being an investor. I mean, it helped us get off the ground and get things moving. I very much agree with you about the standing on the shoulders of giants because people don’t appreciate the extent to which so much of this fundamental work that we did was done in the 70s and 80s.

[00:34:19]

Rothrock: Yea, it’s just like the automobile and the airplane. We reminisce about the history but boy, there were a lot of giants in those industries as well. And computing is just the latest.

[00:34:32]

Graham-Cumming: Yep, absolutely. Well, Ray, thank you. Have a good afternoon.


Interested in hearing more? Listen to my conversations with John Gage and Greg Papadopoulos of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store. Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Greg Papadopoulos

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/greg-papadopoulos/

The Network is the Computer: A Conversation with Greg Papadopoulos

The Network is the Computer: A Conversation with Greg Papadopoulos

I spoke with Greg Papadopoulos, former CTO of Sun Microsystems, to discuss the origins and meaning of The Network is the Computer®, as well as Cloudflare’s role in the evolution of the phrase. During our conversation, we considered the inevitability of latency, the slowness of the speed of light, and the future of Cloudflare’s newly acquired trademark. Listen to our conversation here and read the full transcript below.


[00:00:08]

John Graham-Cumming: Thank you so much for taking the time to chat with me. I’ve got Greg Papadopoulos who was CTO of Sun and is currently a venture capitalist. Tell us about “The Network is the Computer.”

[00:00:22]

Greg Papadopoulos: Well, from certainly a Sun perspective, the very first Sun-1 was connected via Internet protocols and at that time there was a big war about what should win from a networking point of view. And there was a dedication there that everything that we made was going to interoperate on the network over open standards, and from day one in the company, it was always that thought. It’s really about the collection of these machines and how they interact with one another, and of course that puts the network in the middle of it. And then it becomes hard to, you know, where’s the line? But it is one of those things that I think even if you ask most people at Sun, you go, “Okay explain to me ‘The Network is the Computer.’” It would get rather meta. People would see that phrase and sort of react to it in their own way. But it would always come back to something similar to what I had said I think in the earlier days.

[00:01:37]

Graham-Cumming: I remember it very well because it was obviously plastered everywhere in Silicon Valley for a while. And it sounded incredibly cool but I was never quite sure what it meant. It sounded like it was one of those things that was super deep but I couldn’t dig deep enough. But it sort of seems like this whole vision has come true because if you dial back to I think it’s 2006, you wrote a blog post about how the world was only going to need five or seven or some small number of computers. And that was also linked to this as well, wasn’t it?

[00:02:05]

Papadopoulos: Yeah, I think as things began to evolve into what we would call cloud computing today, but that you could put substantial resources on the other side of the network and from the end user’s perspective and those could be as effective or more effective than something you’d have in front of you. And so this idea that you really could provide these larger scale computing services in early days — you know, grid was the term used before cloud — but if you follow that logic, and you watch what was happening to the improvements of the network. Dave Patterson at Cal was very fond of saying in that era and in the 90s, networks are getting to the place where the desk connected to another machine is transparent to you. I mean it could be your own, in fact, somebody else’s memory may in fact be closer to you than your own disk. And that’s a pretty interesting thought. And so where we ended up going was really a complete realization that these things we would call servers were actually just components of this network computer. And so it was very mysterious, “The Network is the Computer,” and it actually grew into itself in this way. And I’ll say looking at Cloudflare, you see this next level of scale happening. It’s not just, what are those things that you build inside a data center, how do you connect to it, but in fact, it’s the network that is the computer that is the network.

[00:04:26]

Graham-Cumming: It’s interesting though that there have been these waves of centralization and then push the computing power to the edge and the PCs at some point and then Larry Ellison came along and he was going to have this networked computer thing, and it sort of seems to swing back and forth, so where do you think we are in this swinging?

[00:04:44]

Papadopoulos: You know, I don’t think so much swinging. I think it’s a spiral upwards and we come to a place and we look down and it looks familiar. You know, where you’ll say, oh I see, here’s a 3270 connected to a mainframe. Well, that looks like a browser connected to a web server. And you know, here’s the device, it’s connected to the web service. And they look similar but there are some very important differences as we’re traversing this helix of sorts. And if you look back, for example the 3270, it was inextricably bound to a single server that was hosted. And now our devices have really the ability to connect to any other computer on the network. And so then I think we’re seeing something that looks like a pendulum there, it’s really a refactoring question on what software belongs where and how hard is it to maintain where it is, and naturally I think that the Internet protocol clearly is a peer to peer protocol, so it doesn’t take sides on this. And so that we end up in one state, with more on the client or less on the client. I think it really has to do with how well we’ve figured out distributed computing and how well we can deliver code in a management-free way. And that’s a longer conversation.

[00:06:35]

Graham-Cumming: Well, it’s an interesting conversation. One thing is what you talked about with Sun Grid which then we end up with Amazon Web Services and things like that, is that there was sort of the device, be it your handheld or your laptop talking to some cloud computing, and then what Cloudflare has done with this Workers product to say, well, actually I think there’s three places where code could exist. There’s something you can put inside the network.

[00:07:02]

Papadopoulos: Yes. And by extension that could grow to another layer too. And it goes back to, I think it’s Dave Clark who I first remember saying you can get all the bandwidth you want, that’s money, but you can’t reduce latency. That’s God, right. And so I think there are certainly things and as I see the Workers architecture, there are two things going on. There’s clearly something to be said about latency there, and having distributed points of presence and getting closer to the clients. And there’s IBM with interaction there too, but it is also something that is around management of software and how we should be thinking in delivery of applications, which ultimately I believe, in the limit, become more distributed-looking than they are now. It’s just that it’s really hard to write distributed applications in kind of the general way we think about it.

[00:08:18]

Graham-Cumming: Yes, that’s one of these things isn’t it, it is exceedingly hard to actually write these things which is why I think we’re going through a bit of a transition right now where people are trying to figure out where that code should actually execute and what should execute where.

[00:08:31]

Papadopoulos: Yeah. You had graciously pointed out this blog from a dozen years ago on, hey this is inevitable that we’re going to have this concentration of computing, for a lot of economic reasons as anything else. But it’s both a hammer and a nail. You know, cloud stuff in some ways is unnatural in that why should we expect computing to get concentrated like it is. If you really look into it more deeply, I think it has to do with management and control and capital cycles and really things that are kind of on the economic and the administrative side of things, are not about what’s truth and beauty and the destination for where applications should be.

[00:09:27]

Graham-Cumming: And I think you also see some companies are now starting to wrestle with the economics of the cloud where they realize that they are kind of locked into their cloud provider and are paying rent kind of thing; it becomes entirely economic at that point.

[00:09:41]

Papadopoulos: Well it does, and you know, this was also something I was pretty vocal about, although I got misinterpreted for a while there as being, you know, anti-cloud or something which I’m not, I think I’m pragmatic about it. One of the dangers is certainly as people yield particularly to SaaS products, that in fact, your data in many ways, unless you have explicit contracts and abilities to disgorge that data from that service, that data becomes more and more captive. And that’s the part that I think is actually the real question here, which is like, what’s the switching cost from one service to another, from one cloud to another.

[00:10:35]

Graham-Cumming: Yes, absolutely. That’s one of the things that we faced, one of the reasons why we worked on this thing called the Bandwidth Alliance, which is one of the ways in which stuff gets locked into clouds is the egress fee is so large that you don’t want to get your data out.

[00:10:50]

Papadopoulos: Exactly. And then there is always the, you know, well we have these particular features in our particular cloud that are very seductive to developers and you write to them and it’s kind of hard to undo, you know, just the physics of moving things around. So what you all have been doing there is I think necessary and quite progressive. But we can do more.

[00:11:17]

Graham-Cumming: Yes definitely. Just to go back to the thought about latency and bandwidth, I have a jokey pair of slides where I show the average broadband network you can buy over time and it going up, and then the change in the speed of light over the same period, which of course is entirely flat, zero progress in the speed of light. Looking back through your biography, you wrote thinking machines and I assume that fighting latency at a much shorter distance of cabling must have been interesting in those machines because of the speeds at which they were operating.

[00:11:54]

Papadopoulos: Yes, it surprises most people when you say it, but you know, computer architects complain that the speed of light is really slow. And you know, Grace Hopper who is really one of the founders, the pioneers of modern programming languages and COBOL. I think she was a vice admiral. And she would walk around with a wire that was a foot long and say, “this is a nanosecond”. And that seemed pretty short for a while but, you know a nanosecond is an eternity these days.

[00:12:40]

Graham-Cumming: Yes, it’s an eternity. People don’t quite appreciate it if they’re not thinking about it, how long it is. I had someone who was new to the computing world learning about it, come to me with a book which was talking about fiber optics, and in the book it said there is a laser that flashes on and off a billion times a second to send data down the fiber optic. And he came to me and said, “This can’t possibly be true; it’s just too fast.”

[00:13:09]

Papadopoulos: No, it’s too slow!

[00:013:12]

Graham-Cumming: Right? And I thought, well that’s slow. And then I stepped back and thought, you know, to the average person, that is a ridiculous statement, that somehow we humans have managed to control time at this ridiculously small level. And then we keep pushing and pushing and pushing it and people don’t appreciate how fast and actually how slow the light is, really.

[00:13:33]

Papadopoulos: Yeah. And I think if it actually comes down to it, if you want to get into a very pure reckoning of this is latency is the only thing that matters. And one can look at bandwidth as a component of latency, so you can see bandwidth as a serialization delay and that kind of goes back to Clark thing, you know that, yeah I can buy that, I can’t bribe God on the other side so you know I’m fundamentally left with this problem that we have. Thank you, Albert Einstein, right? It’s kind of hopeless to think about sending information faster than that.

[00:14:09]

Graham-Cumming: Yeah exactly. There’s information limits, which is driving why we have such powerful phones, because in fact the latency to the human is very low if you have it in your hand.

[00:14:23]

Papadopoulos: Yes, absolutely. This is where the edge architecture and the Worker structure that you guys are working on, and I think that’s where it becomes really interesting too because it gives me — you talked about earlier, well we’re now introducing this new tier — but it gives me a really closer place from a latency point of view to have some intimate relationship with a device, and at the same time be well-connected to the network.

[00:14:55]

Graham-Cumming: Right. And I think the other thing that is interesting about that is that your device fundamentally is an insecure thing, so you know if you put code on that thing, you can’t put secrets in it, like a cryptographic secrets, because the end user has access to them. Normally you would keep that in the server somewhere, but then the other funny thing is if you have this intermediary tier which is both secure and low latency to the end user, you suddenly have a different world in which you can put secrets, you can put code that is privileged, but it can interact with the user very very rapidly because the low latency.

[00:15:30]

Papadopoulos: Yeah. And that essence of where’s my trust domain. Now I’ve seen all kinds of like, oh my gosh, I cannot believe somebody is doing it, like putting their S3 credentials, putting it down on a device and having it talk, you know, the log in for a database or something. You must be kidding. I mean that trust proxy point at low latency is a really key thing.

[00:16:02]

Graham-Cumming: Yes, I think it’s just people need to start thinking about that architecture. Is there a sort of parallel with things that were going on with very high-performance computing with sort of the massively parallel stuff and what’s happening today? What lessons can we take from work done in the 70s and 80s and apply it to the Internet of today?

[00:16:24]

Papadopoulos: Well, we talked about this sort of, there are a couple of fundamental issues here. And one we’ve been speaking about is latency. The other one is synchronization, and this comes up in a bunch of different ways. You know, whether it’s when one looks at the cap theorem kinds of things that Eric Brewer has been famous for, can I get consistency and availability and survive partitionability, all that, at the same time. And so you end up in this kind of place of—goes back to maybe Einstein a bit—but you know, knowing when things have happened and when state has been actually changed or committed is a pretty profound problem.

[00:17:15]

Graham-Cumming: It is, and what order things have happened.

[00:17:18]

Papadopoulos: Yes. And that order is going to be relative to an observer here as well. And so if you’re insisting on some total ordering then you’re insisting on slowing things down as well. And that really is fundamental. We were pushing into that in the massively parallel stuff and you’ll see that at Internet scale. You know there’s another thing, if I could. This is one of my greatest “aha”s about networks and it’s due to a fellow at Sun, Rob Gingell, who actually ended up being chief engineer at Sun and was one of the real pioneers of the software development framework that brought Solaris forward. But Rob would talk about this thing that I label as network entropy. It’s basically what happens when you connect systems to networks, what do networks kind of do to those systems? And this is a little bit of a philosophical question; it’s not a physical one. And Rob observed that over time networks have this property of wanting to decompose things into constituent parts, have those parts get specialized and then reintegrated. And so let me make that less abstract. So in the early days of connecting systems to networks, one of the natural observations were, well why don’t we take the storage out of those desktop systems or server systems and put them on the other side of at least a local network and into a file server or storage server. And so you could see that computer sort of get pulled apart between its computing and its storage piece. And then that storage piece, you know in Rob’s step, that would go on and get specialized. So we had whole companies start like Network Appliances, Pure Storage, EMC. And so, you know like big pieces of industry or look the original routers were RADb you know running on workstations and you know Cisco went and took that and made that into something and so you now see this effect happen at the next scale. One of the things that really got me excited when I first saw Cloudflare a decade ago was, wow okay in those early days, well we can take a component like a network firewall and that can get pulled away and created as its own network entity and specialized. And I think one of the things, at least from my history of Cloudflare, one of the most profound things was, particularly as you guys went in and separated off these functions early on, the fear of people was this is going to introduce latency, and in fact things got faster. Figure that.

[00:20:51]

Graham-Cumming: Part of that of course is caching and then there’s dealing with the speed of light by being close to people. But also if you say your company makes things faster and you do all these different things including security, you are forced to optimize the whole thing to live up to the claim. Whereas if you try and chain things together, nobody’s really responsible for that overall latency budget. It becomes natural that you have to do it.

[00:21:18]

Papadopoulos: Yes. And you all have done it brilliantly, you know, to sort of Gingell’s view. Okay so this piece got decomposed and now specialized, meaning optimized like heck, because that’s what you do. And so you can see that over and over again and you see it in terms of even Twilio or something. You know, here’s a messaging service. I’m just pulling my applications apart, letting people specialize. But the final piece, and this is really the punchline. The final piece is, Rob will talk about it, the value is in the reintegration of it. And so you know what are those unifying forces that are creating, if you will, the operating system for “The Network is the Computer.” You were asking about the massively parallel scale. Well, we had an operating system we wrote for this. As you get up to the higher scale, you get into these more distributed circumstances where the complexity goes up by some important number of orders of magnitude, and now what’s that reintegration? And so I come back and I look at what Cloudflare is doing here. You’re entering into that phase now of actually being that re-integrator, almost that operating system for the computer that is the network.

[00:23:06]

Graham-Cumming: I think that’s right. We often talk about actually being an operating system on the Internet, so very similar kind of thoughts.

[00:23:14]

Papadopoulos: Yes. And you know as we were talking earlier about how developers make sense of this pendulum or cycle or whatever it is. Having this idea of an operating system or of a place where I can have ground truths and trust and sort of fixed points in this are terribly important.

[00:23:44]

Graham-Cumming: Absolutely. So do you have any final thoughts on, what, it must be 30 years on from when “The Network is the Computer” was a Sun trademark. Now it’s a Cloudflare trademark. What’s the future going to look of that slogan and who’s going to trademark it in 30 years time now?

[00:24:03]

Papadopoulos: Well, it could be interplanetary at that point.

[00:24:13]

Graham-Cumming: Well, if you talk about the latency problems of going interplanetary, we definitely have to solve the latency.

[00:24:18]

Papadopoulos: Yeah. People do understand that. They go, wow it’s like seven minutes within here and Mars, hitting close approach.

[00:24:28]

Graham-Cumming: The earthly equivalent of that is New Zealand. If you speak to people from New Zealand and they come on holiday to Europe or they move to the US, they suddenly say that the Internet works so much better here. And it’s just that it’s closer. Now the Australians have figured this out because Australia is actually drifting northwards so they’re actually going to get within. That’s going to fix it for them but New Zealand is stuck.

[00:24:56]

Papadopoulos: I do ask my physicist friends for one of two things. You know, either give me a faster speed of light — so far they have not delivered — or another dimension I can cut through. Maybe we’ll keep working on the latter.

[00:25:16]

Graham-Cumming: All right. Well listen Greg, thank you for the conversation. Thank you for thinking about this stuff many many years ago. I think we’re getting there slowly on some of this work. And yeah, good talking to you.

[00:25:27]

Papadopoulos: Well, you too. And thank you for carrying the torch forward. I think everyone from Sun who listens to this, and John, and everybody should feel really proud about what part they played in the evolution of this great invention.

[00:25:48]

Graham-Cumming: It’s certainly the case that a tremendous amount of work was done at Sun that was really fundamental and, you know, perhaps some of that was ahead of its time but here we are.

[00:25:57]

Papadopoulos: Thank you.

[00:25:58]

Graham-Cumming: Thank you very much.

[00:25:59]

Papadopoulos: Cheers.


Interested in hearing more? Listen to my conversations with John Gage and Ray Rothrock of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/the-network-is-the-computer/

The Network is the Computer

The Network is the Computer

We recently registered the trademark for The Network is the Computer®, to encompass how Cloudflare is utilizing its network to pave the way for the future of the Internet.

The phrase was first coined in 1984 by John Gage, the 21st employee of Sun Microsystems, where he was credited with building Sun’s vision around “The Network is the Computer.” When Sun was acquired in 2010, the trademark was not renewed, but the vision remained.

Take it from him:

“When we built Sun Microsystems, every computer we made had the network at its core. But we could only imagine, over thirty years ago, today’s billions of networked devices, from the smallest camera or light bulb to the largest supercomputer, sharing their packets across Cloudflare’s distributed global network.

We based our vision of an interconnected world on open and shared standards. Cloudflare extends this dedication to new levels by openly sharing designs for security and resilience in the post-quantum computer world.

Most importantly, Cloudflare is committed to immediate, open, transparent accountability for network performance. I’m a dedicated reader of their technical blog, as the network becomes central to our security infrastructure and the global economy, demanding even more powerful technical innovation.”

Cloudflare’s massive network, which spans more than 180 cities in 80 countries, enables the company to deliver its suite of security, performance, and reliability products, including its serverless edge computing offerings.

In March of 2018, we launched our serverless solution Cloudflare Workers, to allow anyone to deploy code at the edge of our network. We also recently announced advancements to Cloudflare Workers in June of 2019 to give application developers the ability to do away with cloud regions, VMs, servers, containers, load balancers—all they need to do is write the code, and we do the rest. With each of Cloudflare’s data centers acting as a highly scalable application origin to which users are automatically routed via our Anycast network, code is run within milliseconds of users worldwide.

In honor of registering Sun’s former trademark, I spoke with John Gage, Greg Papadopoulos, former CTO of Sun Microsystems, and Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to learn more about the history of the phrase and what it means for the future:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

Amazon Aurora PostgreSQL Serverless – Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-aurora-postgresql-serverless-now-generally-available/

The database is usually the most critical part of a software architecture and managing databases, especially relational ones, has never been easy. For this reason, we created Amazon Aurora Serverless, an auto-scaling version of Amazon Aurora that automatically starts up, shuts down and scales up or down based on your application workload.

The MySQL-compatible edition of Aurora Serverless has been available for some time now. I am pleased to announce that the PostgreSQL-compatible edition of Aurora Serverless is generally available today.

Before moving on with details, I take the opportunity to congratulate the Amazon Aurora development team that has just won the 2019 Association for Computing Machinery’s (ACM) Special Interest Group on Management of Data (SIGMOD) Systems Award!

When you create a database with Aurora Serverless, you set the minimum and maximum capacity. Your client applications transparently connect to a proxy fleet that routes the workload to a pool of resources that are automatically scaled. Scaling is very fast because resources are “warm” and ready to be added to serve your requests.

 

There is no change with Aurora Serverless on how storage is managed by Aurora. The storage layer is independent from the compute resources used by the database. There is no need to provision storage in advance. The minimum storage is 10GB and, based on the database usage, the Amazon Aurora storage will automatically grow, up to 64 TB, in 10GB increments with no impact to database performance.

Creating an Aurora Serverless PostgreSQL Database
Let’s start an Aurora Serverless PostgreSQL database and see the automatic scalability at work. From the Amazon RDS console, I select to create a database using Amazon Aurora as engine. Currently, Aurora serverless is compatible with PostgreSQL version 10.5. Selecting that version, the serverless option becomes available.

I give the new DB cluster an identifier, choose my master username, and let Amazon RDS generate a password for me. I will be able to retrieve my credentials during database creation.

I can now select the minimum and maximum capacity for my database, in terms of Aurora Capacity Units (ACUs), and in the additional scaling configuration I choose to pause compute capacity after 5 minutes of inactivity. Based on my settings, Aurora Serverless automatically creates scaling rules for thresholds for CPU utilization, connections, and available memory.

Testing Some Load on the Database
To generate some load on the database I am using sysbench on an EC2 instance. There are a couple of Lua scripts bundled with sysbench that can help generate an online transaction processing (OLTP) workload:

  • The first script, parallel_prepare.lua, generates 100,000 rows per table for 24 tables.
  • The second script, oltp.lua, generates workload against those data using 64 worker threads.

By using those scripts, I start generating load on my database cluster. As you can see from this graph, taken from the RDS console monitoring tab, the serverless database capacity grows and shrinks to follow my requirements. The metric shown on this graph is the number of ACUs used by the database cluster. First it scales up to accommodate the sysbench workload. When I stop the load generator, it scales down and then pauses.

Available Now
Aurora Serverless PostgreSQL is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo). With Aurora Serverless, you pay on a per-second basis for the database capacity you use when the database is active, plus the usual Aurora storage costs.

For more information on Amazon Aurora, I recommend this great post explaining why and how it was created:

Amazon Aurora ascendant: How we designed a cloud-native relational database

It’s never been so easy to use a relational database in production. I am so excited to see what you are going to use it for!

Best Practices for Developing on AWS Lambda

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/best-practices-for-developing-on-aws-lambda/

In our previous post we discussed the various ways you can invoke AWS Lambda functions. In this post, we’ll provide some tips and best practices you can use when building your AWS Lambda functions.

One of the benefits of using Lambda, is that you don’t have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your Lambda functions. You should take advantage of this architecture with the tips below.

Tip #1: When to VPC-Enable a Lambda Function

Lambda functions always operate from an AWS-owned VPC. By default, your function has full ability to make network requests to any public internet address — this includes access to any of the public AWS APIs. For example, your function can interact with AWS DynamoDB APIs to PutItem or Query for records. You should only enable your functions for VPC access when you need to interact with a private resource located in a private subnet. An RDS instance is a good example.

RDS instance: When to VPC enable a Lambda function

Once your function is VPC-enabled, all network traffic from your function is subject to the routing rules of your VPC/Subnet. If your function needs to interact with a public resource, you will need a route through a NAT gateway in a public subnet.

Tip #2: Deploy Common Code to a Lambda Layer (i.e. the AWS SDK)

If you intend to reuse code in more than one function, consider creating a Layer and deploying it there. A great candidate would be a logging package that your team is required to standardize on. Another great example is the AWS SDK. AWS will include the AWS SDK for NodeJS and Python functions (and update the SDK periodically). However, you should bundle your own SDK and pin your functions to a version of the SDK you have tested.

Tip #3: Watch Your Package Size and Dependencies

Lambda functions require you to package all needed dependencies (or attach a Layer) — the bigger your deployment package, the slower your function will cold-start. Remove all unnecessary items, such as documentation and unused libraries. If you are using Java functions with the AWS SDK, only bundle the module(s) that you actually need to use — not the entire SDK.

Good:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>dynamodb</artifactId>
    <version>2.6.0</version>
</dependency>

Bad:

<!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-sdk-java -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>aws-sdk-java</artifactId>
    <version>2.6.0</version>
</dependency>

Tip #4: Monitor Your Concurrency (and Set Alarms)

Our first post in this series talked about how concurrency can effect your down stream systems. Since Lambda functions can scale extremely quickly, this means you should have controls in place to notify you when you have a spike in concurrency. A good idea is to deploy a CloudWatch Alarm that notifies your team when function metrics such as ConcurrentExecutions or Invocations exceeds your threshold. You should create an AWS Budget so you can monitor costs on a daily basis. Here is a great example of how to set up automated cost controls.

Tip #5: Over-Provision Memory (in some use cases) but Not Function Timeout

Lambda allocates compute power in proportion to the memory you allocate to your function. This means you can over provision memory to run your functions faster and potentially reduce your costs. You should benchmark your use case to determine where the breakeven point is for running faster and using more memory vs running slower and using less memory.

However, we recommend you do not over provision your function time out settings. Always understand your code performance and set a function time out accordingly. Overprovisioning function timeout often results in Lambda functions running longer than expected and unexpected costs.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Understanding the Different Ways to Invoke Lambda Functions

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/understanding-the-different-ways-to-invoke-lambda-functions/

In our first post, we talked about general design patterns to enable massive scale with serverless applications. In this post, we’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Synchronous Invokes

Synchronous invocations are the most straight forward way to invoke your Lambda functions. In this model, your functions execute immediately when you perform the Lambda Invoke API call. This can be accomplished through a variety of options, including using the CLI or any of the supported SDKs.

Here is an example of a synchronous invoke using the CLI:

aws lambda invoke —function-name MyLambdaFunction —invocation-type RequestResponse —payload  “[JSON string here]”

The Invocation-type flag specifies a value of “RequestResponse”. This instructs AWS to execute your Lambda function and wait for the function to complete. When you perform a synchronous invoke, you are responsible for checking the response and determining if there was an error and if you should retry the invoke.

Many AWS services can emit events that trigger Lambda functions. Here is a list of services that invoke Lambda functions synchronously:

Asynchronous Invokes

Here is an example of an asynchronous invoke using the CLI:

aws lambda invoke —function-name MyLambdaFunction —invocation-type Event —payload  “[JSON string here]”

Notice, the Invocation-type flag specifies “Event.” If your function returns an error, AWS will automatically retry the invoke twice, for a total of three invocations.

Here is a list of services that invoke Lambda functions asynchronously:

Asynchronous invokes place your invoke request in Lambda service queue and we process the requests as they arrive. You should use AWS X-Ray to review how long your request spent in the service queue by checking the “dwell time” segment.

Poll based Invokes

This invocation model is designed to allow you to integrate with AWS Stream and Queue based services with no code or server management. Lambda will poll the following services on your behalf, retrieve records, and invoke your functions. The following are supported services:

AWS will manage the poller on your behalf and perform Synchronous invokes of your function with this type of integration. The retry behavior for this model is based on data expiration in the data source. For example, Kinesis Data streams store records for 24 hours by default (up to 168 hours). The specific details of each integration are linked above.

Conclusion

In our next post, we’ll provide some tips and best practices for developing Lambda functions. Happy coding!

 

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/configuring-user-creation-workflows-with-aws-step-functions-and-aws-managed-microsoft-ad-logs/

This post is contributed by Taka Matsumoto, Cloud Support Engineer

AWS Directory Service lets you run Microsoft Active Directory as a managed service. Directory Service for Microsoft Active Directory, also referred to as AWS Managed Microsoft AD, is powered by Microsoft Windows Server 2012 R2. It manages users and makes it easy to integrate with compatible AWS services and other applications. Using the log forwarding feature, you can stay aware of all security events in Amazon CloudWatch Logs. This helps monitor events like the addition of a new user.

When new users are created in your AWS Managed Microsoft AD, you might go through the initial setup workflow manually. However, AWS Step Functions can coordinate new user creation activities into serverless workflows that automate the process. With Step Functions, AWS Lambda can be also used to run code for the automation workflows without provisioning or managing servers.

In this post, I show how to create and trigger a new user creation workflow in Step Functions. This workflow creates a WorkSpace in Amazon WorkSpaces and a user in Amazon Connect using AWS Managed Microsoft AD, Step Functions, Lambda, and Amazon CloudWatch Logs.

Overview

The following diagram shows the solution graphically.

Configuring user creation workflows with AWS Step Functions and AWS Managed Microsoft AD logs

Walkthrough

Using the following procedures, create an automated user creation workflow with AWS Managed Microsoft AD. The solution requires the creation of new resources in CloudWatch, Lambda, and Step Functions, and a new user in Amazon WorkSpaces and Amazon Connect. Here’s the list of steps:

  1. Enable log forwarding.
  2. Create the Lambda functions.
  3. Set up log streaming.
  4. Create a state machine in Step Functions.
  5. Test the solution.

Requirements

To follow along, you need the following resources:

  • AWS Managed Microsoft AD
    • Must be registered with Amazon WorkSpaces
    • Must be registered with Amazon Connect

In this example, you use an Amazon Connect instance with SAML 2.0-based authentication as identity management. For more information, see Configure SAML for Identity Management in Amazon Connect.

Enable log forwarding

Enable log forwarding for your AWS Managed Microsoft AD.  Use /aws/directoryservice/<directory id> for the CloudWatch log group name. You will use this log group name when creating a Log Streaming in Step 3.

Create Lambda functions

Create two Lambda functions. The first starts a Step Functions execution with CloudWatch Logs. The second performs a user registration process with Amazon WorkSpaces and Amazon Connect within a Step Functions execution.

Create the first function with the following settings:

  • Name: DS-Log-Stream-Function
  • Runtime: Python 3.7
  • Memory: 128 MB
  • Timeout: 3 seconds
  • Environment variables:
    • Key: stateMachineArn
    • Value: arn:aws:states:<Region>:<AccountId>:stateMachine:NewUserWorkFlow
  • IAM role with the following permissions:
    • AWSLambdaBasicExecutionRole
    • The following permissions policy
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "states:StartExecution",
            "Resource": "*"
        }
    ]
}
import base64
import boto3
import gzip
import json
import re
import os
def lambda_handler(event, context):
    logEvents = DecodeCWPayload(event)
    print('Event payload:', logEvents)
    returnResultDict = []
    
    # Because there can be more than one message pushed in a single payload, use a for loop to start a workflow for every user
    for logevent in logEvents:
        logMessage = logevent['message']
        upnMessage =  re.search("(<Data Name='UserPrincipalName'>)(.*?)(<\/Data>)",logMessage)
        if upnMessage != None:
            upn = upnMessage.group(2).lower()
            userNameAndDomain = upn.split('@')
            userName = userNameAndDomain[0].lower()
            userNameAndDomain = upn.split('@')
            domainName = userNameAndDomain[1].lower()
            sfnInputDict = {'Username': userName, 'UPN': upn, 'DomainName': domainName}
            sfnResponse = StartSFNExecution(json.dumps(sfnInputDict))
            print('Username:',upn)
            print('Execution ARN:', sfnResponse['executionArn'])
            print('Execution start time:', sfnResponse['startDate'])
            returnResultDict.append({'Username': upn, 'ExectionArn': sfnResponse['executionArn'], 'Time': str(sfnResponse['startDate'])})

    returnObject = {'Result':returnResultDict}
    return {
        'statusCode': 200,
        'body': json.dumps(returnObject)
    }

# Helper function decode the payload
def DecodeCWPayload(payload):
    # CloudWatch Log Stream event 
    cloudWatchLog = payload['awslogs']['data']
    # Base 64 decode the log 
    base64DecodedValue = base64.b64decode(cloudWatchLog)
    # Uncompress the gzipped decoded value
    gunzipValue = gzip.decompress(base64DecodedValue)
    dictPayload = json.loads(gunzipValue)
    decodedLogEvents = dictPayload['logEvents']
    return decodedLogEvents

# Step Functions state machine execution function
def StartSFNExecution(sfnInput):
    sfnClient = boto3.client('stepfunctions')
    try:
        response = sfnClient.start_execution(
            stateMachineArn=os.environ['stateMachineArn'],
            input=sfnInput
        )
        return response
    except Exception as e:
        return e

For the other function used to perform a user creation task, use the following settings:

  • Name: SFN-New-User-Flow
  • Runtime: Python 3.7
  • Memory: 128 MB
  • Timeout: 3 seconds
  • Environment variables:
    • Key: nameDelimiter
    • Value: . [period]

This delimiter is used to split the username into a first name and last name, as Amazon Connect instances with SAML-based authentication require both a first name and last name for users. For more information, see CreateUser API and UserIdentity Info.

  • Key: bundleId
  • Value: <WorkSpaces bundle ID>

Run the following AWS CLI command to return Amazon-owned WorkSpaces bundles. Use one of the bundle IDs for the key-value pair.

aws workspaces describe-workspace-bundles –owner AMAZON

  • Key: directoryId
  • Value: <WorkSpaces directory ID>

Run the following AWS CLI command to return Amazon WorkSpaces directories. Use your directory ID for the key-value pair.

aws workspaces describe-workspace-directories

  • Key: instanceId
  • Value: <Amazon Connect instance ID>

Find the Amazon Connect instance ID the Amazon Connect instance ID.

  • Key: routingProfile
  • Value: <Amazon Connect routing profile>

Run the following AWS CLI command to list routing profiles with their IDs. For this walkthrough, use the ID for the basic routing profile.

aws connect list-routing-profiles –instance-id <instance id>

  • Key: securityProfile
  • Value: <Amazon Connect security profile>

Run the following AWS CLI command to list security profiles with their IDs. For this walkthrough, use the ID for an agent security profile.

aws connect list-security-profiles –instance-id  <instance id>

  • IAM role permissions:
    • AWSLambdaBasicExecutionRole

The following permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "connect:CreateUser",
                "workspaces:CreateWorkspaces"
            ],
            "Resource": "*"
        }
    ]
}
import json
import os
import boto3

def lambda_handler(event, context):
    userName = event['input']['User']
    nameDelimiter = os.environ['nameDelimiter']
    if nameDelimiter in userName:
        firstName = userName.split(nameDelimiter)[0]
        lastName = userName.split(nameDelimiter)[1]
    else:
        firstName = userName
        lastName = userName
    domainName = event['input']['Domain']
    upn = event['input']['UPN']
    serviceName = event['input']['Service']
    if serviceName == 'WorkSpaces':
        # Setting WorkSpaces variables
        workspacesDirectoryId = os.environ['directoryId']
        workspacesUsername = upn
        workspacesBundleId = os.environ['bundleId']
        createNewWorkSpace = create_new_workspace(
            directoryId=workspacesDirectoryId,
            username=workspacesUsername,
            bundleId=workspacesBundleId
        )
        return createNewWorkSpace
    elif serviceName == 'Connect':
        createConnectUser = create_connect_user(
            connectUsername=upn,
            connectFirstName=firstName,
            connectLastName=lastName,
            securityProfile=os.environ['securityProfile'], 
            routingProfile=os.environ['routingProfile'], 
            instanceId=os.environ['instanceId']
        )
        return createConnectUser
    else:
        print(serviceName, 'is not recognized...')
        print('Available service names are WorkSpaces and Connect')
        unknownServiceException = {
            'statusCode': 500,
            'body': json.dumps(f'Service name, {serviceName}, is not recognized')}
        raise Exception(unknownServiceException)

class FailedWorkSpaceCreationException(Exception):
    pass

class WorkSpaceResourceExists(Exception):
    pass

def create_new_workspace(directoryId, username, bundleId):
    workspacesClient = boto3.client('workspaces')
    response = workspacesClient.create_workspaces(
        Workspaces=[{
                'DirectoryId': directoryId,
                'UserName': username,
                'BundleId': bundleId,
                'WorkspaceProperties': {
                    'RunningMode': 'AUTO_STOP',
                    'RunningModeAutoStopTimeoutInMinutes': 60,
                    'RootVolumeSizeGib': 80,
                    'UserVolumeSizeGib': 100,
                    'ComputeTypeName': 'VALUE'
                    }}]
                    )
    print('create_workspaces response:',response)
    for pendingRequest in response['PendingRequests']:
        if pendingRequest['UserName'] == username:
            workspacesResultObject = {'UserName':username, 'ServiceName':'WorkSpaces', 'Status': 'Success'}
            return {
                'statusCode': 200,
                'body': json.dumps(workspacesResultObject)
                }
    for failedRequest in response['FailedRequests']:
        if failedRequest['WorkspaceRequest']['UserName'] == username:
            errorCode = failedRequest['ErrorCode']
            errorMessage = failedRequest['ErrorMessage']
            errorResponse = {'Error Code:', errorCode, 'Error Message:', errorMessage}
            if errorCode == "ResourceExists.WorkSpace": 
                raise WorkSpaceResourceExists(str(errorResponse))
            else:
                raise FailedWorkSpaceCreationException(str(errorResponse))
                
def create_connect_user(connectUsername, connectFirstName,connectLastName,securityProfile,routingProfile,instanceId):
    connectClient = boto3.client('connect')
    response = connectClient.create_user(
                    Username=connectUsername,
                    IdentityInfo={
                        'FirstName': connectFirstName,
                        'LastName': connectLastName
                        },
                    PhoneConfig={
                        'PhoneType': 'SOFT_PHONE',
                        'AutoAccept': False,
                        },
                    SecurityProfileIds=[
                        securityProfile,
                        ],
                    RoutingProfileId=routingProfile,
                    InstanceId = instanceId
                    )
    connectSuccessResultObject = {'UserName':connectUsername,'ServiceName':'Connect','FirstName': connectFirstName, 'LastName': connectLastName,'Status': 'Success'}
    return {
        'statusCode': 200,
        'body': json.dumps(connectSuccessResultObject)
        }

Set up log streaming

Create a new CloudWatch Logs subscription filter that sends log data to the Lambda function DS-Log-Stream-Function created in Step 2.

  1. In the CloudWatch console, choose Logs, Log Groups, and select the log group, /aws/directoryservice/<directory id>, for the directory set up in Step 1.
  2. Choose Actions, Stream to AWS Lambda.
  3. Choose Destination, and select the Lambda function DS-Log-Stream-Function.
  4. For Log format, choose Other as the log format and enter “<EventID>4720</EventID>” (include the double quotes).
  5. Choose Start streaming.

If there is an existing subscription filter for the log group, run the following AWS CLI command to create a subscription filter for the Lambda function, DS-Log-Stream-Function.

aws logs put-subscription-filter \

--log-group-name /aws/directoryservice/<directoryid> \

--filter-name NewUser \

--filter-pattern "<EventID>4720</EventID>" \

--destination-arn arn:aws:lambda:<Region>:<ACCOUNT_NUMBER>:function:DS-Log-Stream-Function

For more information, see Using CloudWatch Logs Subscription Filters.

Create a state machine in Step Functions

The next step is to create a state machine in Step Functions. This state machine runs the Lambda function, SFN-New-User-Flow, to create a user in Amazon WorkSpaces and Amazon Connect.

Define the state machine, using the following settings:

  • Name: NewUserWorkFlow
  • State machine definition: Copy the following state machine definition:
{
    "Comment": "An example state machine for a new user creation workflow",
    "StartAt": "Parallel",
    "States": {
        "Parallel": {
            "Type": "Parallel",
            "End": true,
            "Branches": [
                {
                    "StartAt": "CreateWorkSpace",
                    "States": {
                        "CreateWorkSpace": {
                            "Type": "Task",
                            "Parameters": {
                                "input": {
                                    "User.$": "$.Username",
                                    "UPN.$": "$.UPN",
                                    "Domain.$": "$.DomainName",
                                    "Service": "WorkSpaces"
                                }
                            },
                            "Resource": "arn:aws:lambda:{region}:{account id}:function:SFN-New-User-Flow",
                            "Retry": [
                                {
                                    "ErrorEquals": [
                                        "WorkSpaceResourceExists"
                                    ],
                                    "IntervalSeconds": 1,
                                    "MaxAttempts": 0,
                                    "BackoffRate": 1
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "IntervalSeconds": 10,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2
                                }
                            ],
                            "Catch": [
                                {
                                    "ErrorEquals": [
                                        "WorkSpaceResourceExists"
                                    ],
                                    "ResultPath": "$.workspacesResult",
                                    "Next": "WorkSpacesPassState"
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "ResultPath": "$.workspacesResult",
                                    "Next": "WorkSpacesPassState"
                                }
                            ],
                            "End": true
                        },
                        "WorkSpacesPassState": {
                            "Type": "Pass",
                            "Parameters": {
                                "Result.$": "$.workspacesResult"
                            },
                            "End": true
                        }
                    }
                },
                {
                    "StartAt": "CreateConnectUser",
                    "States": {
                        "CreateConnectUser": {
                            "Type": "Task",
                            "Parameters": {
                                "input": {
                                    "User.$": "$.Username",
                                    "UPN.$": "$.UPN",
                                    "Domain.$": "$.DomainName",
                                    "Service": "Connect"
                                }
                            },
                            "Resource": "arn:aws:lambda:{region}:{account id}:function:SFN-New-User-Flow",
                            "Retry": [
                                {
                                    "ErrorEquals": [
                                        "DuplicateResourceException"
                                    ],
                                    "IntervalSeconds": 1,
                                    "MaxAttempts": 0,
                                    "BackoffRate": 1
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "IntervalSeconds": 10,
                                    "MaxAttempts": 2,
                                    "BackoffRate": 2
                                }
                            ],
                            "Catch": [
                                {
                                    "ErrorEquals": [
                                        "DuplicateResourceException"
                                    ],
                                    "ResultPath": "$.connectResult",
                                    "Next": "ConnectPassState"
                                },
                                {
                                    "ErrorEquals": [
                                        "States.ALL"
                                    ],
                                    "ResultPath": "$.connectResult",
                                    "Next": "ConnectPassState"
                                }
                            ],
                            "End": true,
                            "ResultPath": "$.connectResult"
                        },
                        "ConnectPassState": {
                            "Type": "Pass",
                            "Parameters": {
                                "Result.$": "$.connectResult"
                            },
                            "End": true
                        }
                    }
                }
            ]
        }
    }
}

After entering the name and state machine definition, choose Next.

Configure the settings by choosing Create an IAM role for me. This creates an IAM role for the state machine to run the Lambda function SFN-New-User-Flow.

Here’s the list of states in the NewUserWorkFlow state machine definition:

  • Start—When the state machine starts, it creates a parallel state to start both the CreateWorkSpace and CreateConnectUser states.
  • CreateWorkSpace—This task state runs the SFN-New-User-Flow Lambda function to create a new WorkSpace for the user. If this is successful, it goes to the End state.
  • WorkSpacesPassState—This pass state returns the result from the CreateWorkSpace state.
  • CreateConnectUse — This task state runs the SFN-New-User-Flow Lambda function to create a user in Amazon Connect. If this is successful, it goes to the End state.
  • ConnectPassState—This pass state returns the result from the CreateWorkSpace state.
  • End

The following diagram shows how these states relate to each other.

Step Functions State Machine

Test the solution

It’s time to test the solution. Create a user in AWS Managed Microsoft AD. The new user should have the following attributes:

This starts a new state machine execution in Step Functions. Here’s the flow:

  1. When there is a user creation event (Event ID: 4720) in the AWS Managed Microsoft AD security log, CloudWatch invokes the Lambda function, DS-Log-Stream-Function, to start a new state machine execution in Step Functions.
  2. To create a new WorkSpace and create a user in the Amazon Connect instance, the state machine execution runs tasks to invoke the other Lambda function, SFN-New-User-Flow.

Conclusion

This solution automates the initial user registration workflow. Step Functions provides the flexibility to customize the workflow to meet your needs. This walkthrough included Amazon WorkSpaces and Amazon Connect; both services are used to register the new user. For organizations that create a number of new users on a regular basis, this new user automation workflow can save time when configuring resources for a new user.

The event source of the automation workflow can be any event that triggers the new user workflow, so the event source isn’t limited to CloudWatch Logs. Also, the integrated service used for new user registration can be any AWS service that offers API and works with AWS Managed Microsoft AD. Other programmatically accessible services within or outside AWS can also fill that role.

In this post, I showed you how serverless workflows can streamline and coordinate user creation activities. Step Functions provides this functionality, with the help of Lambda, Amazon WorkSpaces, AWS Managed Microsoft AD, and Amazon Connect. Together, these services offer increased power and functionality when managing users, monitoring security, and integrating with compatible AWS services.

Increasing real-time stream processing performance with Amazon Kinesis Data Streams enhanced fan-out and AWS Lambda

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/increasing-real-time-stream-processing-performance-with-amazon-kinesis-data-streams-enhanced-fan-out-and-aws-lambda/

Live business data and real-time analytics are critical to informed decision-making and customer service. For example, streaming services like Netflix process billions of traffic flows each day to help you binge-watch your favorite shows. And consumer audio specialists like Sonos monitor a billion events per week to improve listener experiences. These data-savvy businesses collect and analyze massive amounts of real-time data every day.

Kinesis Data Streams overview

To help ingest real-time data or streaming data at large scales, AWS customers turn to Amazon Kinesis Data Streams. Kinesis Data Streams can continuously capture gigabytes of data per second from hundreds of thousands of sources. The data collected is available in milliseconds, enabling real-time analytics.

To provide this massively scalable throughput, Kinesis Data Streams relies on shards, which are units of throughput and represent a parallelism. One shard provides an ingest throughput of 1 MB/second or 1000 records/second. A shard also has an outbound throughput of 2 MB/sec. As you ingest more data, Kinesis Data Streams can add more shards. Customers often ingest thousands of shards in a single stream.

Enhanced fan-out

One of the main advantages of stream processing is that you can attach multiple unique applications, each consuming data from the same Kinesis data stream. For example, one application can aggregate the records in the data stream, batch them, and write the batch to S3 for long-term retention. Another application can enrich the records and write them into an Amazon DynamoDB table. At the same time, a third application can filter the stream and write a subset of the data into a different Kinesis data stream.

Before the adoption of enhanced fan-out technology, users consumed data from a Kinesis data stream with multiple AWS Lambda functions sharing the same 2 MB/second outbound throughput. Due to shared bandwidth constraints, no more than two or three functions could efficiently connect to the data stream at a time, as shown in the following diagram.

Default Method

To achieve greater outbound throughput across multiple applications, you could spread data ingestion across multiple data streams. So, a developer seeking to achieve 10 GB/second of outbound throughput to support five separate applications might resort to math like the following table:

StreamShardsInputOutput
111000 records/second
or 1 MB/second
2 MB/second
22500 ea.5,000,000 records/second
or 5000 MB/second
10,000 MB/second
or 10 GB/second

Due to the practical limitation of two to three applications per stream, you must have at least two streams to support five individual applications. You could attach three applications to the first stream and two applications to the second. However, diverting data into two separate streams adds complexity.

In August of 2018, Kinesis Data Streams announced a solution: support for enhanced fan-out and HTTP/2 for faster streaming. The enhanced fan-out method is an option that you can use for consuming Kinesis data streams at a higher capacity. The enhanced capacity enables you to achieve higher outbound throughput without having to provision more streams or shards in the same stream.

When using the enhanced fan-out option, first create a Kinesis data stream consumer. A consumer is an isolated connection to the stream that provides a 2 MB/second outbound throughput. A Kinesis data stream can support up to five consumers, providing a combined outbound throughput capacity of 10MB/second/shard. As the stream scales dynamically by adding shards, so does the amount of throughput scale through the consumers.

Consider again the requirement of 10 GB of output capacity—but rerun your math using enhanced fan-out.

StreamShardsInputConsumersOutput
111000 records/second
or 1 MB/second
510 MB/second
110001,000,000 records
or 1,000 MB/second
510,000 MB/second
or 10 GB/second

Enhanced fan-out with Lambda functions

Just before re:Invent 2018, AWS Lambda announced support for enhanced fan-out and HTTP/2. Lambda functions can now be triggered using the enhanced fan-out pattern to reduce latency. These improvements increase the amount of real-time data that can be processed in serverless applications, as seen in the following diagram.

Enhanced Fan-Out Method

In addition to using the enhanced fan-out option, you can still attach Lambda functions to the stream using the GetRecords API, as before. You can attach up to five consumers with Lambda functions at 2 MB/second outbound throughput capacity and another two or three Lambda functions sharing a single 2 MB/second outbound throughput capacity. Thus, enhanced fan-out enables you to support up to eight Lambda functions, simultaneously.

HTTP/2

The streaming technology in HTTP/2 increases the output ability of Kinesis data streams. In addition, it allows data delivery from producers to consumers in 70 milliseconds or better (a 65% improvement) in typical scenarios. These new features enable you to build faster, more reactive, highly parallel, and latency-sensitive applications on top of Kinesis Data Streams.

Comparing methods

To demonstrate the advantage of Kinesis Data Streams enhanced fan-out, I built an application with a single shard stream. It has three Lambda functions connected using the standard method and three Lambda functions connected using enhanced fan-out for consumers. I created roughly 76 KB of dummy data and inserted it into the stream at 1,000 records per second. After four seconds, I stopped the process, leaving a total of 4,000 records to be processed.

As seen in the following diagram, each of the enhanced fan-out functions processed the 4000 records in under 2 seconds, averaging at 1,852 MS each. Interestingly, the standard method got a jumpstart in the first function, processing 4,000 records in 1,732 MS. However, because of the shared resources, the other two functions took longer to process the data, at just over 2.5 seconds.

Comparison of Methods

By Kinesis Data Streams standards, 4000 records is a small dataset. But when processing millions of records in real time, the latency between standard and enhanced fan-out becomes much more significant.

Cost

When using Kinesis Data Streams, a company incurs an hourly cost of $0.015 per shard and a PUT fee of $0.014 per one million units. You can purchase enhanced fan-out for a consumer-shard per hour fee of $0.015 and $0.013 per GB data retrieval fee. These fees are for the us-east-1 Region only. To see a full list of prices, see Kinesis Data Streams pricing.

Show me the code

To demonstrate the use of Kinesis Data Streams enhanced fan-out with Lambda functions, I built a simple application. It ingests simulated IoT sensor data and stores it in an Amazon DynamoDB table as well as in an Amazon S3 bucket for later use. I could have conceivably done this in a single Lambda function. However, to keep things simple, I broke it into two separate functions.

Deploying the application

I built the Kinesis-Enhanced-Fan-Out-to-DDB-S3 application and made it available through the AWS Serverless Application Repository.

Deploy the application in your AWS account. The application is only available in the us-east-1 Region.

On the deployment status page, you can monitor the resources being deployed, including policies and capabilities.

Deployment Status

After all the resources deploy, you should see a green banner.

Exploring the application

Take a moment to examine the list of deployed resources. The two Lambda functions, DDBFunction and S3Function, receive data and write to DynamoDB and S3, respectively. Additionally, two roles have been created to allow the functions access to their respective targets.

There are also two consumers, DDBConsumer and S3Consumer, which provide isolated output at 2 MB/second throughput. Each consumer is connected to the KinesisStream stream and triggers the Lambda functions when data occurs.

Also, there is a DynamoDB table called DBRecords and an S3 bucket called S3Records.

Finally, there is a stream consumption app, as shown in the following diagram.

Application Example

Testing the application

Now that you have your application installed, test it by putting data into the Kinesis data stream.

There are several ways to do this. You can build your producer using the Kinesis Producer Library (KPL), or you could create an app that uses the AWS SDK to input data. However, there is an easier way that suits your purposes for this post: the Amazon Kinesis Data Generator. The easiest way to use this tool is to use the hosted generator and follow the setup instructions.

After you have the generator configured, you should have a custom URL to generate data for your Kinesis data stream. In your configuration steps, you created a username and password. Log in to the generator using those credentials.

When you are logged in, you can generate data for your stream test.

  1. For Region, choose us-east-1.
  2. For Stream/delivery stream, select your stream. It should start with serverlessrepo.
  3. For Records per second, keep the default value of 100.
  4. On the Template 1 tab, name the template Sensor1.
  5. Use the following template:
    {
        "sensorId": {{random.number(50)}},
        "currentTemperature": {{random.number(
            {
                "min":10,
                "max":150
            }
        )}},
        "status": "{{random.arrayElement(
            ["OK","FAIL","WARN"]
        )}}"
    }
  6. Choose Send Data.
  7. After several seconds, choose Stop Sending Data.

At this point, if all went according to plan, you should see data in both your DynamoDB table and S3 bucket. Use the following steps to verify that your enhanced fan-out process worked.

  1. On the Lambda console, choose Applications.
  2. Select the application that starts with serverlessrepo-.
  3. Choose Resources, DDBFunction. This opens the DynamoDB console.
  4. Choose Items.

The following screenshot shows the first 100 items that your database absorbed from the DDBFunction attached to KinesisStream through DDBConsumer.

DynamoDB Records

Next, check your S3 bucket,

  1. On the Lambda console, choose Applications.
  2. Select the application that starts with serverlessrepo-.
  3. Choose Resources, S3Records. This opens the S3 console.

As in DynamoDB, you should now see the fake IoT sensor data stored in your S3 bucket for later use.

S3 Records

Now that the demonstration is working, I want to point out the benefits of what you have just done. By using the enhanced fan-out method, you have increased your performance in the following ways.

  1. HTTP/2 has decreased the time from data producers to consumers to <=70 MS, a 65% improvement.
  2. At the consumer level, each consumer has an isolated 2 MB/second outbound throughput speed. Because you are using two consumers, it works out to 2x the performance.

Conclusion

Using Lambda functions in concert with Kinesis Data Streams to collect and analyze massive amounts of data isn’t a new idea. However, the introduction of enhanced fan-out technology and HTTP/2 enables you to use more functions at the same time without losing throughput capacity.

If you only connect one or two Lambda functions to a data stream, then enhanced fan-out might not be a great fit. However, if you attach more than three Lambda functions to a stream for real-time manipulation and data routing, it makes sense to evaluate this option.

I hope this helps. Happy coding!

How to Design Your Serverless Apps for Massive Scale

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/how-to-design-your-serverless-apps-for-massive-scale/

Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Scaling Considerations

In general, developers in a “serverful” world need to be worried about how many total requests can be served throughout the day, week, or month, and how quickly their system can scale. As you move into the serverless world, the most important question you should understand becomes: “What is the concurrency that your system is designed to handle?”

The AWS Serverless platform allows you to scale very quickly in response to demand. Below is an example of a serverless design that is fully synchronous throughout the application. During periods of extremely high demand, Amazon API Gateway and AWS Lambda will scale in response to your incoming load. This design places extremely high load on your backend relational database because Lambda can easily scale from thousands to tens of thousands of concurrent requests. In most cases, your relational databases are not designed to accept the same number of concurrent connections.

Serverless at scale-1

This design risks bottlenecks at your relational database and may cause service outages. This design also risks data loss due to throttling or database connection exhaustion.

Cloud Native Design

Instead, you should consider decoupling your architecture and moving to an asynchronous model. In this architecture, you use an intermediary service to buffer incoming requests, such as Amazon Kinesis or Amazon Simple Queue Service (SQS). You can configure Kinesis or SQS as out of the box event sources for Lambda. In design below, AWS will automatically poll your Kinesis stream or SQS resource for new records and deliver them to your Lambda functions. You can control the batch size per delivery and further place throttles on a per Lambda function basis.

Serverless at scale - 2

This design allows you to accept extremely high volume of requests, store the requests in a durable datastore, and process them at the speed which your system can handle.

Conclusion

Serverless computing allows you to scale much quicker than with server-based applications, but that means application architects should always consider the effects of scaling to your downstream services. Always keep in mind cost, speed, and reliability when you’re building your serverless applications.

Our next post in this series will discuss the different ways to invoke your Lambda functions and how to design your applications appropriately.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Deeper Connection with the Local Tech Community in India

Post Syndicated from Tingting (Teresa) Huang original https://blog.cloudflare.com/deeper-connection-with-the-local-tech-community-in-india/

Deeper Connection with the Local Tech Community in India

On June 6th 2019, Cloudflare hosted the first ever customer event in a beautiful and green district of Bangalore, India. More than 60 people, including executives, developers, engineers, and even university students, have attended the half day forum.

Deeper Connection with the Local Tech Community in India

The forum kicked off with a series of presentations on the current DDoS landscape, the cyber security trends, the Serverless computing and Cloudflare’s Workers. Trey Quinn, Cloudflare Global Head of Solution Engineering, gave a brief introduction on the evolution of edge computing.

Deeper Connection with the Local Tech Community in India

We also invited business and thought leaders across various industries to share their insights and best practices on cyber security and performance strategy. Some of the keynote and penal sessions included live demos from our customers.

Deeper Connection with the Local Tech Community in India

At this event, the guests had gained first-hand knowledge on the latest technology. They also learned some insider tactics that will help them to protect their business, to accelerate the performance and to identify the quick-wins in a complex internet environment.

Deeper Connection with the Local Tech Community in India

To conclude the event, we arrange some dinner for the guests to network and to enjoy a cool summer night.

Deeper Connection with the Local Tech Community in India

Through this event, Cloudflare has strengthened the connection with the local tech community. The success of the event cannot be separated from the constant improvement from Cloudflare and the continuous support from our customers in India.

As the old saying goes, भारत महान है (India is great). India is such an important market in the region. Cloudflare will enhance the investment and engagement in providing better services and user experience for India customers.

Deeper Connection with the Local Tech Community in India

Post Syndicated from Tingting (Teresa) Huang original https://blog.cloudflare.com/deeper-connection-with-the-local-tech-community-in-india/

Deeper Connection with the Local Tech Community in India

On June 6th 2019, Cloudflare hosted the first ever customer event in a beautiful and green district of Bangalore, India. More than 60 people, including executives, developers, engineers, and even university students, have attended the half day forum.

Deeper Connection with the Local Tech Community in India

The forum kicked off with a series of presentations on the current DDoS landscape, the cyber security trends, the Serverless computing and Cloudflare’s Workers. Trey Quinn, Cloudflare Global Head of Solution Engineering, gave a brief introduction on the evolution of edge computing.

Deeper Connection with the Local Tech Community in India

We also invited business and thought leaders across various industries to share their insights and best practices on cyber security and performance strategy. Some of the keynote and penal sessions included live demos from our customers.

Deeper Connection with the Local Tech Community in India

At this event, the guests had gained first-hand knowledge on the latest technology. They also learned some insider tactics that will help them to protect their business, to accelerate the performance and to identify the quick-wins in a complex internet environment.

Deeper Connection with the Local Tech Community in India

To conclude the event, we arrange some dinner for the guests to network and to enjoy a cool summer night.

Deeper Connection with the Local Tech Community in India

Through this event, Cloudflare has strengthened the connection with the local tech community. The success of the event cannot be separated from the constant improvement from Cloudflare and the continuous support from our customers in India.

As the old saying goes, भारत महान है (India is great). India is such an important market in the region. Cloudflare will enhance the investment and engagement in providing better services and user experience for India customers.

The Serverlist: Serverless makes a splash at JSConf EU and JSConf Asia

Post Syndicated from Connor Peshek original https://blog.cloudflare.com/the-serverlist-newsletter-6/

The Serverlist: Serverless makes a splash at JSConf EU and JSConf Asia

Check out our sixth edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.



The Serverlist: Serverless makes a splash at JSConf EU and JSConf Asia

Post Syndicated from Connor Peshek original https://blog.cloudflare.com/the-serverlist-newsletter-6/

The Serverlist: Serverless makes a splash at JSConf EU and JSConf Asia

Check out our sixth edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.



Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Post Syndicated from Giuliana DeAngelis original https://blog.cloudflare.com/join-cloudflare-moz-at-our-next-meetup-serverless-in-seattle/

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!
Photo by oakie / Unsplash

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Cloudflare is organizing a meetup in Seattle on Tuesday, June 25th and we hope you can join. We’ll be bringing together members of the developers community and Cloudflare users for an evening of discussion about serverless compute and the infinite number of use cases for deploying code at the edge.

To kick things off, our guest speaker Devin Ellis will share how Moz uses Cloudflare Workers to reduce time to first byte 30-70% by caching dynamic content at the edge. Kirk Schwenkler, Solutions Engineering Lead at Cloudflare, will facilitate this discussion and share his perspective on how to grow and secure businesses at scale.

Next up, Developer Advocate Kristian Freeman will take you through a live demo of Workers and highlight new features of the platform. This will be an interactive session where you can try out Workers for free and develop your own applications using our new command-line tool.

Food and drinks will be served til close so grab your laptop and a friend and come on by!

View Event Details & Register Here

Agenda:

  • 5:00 pm Doors open, food and drinks
  • 5:30 pm Customer use case by Devin and Kirk
  • 6:00 pm Workers deep dive with Kristian
  • 6:30 – 8:30 pm Networking, food and drinks

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Post Syndicated from Giuliana DeAngelis original https://blog.cloudflare.com/join-cloudflare-moz-at-our-next-meetup-serverless-in-seattle/

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!
Photo by oakie / Unsplash

Join Cloudflare & Moz at our next meetup, Serverless in Seattle!

Cloudflare is organizing a meetup in Seattle on Tuesday, June 25th and we hope you can join. We’ll be bringing together members of the developers community and Cloudflare users for an evening of discussion about serverless compute and the infinite number of use cases for deploying code at the edge.

To kick things off, our guest speaker Devin Ellis will share how Moz uses Cloudflare Workers to reduce time to first byte 30-70% by caching dynamic content at the edge. Kirk Schwenkler, Solutions Engineering Lead at Cloudflare, will facilitate this discussion and share his perspective on how to grow and secure businesses at scale.

Next up, Developer Advocate Kristian Freeman will take you through a live demo of Workers and highlight new features of the platform. This will be an interactive session where you can try out Workers for free and develop your own applications using our new command-line tool.

Food and drinks will be served til close so grab your laptop and a friend and come on by!

View Event Details & Register Here

Agenda:

  • 5:00 pm Doors open, food and drinks
  • 5:30 pm Customer use case by Devin and Kirk
  • 6:00 pm Workers deep dive with Kristian
  • 6:30 – 8:30 pm Networking, food and drinks

Updates to Serverless Architectural Patterns and Best Practices

Post Syndicated from Drew Dennis original https://aws.amazon.com/blogs/architecture/updates-to-serverless-architectural-patterns-and-best-practices/

As we sail past the halfway point between re:Invent 2018 and re:Invent 2019, I’d like to revisit some of the recent serverless announcements we’ve made. These are all complimentary to the patterns discussed in the re:Invent architecture track’s Serverless Architectural Patterns and Best Practices session.

AWS Event Fork Pipelines

AWS Event Fork Pipelines was announced in March 2019. Many customers use asynchronous event-driven processing in their serverless applications to decouple application components and address high concurrency needs. And in doing so, they often find themselves needing to backup, search, analyze, or replay these asynchronous events. That is exactly what AWS Event Fork Pipelines aims to achieve. You can plug them into a new or existing SNS topic used by your application and immediately address retention and compliance needs, gain new business insights, or even improve your application’s disaster recovery abilities.

AWS Event Fork Pipelines is a suite of three applications. The first application addresses event storage and backup needs by writing all events to an S3 bucket where they can be queried with services like Amazon Athena. The second is a search and analytics pipeline that delivers events to a new or existing Amazon ES domain, enabling search and analysis of your events. Finally, the third application is an event replay pipeline that can be used to reprocess messages should a downstream failure occur in your application. AWS Event Fork Pipelines is available in AWS Serverless Application Model (SAM) templates and are available in the AWS Serverless Application Repository (SAR). Check out our example e-commerce application on GitHub..

Amazon API Gateway Serverless Developer Portal

If you publish APIs for developers allowing them to build new applications and capabilities with your data, you understand the need for a developer portal. Also, in March 2019, we announced some significant upgrades to the API Gateway Serverless Developer Portal. The portal’s front end is written in React and is designed to be fully customizable.

The API Gateway Serverless Developer Portal is also available in GitHub and the AWS SAR. As you can see from the architecture diagram below, it is integrated with Amazon Cognito User Pools to allow developers to sign-up, receive an API Key, and register for one or more of your APIs. You can now also enable administrative scenarios from your developer portal by logging in as users belonging to the portal’s Admin group which is created when the portal is initially deployed to your account. For example, you can control which APIs appear in a customer’s developer portal, enable SDK downloads, solicit developer feedback, and even publish updates for APIs that have been recently revised.

AWS Lambda with Amazon Application Load Balancer (ALB)

Serverless microservices have been built by our customers for quite a while, with AWS Lambda and Amazon API Gateway. At re:Invent 2018 during Dr. Werner Vogel’s keynote, a new approach to serverless microservices was announced, Lambda functions as ALB targets.

ALB’s support for Lambda targets gives customers the ability to deploy serverless code behind an ALB, alongside servers, containers, and IP addresses. With this feature, ALB path and host-based routing can be used to direct incoming requests to Lambda functions. Also, ALB can now provide an entry point for legacy applications to take on new serverless functionality, and enable migration scenarios from monolithic legacy server or container-based applications.

Use cases for Lambda targets for ALB include adding new functionality to an existing application that already sits behind an ALB. This could be request monitoring by sending http headers to Elasticsearch clusters or implementing controls that manage cookies. Check out our demo of this new feature. For additional details, take a look at the feature’s documentation.

Security Overview of AWS Lambda Whitepaper

Finally, I’d be remiss if I didn’t point out the great work many of my colleagues have done in releasing the Security Overview of AWS Lambda Whitepaper. It is a succinct and enlightening read for anyone wishing to better understand the Lambda runtime environment, function isolation, or data paths taken for payloads sent to the Lambda service during synchronous and asynchronous invocations. It also has some great insight into compliance, auditing, monitoring, and configuration management of your Lambda functions. A must read for anyone wishing to better understand the overall security of AWS serverless applications.

I look forward to seeing everyone at re:Invent 2019 for more exciting serverless announcements!

About the author

Drew DennisDrew Dennis is a Global Solutions Architect with AWS based in Dallas, TX. He enjoys all things Serverless and has delivered the Architecture Track’s Serverless Patterns and Best Practices session at re:Invent the past three years. Today, he helps automotive companies with autonomous driving research on AWS, connected car use cases, and electrification.

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers

Post Syndicated from Remy Guercio original https://blog.cloudflare.com/enhancing-optimizely-with-cloudflare-workers/

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers

This is a joint post by Whelan Boyd, Senior Product Manager at Optimizely and Remy Guercio, Product Marketing Manager for Cloudflare Workers.

Experimentation is an important ingredient in driving business growth: whether you’re iterating on a product or testing new messaging, there’s no substitute for the data and insights gathered from conducting rigorous experiments in the wild.

Optimizely is the world’s leading experimentation platform, with thousands of customers worldwide running tests for over 140 million visitors daily. If Optimizely were a website, it would be the third most trafficked in the US.  And when it came time to experiment with reinvigorating their own platform, Optimizely chose Cloudflare Workers.

Improving Performance and Agility with Cloudflare Workers

Cloudflare Workers is a globally distributed serverless compute platform that runs across Cloudflare’s network of 180 locations worldwide. Workers are designed for flexibility, with many different use cases ranging from customizing configuration of Cloudflare services and features to building full, independent applications.

In this post, we’re going to focus on how Workers can be used to improve performance and increase agility for more complex applications. One of the key benefits of Workers is that they allow developers to move decision logic and data into a highly efficient runtime operating in close proximity to end users — resulting in significant performance benefits and flexibility. Which brings us to Optimizely…

How Optimizely Works

Every week Optimizely delivers billions of experiences to help teams A/B test new products, de-risk new feature launches, and validate alternative designs. Optimizely lets companies test client-side changes like layouts and copy, as well as server-side changes like algorithms and feature rollouts.

Let’s explore how both have challenges that can be overcome with Workers, starting with Optimizely’s client-side A/B testing, or Optimizely Web, product.

Use Case: Optimizely Web

The main benefit of Optimizely Web — Optimizely’s client-side testing framework — is that it supports A/B testing via straightforward insertion of a JavaScript tag on the web page. The test is designed via the Optimizely WYSIWYG editor, and is live within minutes. Common use cases include style updates, image swaps, headlines and other text changes. You can also write any custom JavaScript or CSS you want.

With client-side A/B testing, the browser downloads JavaScript that modifies the page as it’s loading.  To avoid “flash-of-unstyled-content” (FOUC), developers need to implement this JavaScript synchronously in their <head> tag.  This constraint, though, can lead to page performance issues, especially on slower connections and devices.  Downloading and executing JavaScript in the browser has a cost, and this cost increases if the amount of JavaScript is large.  With a normal Optimizely Web implementation, all experiments are included in the JavaScript loaded on every page.

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers
A traditional Optimizely implementation

With Workers, Optimizely can support many of these same use cases, but hoists critical logic to the edge to avoid much of the performance cost. Here’s how it works:

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers
Implementing tests with Optimizely and Cloudflare Workers

This diagram shows how Optimizely customers can execute experiments created in the point-and-click UI through a Cloudflare Worker.  Rather than the browser downloading a large JavaScript file, your Worker handling HTTP/S requests calls out to Optimizely’s Worker.  Optimizely’s Worker determines which A/B tests should be active on this page and returns a small amount of JavaScript back to your Worker.  In fact, it is the JavaScript required to execute A/B test variations on just that specific page load.  Your Worker inlines the code in the page and returns it to the visitor’s browser.  

Not only does this avoid a browser bottleneck downloading a lot of data, the amount of code to execute is a fraction of a normal client-side implementation.  Since the experiments are set up inside the Optimizely interface just like any other Web experiment, you can run as many as you want without waiting for code deploy cycles.  Better yet, your non-technical (e.g. marketing) teams can still run these without depending on developers for each test.  It’s a one-time implementation.

Use Case: Going Further with Feature Rollouts

Optimizely Full Stack is Optimizely’s server-side experimentation and feature flagging platform for websites, mobile apps, chatbots, APIs, smart devices, and anything else with a network connection.  You can deploy code behind feature flags, experiment with A/B tests, and roll out or roll back features immediately.  Optimizely Rollouts is a free version of Full Stack that supports key feature rollout capabilities.

Full Stack SDKs are often implemented and instantiated directly in application code.

Enhancing the Optimizely Experimentation Platform with Cloudflare Workers
An Optimizely full stack experimentation setup

The main blocker to high velocity server-side testing is that experiments and feature rollouts must go through the code-deploy cycle — and to further add to the headache, many sites cache content on CDNs, so experiments or rollouts running at the origin never execute.  

In this example, we’ll consider a new feature you’d like to roll out gradually, exposing more and more users over time between code deploys. With Workers, you can implement feature rollouts by running the Optimizely JavaScript SDK at the edge.  The Worker is effectively a decision service.  Instead of installing the JS SDK inside each application service where you might need to gate or roll out features, centralize instantiation in a Worker.

From your application, simply hit the Worker and the response will tell you whether a feature is enabled for that particular user.  In the example below, we supply via query parameters a userId, feature, and account-specific SDK key and the Worker responds with its decision in result.  Below is a sample Cloudflare Worker:

import { createManager } from '../index'

/// <reference lib="es2015" />
/// <reference lib="webworker" />

addEventListener('fetch', (event: any) => {
  event.respondWith(handleRequest(event.request))
})

/**
 * Fetch and log a request
 * @param {Request} request
 */
async function handleRequest(request: Request): Promise<Response> {
  const url = new URL(request.url)
  const key = url.searchParams.get('key')
  const userId = url.searchParams.get('userId')
  const feature = url.searchParams.get('feature')
  if (!feature || !key || !userId) {
    throw new Error('must supply "feature", "userId" and "key"')
  }

  try {
    const manager = createManager({
      sdkKey: key,
    })

    await manager.onReady().catch(err => {
      return new Response(JSON.stringify({ status: 'error' }))
    })
    const client = manager.getClient()

    const result = await client.feature({
      key: feature,
      userId,
    })

    return new Response(JSON.stringify(result))
  } catch (e) {
    return new Response(JSON.stringify({ status: 'error' }))
  }
}

This kind of setup is common for React applications, which may update store values based on decisions returned by the Worker. No need to force a request all the way back to origin.

All in all, using Workers as a centralized decision service can reduce the complexity of your Full Stack implementation and support applications that rely on heavy caching.

How to Improve Your Experimentation Setup

Both of the examples above demonstrate how Workers can provide speed and flexibility to experimentation and feature flagging.  But this is just the tip of the iceberg!  There are plenty of other ways you can use these two technologies together. We’d love to hear from you and explore them together!

Are you a developer looking for a feature flagging or server-side testing solution? The Optimizely Rollouts product is free and ready for you to sign up!

Or does your marketing team need a high performance A/B testing solution? The Optimizely Web use case is in developer preview.

  • Cloudflare Enterprise Customers: Reach out to your dedicated Cloudflare account manager learn more and start the process.
  • Optimizely Customers and Cloudflare Customers (who aren’t on an enterprise plan): Reach out to your Optimizely contact to learn more and start the process.

You can sign up for and learn more about using Cloudflare Workers here!

Getting started with serverless

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/getting-started-with-serverless/

This post is contributed by Maureen Lonergan, Director, AWS Training and Certification

We consistently hear from customers that they’re interested in building serverless applications to take advantage of the increased agility and decreased total cost of ownership (TCO) that serverless delivers. But we also know that serverless may be intimidating for those who are more accustomed to using instances or containers for compute.

Since we launched AWS Lambda in 2014, our serverless portfolio has expanded beyond event-driven computing. We now have serverless databases, integration, and orchestration tools. This enables you to build end-to-end serverless applications—but it also means that you must learn how to build using a new serverless operational model.

For this reason, AWS Training and Certification is pleased to offer a new course through Coursera entitled AWS Fundamentals: Building Serverless Applications.

This scenario-based course, developed by the experts at AWS, will:

  • Introduce the AWS serverless framework and architecture in the context of a real business problem.
  • Provide the foundational knowledge to become more proficient in choosing and creating serverless solutions using AWS.
  • Provide demonstrations of the AWS services needed for deploying serverless solutions.
  • Help you develop skills in building and deploying serverless solutions using real-world examples of a serverless website and chatbot.

The syllabus allocates more than nine hours of video content and reading material over four weekly lessons. Each lesson has an estimated 2–3 hours per week of study time (though you can set your own pace and deadlines), with suggested exercises in the AWS Management Console. There is an end-of-course assessment that covers all the learning objectives and content.

The course is on-demand and 100% digital; you can even audit it for free. A completion certificate and access to the graded assessments are available for $49.

What can you expect?

In this course you will learn to use the AWS Serverless portfolio to create a chatbot that answers the question, “Can I let my cat outside?” You will build an application using every one of the concepts and services discussed in the class, including:

At the end of the class, you can audibly interact with the application to ask that essential question, “Can my cat go out in Denver?” (See the conversation in the following screenshot.)

Serverless Coursera training app

Across the four weeks of the course, you learn:

  1. What serverless computing is and how to create a chatbot with Amazon Lex using an S3 bucket to host a web application.
  2. How to build a highly scalable API with API Gateway and use Amazon CloudFront as a content delivery network (CDN) for your site and API.
  3. How to use Lambda to build serverless functions that write data to DynamoDB.
  4. How to apply lessons from the previous weeks to extend and add functionality to the chatbot.

Serverless Coursera training

AWS Fundamentals: Building Serverless Applications is now available. This course complements other standalone digital courses by AWS Training and Certification. They include the highly recommended Introduction to Serverless Development, as well as the following:

MultiCloud… flare

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/multicloudflare/

If you want to start an intense conversation in the halls of Cloudflare, try describing us as a “CDN”. CDNs don’t generally provide you with Load Balancing, they don’t allow you to deploy Serverless Applications, and they certainly don’t get installed onto your phone. One of the costs of that confusion is many people don’t realize everything Cloudflare can do for people who want to operate in multiple public clouds, or want to operate in both the cloud and on their own hardware.

Load Balancing

Cloudflare has countless servers located in 180 data centers around the world. Each one is capable of acting as a Layer 7 load balancer, directing incoming traffic between origins wherever they may be. You could, for example, add load balancing between a set of machines you have in AWS’ EC2, and another set you keep in Google Cloud.

This load balancing isn’t just round-robining traffic. It supports weighting to allow you to control how much traffic goes to each cluster. It supports latency-based routing to automatically route traffic to the cluster which is closer (so adding geographic distribution can be as simple as spinning up machines). It even supports health checks, allowing it to automatically direct traffic to the cloud which is currently healthy.

Most importantly, it doesn’t run in any of the provider’s clouds and isn’t dependent on them to function properly. Even better, since the load balancing runs near virtually every Internet user around the world it doesn’t come at any performance cost. (Using our Argo technology performance often gets better!).

Argo Tunnel

One of the hardest components to managing a multi-cloud deployment is networking. Each provider has their own method of defining networks and firewalls, and even tools which can deploy clusters across multiple clouds often can’t quite manage to get the networking configuration to work in the same way. The task of setting it up can often be a trial-and-error game where the final config is best never touched again, leaving ‘going multi-cloud’ as a failed experiment within organizations.

At Cloudflare we have a technology called Argo Tunnel which flips networking on its head. Rather than opening ports and directing incoming traffic, each of your virtual machines (or k8s pods) makes outbound tunnels to the nearest Cloudflare PoPs. All of your Internet traffic then flows over those tunnels. You keep all your ports closed to inbound traffic, and never have to think about Internet networking again.

What’s so powerful about this configuration is is makes it trivial to spin up machines in new locations. Want a dozen machines in Australia? As long as they start the Argo Tunnel daemon they will start receiving traffic. Don’t need them any more? Shut them down and the traffic will be routed elsewhere. And, of course, none of this relies on any one public cloud provider, making it reliable even if they should have issues.

Argo Tunnel makes it trivial to add machines in new clouds, or to keep machines on-prem even as you start shifting workloads into the Cloud.

Access Control

One thing you’ll realize about using Argo Tunnel is you now have secure tunnels which connect your infrastructure with Cloudflare’s network. Once traffic reaches that network, it doesn’t necessarily have to flow directly to your machines. It could, for example, have access control applied where we use your Identity Provider (like Okta or Active Directory) to decide who should be able to access what. Rather than wrestling with VPCs and VPN configurations, you can move to a zero-trust model where you use policies to decide exactly who can access what on a per-request basis.

In fact, you can now do this with SSH as well! You can manage all your user accounts in a single place and control with precision who can access which piece of infrastructure, irrespective of which cloud it’s in.

Our Reliability

No computer system is perfect, and ours is no exception. We make mistakes, have bugs in our code, and deal with the pain of operating at the scale of the Internet every day. One great innovation in the recent history of computers, however, is the idea that it is possible to build a reliable system on top of many individually unreliable components.

Each of Cloudflare’s PoPs is designed to function without communication or support from others, or from a central data center. That alone greatly increases our tolerance for network partitions and moves us from maintaining a single system to be closer to maintaining 180 independent clouds, any of which can serve all traffic.

We are also a system built on anycast which allows us to tap into the fundamental reliability of the Internet. The Internet uses a protocol called BGP which asks each system who would like to receive traffic for a particular IP address to ‘advertise’ it. Each router then will decide to forward traffic based on which person advertising an address is the closest. We advertise all of our IP addresses in every one of our data centers. If a data centers goes down, it stops advertising BGP routes, and the very same packets which would have been destined for it arrive in another PoP seamlessly.

Ultimately we are trying to help build a better Internet. We don’t believe that Internet is built on the back of a single provider. Many of the services provided by these cloud providers are simply too complex to be as reliable as the Internet demands.

True reliability and cost control both require existing on multiple clouds. It is clear that the tools which the Internet of the 80s and 90s gave us may be insufficient to move into that future. With a smarter network we can do more, better.