Tag Archives: Network

DDoS attacks have evolved, and so should your DDoS protection

Post Syndicated from Arun Singh original https://blog.cloudflare.com/ddos-attacks-have-evolved-and-so-should-your-ddos-protection/

DDoS attacks have evolved, and so should your DDoS protection

DDoS attacks have evolved, and so should your DDoS protection

The proliferation of DDoS attacks of varying size, duration, and persistence has made DDoS protection a foundational part of every business and organization’s online presence. However, there are key considerations including network capacity, management capabilities, global distribution, alerting, reporting and support that security and risk management technical professionals need to evaluate when selecting a DDoS protection solution.

Gartner’s view of the DDoS solutions; How did Cloudflare fare?

Gartner recently published the report Solution Comparison for DDoS Cloud Scrubbing Centers (ID G00467346), authored by Thomas Lintemuth, Patrick Hevesi and Sushil Aryal. This report enables customers to view a side-by-side solution comparison of different DDoS cloud scrubbing centers measured against common assessment criteria.  If you have a Gartner subscription, you can view the report here. Cloudflare has received the greatest number of ‘High’ ratings as compared to the 6 other DDoS vendors across 23 assessment criteria in the report.

The vast landscape of DDoS attacks

From our perspective, the nature of DDoS attacks has transformed, as the economics and ease of launching a DDoS attack has changed dramatically. With a rise in cost-effective capabilities of launching a DDoS attack, we have observed a rise in the number of under 10 Gbps DDoS network-level attacks, as shown in the figure below. Even though 10 Gbps from an attack size perspective does not seem that large, it is large enough to significantly affect a majority of the websites existing today.

DDoS attacks have evolved, and so should your DDoS protection

At the same time, larger-sized DDoS attacks are still prevalent and have the capability of crippling the availability of an organization’s infrastructure. In March 2020, Cloudflare observed numerous 300+ Gbps attacks with the largest attack being 550 Gbps in size.

DDoS attacks have evolved, and so should your DDoS protection

In the report Gartner also observes a similar trend, “In speaking with the vendors for this research, Gartner discovered a consistent theme: Clients are experiencing more frequent smaller attacks versus larger volumetric attacks.” In addition, they also observe that “For enterprises with Internet connections up to and exceeding 10 Gbps, frequent but short attacks up to 10 Gbps are still quite disruptive without DDoS protection. Not to say that large attacks have gone away. We haven’t seen a 1-plus Tbps attack since spring 2018, but attacks over 500 Gbps are still common.”

Gartner recommends in the report to “Choose a provider that offers scrubbing capacity of three times the largest documented volumetric attack on your continent.”

From an application-level DDoS attack perspective an interesting DDoS attack observed and mitigated by Cloudflare last year, is shown below. This HTTP DDoS attack had a peak of 1.4M requests per second, which isn’t highly rate-intensive. However, the fact that the 1.1M IPs from which the attack originated were unique and not spoofed made the attack quite interesting. The unique IP addresses were actual clients who were able to complete a TCP and HTTPS handshake.

DDoS attacks have evolved, and so should your DDoS protection

Harness the full power of Cloudflare’s DDoS protection

Cloudflare’s cloud-delivered DDoS solution provides key features that enable security professionals to protect their organizations and customers against even the most sophisticated DDoS attacks. Some of the key features and benefits include:

  • Massive network capacity: With over 35 Tbps of network capacity, Cloudflare ensures that you are protected against even the most sophisticated and largest DDoS attacks. Cloudflare’s network capacity is almost equal to the total scrubbing capacity of the other 6 leading DDoS vendors combined.
  • Globally distributed architecture: Having a few scrubbing centers globally to mitigate DDoS attacks is an outdated approach. As DDoS attacks scale and individual attacks originate from millions of unique IPs worldwide, it’s important to have a DDoS solution that mitigates the attack at the source rather than hauling traffic to a dedicated scrubbing center. With every one of our data centers across 200 cities enabled with full DDoS mitigation capabilities, Cloudflare has more points of presence than the 6 leading DDoS vendors combined.
  • Fast time to mitigation: Automated edge-analyzed and edge-enforced DDoS mitigation capabilities allows us to mitigate attacks at unprecedented speeds. Typical time to mitigate a DDoS attack is less than 10s.
  • Integrated security: A key design tenet while building products at Cloudflare is integration. Our DDoS solution integrates seamlessly with other product offerings including WAF, Bot Management, CDN and many more. A comprehensive and integrated security solution to bolster the security posture while aiding performance. No tradeoffs between security and performance!
  • Unmetered and unlimited mitigation: Cloudflare offers unlimited and unmetered DDoS mitigation. This eliminates the legacy concept of ‘Surge Pricing,’ which is especially painful when a business is under duress and experiencing a DDoS attack. This enables you to avoid unpredictable costs from traffic.

Whether you’re part of a large global enterprise, or use Cloudflare for your personal site, we want to make sure that you’re protected and also have the visibility that you need. DDoS Protection is included as part of every Cloudflare service. Enterprise-level plans include advanced mitigation, detailed reporting, enriched logs, productivity enhancements and fine-grained controls. Enterprise Plan customers also receive access to dedicated customer success and solution engineering.

To learn more about Cloudflare’s DDoS solution contact us or get started.

*Gartner “Solution Comparison for DDoS Cloud Scrubbing Centers,” Thomas Lintemuth,  Patrick Hevesi, Sushil Aryal, 16 April 2020

Remote work, regional lockdowns and migration of Internet usage

Post Syndicated from Vasco Asturiano original https://blog.cloudflare.com/remote-work-regional-lockdowns-and-migration-of-internet-usage/

Remote work, regional lockdowns and migration of Internet usage

The recommendation for social distancing to slow down the spread of COVID-19 has led many companies to adopt a work-from-home policy for their employees in offices around the world, and Cloudflare is no exception.

As a result, a large portion of Internet access shifted from office-focused areas, like city centers and business parks, towards more residential areas like suburbs and outlying towns. We wanted to find out just precisely how broad this geographical traffic migration was, and how different locations were affected by it.

It turns out it is substantial, and the results are quite stunning:

Remote work, regional lockdowns and migration of Internet usage

Gathering the Data

So how can we determine if Internet usage patterns have changed from a geographical perspective?

In each Cloudflare Point of Presence (in more than 200 cities worldwide) there’s an edge router whose responsibility it is to switch Internet traffic to serve the requests of end users in the region.

These edge routers are the network’s entry point and for monitoring and debugging purposes each router samples IP packet information regarding the traffic that traverses them. This data is collected as flow records and contains layer-3 related information, such as the source and destination IP address, port, packet size etc.

These statistical samples allow us to monitor aggregate traffic information, spot unusual activity, and verify that our connections to the greater Internet are working correctly.

By using a geolocation database like Maxmind, we can determine the rough location from which a flow emanates. If the geographic data is too imprecise it is excluded from our set to reduce noise.

By merging together information about flows that come from similar areas, we can infer a geospatial distribution of traffic volume across the globe.

Visual Representation

This distribution of traffic volume is an indication of Internet usage patterns. By plotting a geospatial heatmap of the Internet traffic in New York City for two separate dates four weeks apart (February 19, 2020 and March 18, 2020), we can already notice an erosion of traffic volume from the city center dissipating into the surrounding areas.

All the charts in this blog post show day time traffic in each geography.  This is done to show the difference working from home is having on Internet traffic.

Remote work, regional lockdowns and migration of Internet usage
Remote work, regional lockdowns and migration of Internet usage

But the migration pattern really jumps forward when producing a differential heatmap of these two dates:

Remote work, regional lockdowns and migration of Internet usage

This chart shows a diff of the “before” and “after” scenarios, highlighting only differences in volume between the two dates. The red color represents areas where Internet usage has decreased since February 19, and green where it has increased.

It strongly suggests a migration of Internet usage towards suburban and residential areas.

The data used is comparing working hours periods (1000 to 1600 local time) between two Wednesdays four weeks apart (February 19 and March 18).

Cities

We produced  similar visual representations for other impacted locations around the world. All exhibit similar migration patterns.

Seattle, US

Remote work, regional lockdowns and migration of Internet usage

Additional features can be picked out from the chart. The red dot in Silverdale, WA appears to be Internet access from the Kitsap Mall (which is currently closed). And Sea-Tac Airport is seen as a red area between Burien and Renton.

San Francisco Bay Area, US

Remote work, regional lockdowns and migration of Internet usage

Areas where large Silicon Valley businesses have sent employees home are clearly visible.

Paris, France

Remote work, regional lockdowns and migration of Internet usage

The red area in north east Paris appears to be the airport at Le Bourget and the surrounding industrial zone. To the west there’s another red area: Le Château de Versailles.

Berlin, Germany

Remote work, regional lockdowns and migration of Internet usage

In Berlin a red dot near the Tegeler See is Flughafen Berlin-Tegel likely resulting from fewer passengers passing through the airport.

Network Impact

As shown above, geographical Internet usage changes are visible from Internet traffic exchanged between end users and Cloudflare at various locations around the globe.

Besides the geographical migration in various metropolitan areas, the overall volume of traffic in these locations has also increased between 10% and 40% in just a period of four weeks.

It’s interesting to reflect that the Internet was originally conceived as a communications network for humanity during a crisis, and it’s come a long way since then. But in this moment of crisis, it’s being put to use for that original purpose. The Internet was built for this.

Cloudflare helps power a substantial portion of Internet traffic. During this time of increased strain on networks around the world, our team is working to ensure our service continues to run smoothly.

We are also providing our Cloudflare for Teams products at no cost to companies of any size struggling to support their employees working from home: learn more.

Conntrack tales – one thousand and one flows

Post Syndicated from Marek Majkowski original https://blog.cloudflare.com/conntrack-tales-one-thousand-and-one-flows/

Conntrack tales - one thousand and one flows

At Cloudflare we develop new products at a great pace. Their needs often challenge the architectural assumptions we made in the past. For example, years ago we decided to avoid using Linux’s "conntrack" – stateful firewall facility. This brought great benefits – it simplified our iptables firewall setup, sped up the system a bit and made the inbound packet path easier to understand.

But eventually our needs changed. One of our new products had a reasonable need for it. But we weren’t confident – can we just enable conntrack and move on? How does it actually work? I volunteered to help the team understand the dark corners of the "conntrack" subsystem.

What is conntrack?

"Conntrack" is a part of Linux network stack, specifically part of the firewall subsystem. To put that into perspective: early firewalls were entirely stateless. They could express only basic logic, like: allow SYN packets to port 80 and 443, and block everything else.

The stateless design gave some basic network security, but was quickly deemed insufficient. You see, there are certain things that can’t be expressed in a stateless way. The canonical example is assessment of ACK packets – it’s impossible to say if an ACK packet is legitimate or part of a port scanning attempt, without tracking the connection state.

To fill such gaps all the operating systems implemented connection tracking inside their firewalls. This tracking is usually implemented as a big table, with at least 6 columns: protocol (usually TCP or UDP), source IP, source port, destination IP, destination port and connection state. On Linux this subsystem is called "conntrack" and is often enabled by default. Here’s how the table looks on my laptop inspected with "conntrack -L" command:

Conntrack tales - one thousand and one flows

The obvious question is how large this state tracking table can be. This setting is under "/proc/sys/net/nf_conntrack_max":

$ cat /proc/sys/net/nf_conntrack_max
262144

This is a global setting, but the limit is per per-container. On my system each container, or "network namespace", can have up to 256K conntrack entries.

What exactly happens when the number of concurrent connections exceeds the conntrack limit?

Testing conntrack is hard

In past testing conntrack was hard – it required complex hardware or vm setup. Fortunately, these days we can use modern "user namespace" facilities which do permission magic, allowing an unprivileged user to feel like root. Using the tool "unshare" it’s possible to create an isolated environment where we can precisely control the packets going through and experiment with iptables and conntrack without threatening the health of our host system. With appropriate parameters it’s possible to create and manage a networking namespace, including access to namespaced iptables and conntrack, from an unprivileged user.

This script is the heart of our test:

# Enable tun interface
ip tuntap add name tun0 mode tun
ip link set tun0 up
ip addr add 192.0.2.1 peer 192.0.2.2 dev tun0
ip route add 0.0.0.0/0 via 192.0.2.2 dev tun0

# Refer to conntrack at least once to ensure it's enabled
iptables -t raw -A PREROUTING -j CT
# Create a counter in mangle table
iptables -t mangle -A PREROUTING
# Make sure reverse traffic doesn't affect conntrack state
iptables -t raw -A OUTPUT -p tcp --sport 80 -j DROP

tcpdump -ni any -B 16384 -ttt &
...
./venv/bin/python3 send_syn.py

conntrack -L
# Show iptables counters
iptables -nvx -t raw -L PREROUTING
iptables -nvx -t mangle -L PREROUTING

This bash script is shortened for readability. See the full version here. The accompanying "send_syn.py" is just sending 10 SYN packets over "tun0" interface. Here is the source but allow me to paste it here – showing off "scapy" is always fun:

tun = TunTapInterface("tun0", mode_tun=True)
tun.open()

for i in range(10000,10000+10):
    ip=IP(src="198.18.0.2", dst="192.0.2.1")
    tcp=TCP(sport=i, dport=80, flags="S")
    send(ip/tcp, verbose=False, inter=0.01, socket=tun)

The bash script above contains a couple of gems. Let’s walk through them.

First, please note that we can’t just inject packets into the loopback interface using SOCK_RAW sockets. The Linux networking stack is a complex beast. The semantics of sending packets over a SOCK_RAW are different then delivering a packet over a real interface. We’ll discuss this later, but for now, to avoid triggering unexpected behaviour, we will deliver packets over a tun/tap device which better emulates a real interface.

Then we need to make sure the conntrack is active in the network namespace we wish to use for testing. Traditionally, just loading the kernel module would have done that, but in the brave new world of containers and network namespaces, a method had to be found to allow conntrack to be active in some and inactive in other containers. Hence this is tied to usage – rules referencing conntrack must exist in the namespace’s iptables for conntrack to be active inside the container.

As a side note, containers triggering host to load kernel modules is an interesting subject.

After the "-t raw -A PREROUTING" rule, which we added "-t mangle -A PREROUTING" rule, but notice – it doesn’t have any action! This syntax is allowed by iptables and it is pretty useful to get iptables to report rule counters. We’ll need these counters soon. A careful reader might suggest looking at "policy" counters in iptables to achieve our goal. Sadly, "policy" counters (increased for each packet entering a chain), work only if there is at least one rule inside it.

The rest of the steps are self-explanatory. We set up "tcpdump" in the background, send 10 SYN packets to 127.0.0.1:80 using the "scapy" Python library. Ten we print the conntrack table and iptables counters.

Let’s run this script in action. Remember to run it under networking namespace as fake root with "unshare -Ur -n":

Conntrack tales - one thousand and one flows

This is all nice. First we see a "tcpdump" listing showing 10 SYN packets. Then we see the conntrack table state, showing 10 created flows. Finally, we see iptables counters in two rules we created, each showing 10 packets processed.

Can conntrack table fill up?

Given that the conntrack table is size constrained, what exactly happens when it fills up? Let’s check it out. First, we need to drop the conntrack size. As mentioned it’s controlled by a global toggle – it’s necessary to tune it on the host side. Let’s reduce the table size to 7 entries, and repeat our test:

Conntrack tales - one thousand and one flows

This is getting interesting. We still see the 10 inbound SYN packets. We still see that the "-t raw PREROUTING" table received 10 packets, but this is where similarities end. The "-t mangle PREROUTING" table saw only 7 packets. Where did the three missing SYN packets go?

It turns out they went where all the dead packets go. They were hard dropped. Conntrack on overfill does exactly that. It even complains in the "dmesg":

Conntrack tales - one thousand and one flows

This is confirmed by our iptables counters. Let’s review the famous iptables diagram:

Conntrack tales - one thousand and one flows
image by Jan Engelhardt CC BY-SA 3.0

As we can see, the "-t raw PREROUTING" happens before conntrack, while "-t mangle PREROUTING" is just after it. This is why we see 10 and 7 packets reported by our iptables counters.

Let me emphasize the gravity of our discovery. We showed three completely valid SYN packets being implicitly dropped by "conntrack". There is no explicit "-j DROP" iptables rule. There is no configuration to be toggled. Just the fact of using "conntrack" means that, when it’s full, packets creating new flows will be dropped. No questions asked.

This is the dark side of using conntrack. If you use it, you absolutely must make sure it doesn’t get filled.

We could end our investigation here, but there are a couple of interesting caveats.

Strict vs loose

Conntrack supports a "strict" and "loose" mode, as configured by "nf_conntrack_tcp_loose" toggle.

$ cat /proc/sys/net/netfilter/nf_conntrack_tcp_loose
1

By default, it’s set to "loose" which means that stray ACK packets for unseen TCP flows will create new flow entries in the table. We can generalize: "conntrack" will implicitly drop all the packets that create new flow, whether that’s SYN or just stray ACK.

What happens when we clear the "nf_conntrack_tcp_loose=0" setting? This is a subject for another blog post, but suffice to say – mess. First, this setting is not settable in the network namespace scope – although it should be. To test it you need to be in the root network namespace. Then, due to twisted logic the ACK will be dropped on a full conntrack table, even though in this case it doesn’t create a flow. If the table is not full, the ACK packet will pass through it, having "-ctstate INVALID" from "mangle" table forward.

When a conntrack entry is not created?

There are important situations when conntrack entry is not created. For example, we could replace these line in our script:

# Make sure reverse traffic doesn't affect conntrack state
iptables -t raw -A OUTPUT -p tcp --sport 80 -j DROP

With those:

# Make sure inbound SYN packets don't go to networking stack
iptables -A INPUT -j DROP

Naively we could think dropping SYN packets past the conntrack layer would not interfere with the created flows. This is not correct. In spite of these SYN packets having been seen by conntrack, no flow state is created for them. Packets hitting "-j DROP" will not create new conntrack flows. Pretty magical, isn’t it?

Full Conntrack causes with EPERM

Recently we hit a case when a "sendto()" syscall on UDP socket from one of our applications was erroring with EPERM. This is pretty weird, and not documented in the man page. My colleague had no doubts:

Conntrack tales - one thousand and one flows

I’ll save you the gruesome details, but indeed, the full conntrack table will do that to your new UDP flows – you will get EPERM. Beware. Funnily enough, it’s possible to get EPERM if an outbound packet is dropped on OUTPUT firewall in other ways. For example:

marek:~$ sudo iptables -I OUTPUT -p udp --dport 53 --dst 192.0.2.8 -j DROP
marek:~$ strace -e trace=write nc -vu 192.0.2.8 53
write(3, "X", 1)                        = -1 EPERM (Operation not permitted)
+++ exited with 1 +++

If you ever receive EPERM from "sendto()", you might want to treat it as a transient error, if you suspect a filled conntrack problem, or permanent error if you blame iptables configuration.

This is also why we can’t send our SYN packets directly using SOCK_RAW sockets in our test. Let’s see what happens on conntrack overfill with standard "hping3" tool:

$ hping3 -S -i u10000 -c 10 --spoof 192.18.0.2 192.0.2.1 -p 80 -I lo
HPING 192.0.2.1 (lo 192.0.2.1): S set, 40 headers + 0 data bytes
[send_ip] sendto: Operation not permitted

"send()" even on a SOCK_RAW socket fails with EPERM when conntrack table is full.

Full conntrack can happen on a SYN flood

There is one more caveat. During a SYN flood, the conntrack entries will totally be created for the spoofed flows. Take a look at second test case we prepared, this time correctly listening on port 80, and sending SYN+ACK:

Conntrack tales - one thousand and one flows

We can see 7 SYN+ACK’s flying out of the port 80 listening socket. The final three SYN’s go nowhere as they are dropped by conntrack.

This has important implications. If you use conntrack on publicly accessible ports, during SYN flood mitigation technologies like SYN Cookies won’t help. You are still at risk of running out of conntrack space and therefore affecting legitimate connections.

For this reason, as a rule of thumb consider avoiding conntrack on inbound connections (-j NOTRACK). Alternatively having some reasonable rate limits on iptables layer, doing "-j DROP". This will work well and won’t create new flows, as we discussed above. The best method though, would be to trigger SYN Cookies from a layer before conntrack, like XDP. But this is a subject for another time.

Summary

Over the years Linux conntrack has gone through many changes and has improved a lot. While performance used to be a major concern, these days it’s considered to be very fast. Dark corners remain. Correctly applying conntrack is tricky.

In this blog post we showed how it’s possible to test parts of conntrack with "unshare" and a series of scripts. We showed the behaviour when the conntrack table gets filled – packets might implicitly be dropped. Finally, we mentioned the curious case of SYN floods where incorrectly applied conntrack may cause harm.

Stay tuned for more horror stories as we dig deeper and deeper into the Linux networking stack guts.

On the shoulders of giants: recent changes in Internet traffic

Post Syndicated from Louis Poinsignon original https://blog.cloudflare.com/on-the-shoulders-of-giants-recent-changes-in-internet-traffic/

On the shoulders of giants: recent changes in Internet traffic

As the COVID-19 emergency continues and an increasing number of cities and countries are establishing quarantines or cordons sanitaire, the Internet has become, for many, the primary method to keep in touch with their friends and families. And it’s a vital motor of the global economy as many companies have employees who are now working from home.

Traffic towards video conferencing, streaming services and news, e-commerce websites has surged. We’ve seen growth in traffic from residential broadband networks, and a slowing of traffic from businesses and universities.

The Cloudflare team is fully operational and the Network Operating Center (NOC) is watching the changing traffic patterns in the more than 200 cities in which we operate hardware.

Big changes in Internet traffic aren’t unusual. They often occur around large sporting events like the Olympics or World Cup, cultural events like the Eurovision Song Contest and even during Ramadan at the breaking of the fast each day.

The Internet was built to cope with an ever changing environment. In fact, it was literally created, tested, debugged and designed to deal with changing load patterns.

Over the last few weeks, the Cloudflare Network team has noticed some new patterns and we wanted to share a few of them with you.

Entire countries are watching their leaders

Last Friday evening, the US President announced a State of Emergency in the United States. Not so long after, our US data centers served 20% more traffic than usual. The red line shows Friday, the grey lines the preceding days for comparison.

On the shoulders of giants: recent changes in Internet traffic

On the Sunday, March 15, the Dutch government announced on the radio at 1730 local time closures of the non-essential business (1630 UTC). A sharp dip in the regular Sunday traffic followed:

On the shoulders of giants: recent changes in Internet traffic

The French president made two national announcements, on March 12 (pink curve) and March 16 (red curve) at 2000 local time (1900 UTC). The lockdown announcement on March 16 caused French traffic to dip by half followed by a spike:

On the shoulders of giants: recent changes in Internet traffic

Evolution of traffic in quarantine

Italy has seen a 20-40% increase in daily traffic since the lockdown:

On the shoulders of giants: recent changes in Internet traffic

With universities closing, some national research networks are remaining (almost) as quiet as a weekend (in purple). Current day in red and previous days in grey (overlaps with previous week):

On the shoulders of giants: recent changes in Internet traffic

The Internet Exchange Points, a key part of the Internet infrastructure, where Internet service providers and content providers can exchange data directly (rather than via a third party) have also seen spikes in traffic. Many provide public traffic graphs.

In Amsterdam (AMS-IX), London (LINX) and Frankfurt (DE-CIX), around 10-20% increase is seen around March 9th:

On the shoulders of giants: recent changes in Internet traffic
On the shoulders of giants: recent changes in Internet traffic

In Milan (MXP-IX), the Exchange point shows a 40% increase on Wednesday, 9th of March 2020, the day of the quarantine:

On the shoulders of giants: recent changes in Internet traffic

In Asia, in Hong Kong (HKIX), we can observe a faster increase since the end of January which likely corresponds to the Hubei lockdown on the January 23:

On the shoulders of giants: recent changes in Internet traffic

The emergency has a non-negligible impact on Internet services and our lives. Although it is difficult to quantify exactly the increase, we observe numbers from 10% to 40% depending on the region and the state of government action in those regions.

Even though from time to time individual services, such as a web site or an app, have outages the core of the Internet is robust. Traffic is shifting from corporate and university networks to residential broadband, but the Internet was designed for change.

Check back on the Cloudflare blog for further updates and insights.

Announcing Network Analytics

Post Syndicated from Omer Yoachimik original https://blog.cloudflare.com/announcing-network-analytics/

Our Analytics Platform

Announcing Network Analytics

Back in March 2019, we released Firewall Analytics which provides insights into HTTP security events across all of Cloudflare’s protection suite; Firewall rule matches, HTTP DDoS Attacks, Site Security Level which harnesses Cloudflare’s threat intelligence, and more. It helps customers tailor their security configurations more effectively. The initial release was for Enterprise customers, however we believe that everyone should have access to powerful tools, not just large enterprises, and so in December 2019 we extended those same enterprise-level analytics to our Business and Pro customers.

Announcing Network Analytics
Source: https://imgflip.com/memegenerator

Since then, we’ve built on top of our analytics platform; improved the usability, added more functionality and extended it to additional Cloudflare services in the form of Account Analytics, DNS Analytics, Load Balancing Analytics, Monitoring Analytics and more.

Our entire analytics platform harnesses the powerful GraphQL framework which is also available to customers that want to build, export and share their own custom reports and dashboards.

Extending Visibility From L7 To L3

Until recently, all of our dashboards were mostly HTTP-oriented and provided visibility into HTTP attributes such as the user agent, hosts, cached resources, etc. This is valuable to customers that use Cloudflare to protect and accelerate HTTP applications, mobile apps, or similar. We’re able to provide them visibility into the application layer (Layer 7 in the OSI model) because we proxy their traffic at L7.

Announcing Network Analytics
DDoS Protection for Layer 3-7

However with Magic Transit, we don’t proxy traffic at L7 but rather route it at L3 (network layer). Using BGP Anycast, customer traffic is routed to the closest point of presence of Cloudflare’s network edge where it is filtered by customer-defined network firewall rules and automatic DDoS mitigation systems. Clean traffic is then routed via dynamic GRE Anycast tunnels to the customer’s data-centers. Routing at L3 means that we have limited visibility into the higher layers. So in order to provide Magic Transit customers visibility into traffic and attacks, we needed to extend our analytics platform to the packet-layer.

Announcing Network Analytics
Magic Transit Traffic Flow

On January 16, 2020, we released the Network Analytics dashboard for Magic Transit customers and Bring Your Own IP (BYOIP) customers. This packet and bit oriented dashboard provides near real-time visibility into network- and transport-layer traffic patterns and DDoS attacks that are blocked at the Cloudflare edge in over 200 cities around the world.

Announcing Network Analytics
Network Analytics – Packets & Bits Over Time

Analytics For A Year

The way we’ve architected the analytics data-stores enables us to provide our customers one year’s worth of insights. Traffic is sampled at the edge data-centers. From those samples we structure IP flow logs which are similar to SFlow and bucket one minute’s worth of traffic, grouped by destination IP, port and protocol. IP flows includes multiple packet attributes such as TCP flags, source IPs and ports, Cloudflare data-center, etc. The source IP is considered PII data and is therefore only stored for 30 days, after which the source IPs are discarded and logs are rolled up into one hour groups, and then one day groups. The one hour roll-ups are stored for 6 months and the one day roll-ups for 1 year.

Similarly, attack logs are also stored efficiently. Attacks are stored as summaries with start/end timestamps, min/max/average/total bits and packets per second, attack type, action taken and more. A DDoS attack could easily consist of billions of packets which could impact performance due to the number of read/write calls to the data-store. By storing attacks as summary logs, we’re able to overcome these challenges and therefore provide attack logs for up to 1 year back.

Network Analytics via GraphQL API

We built this dashboard on the same analytics platform, meaning that our packet-level analytics are also available by GraphQL. As an example, below is an attack report query that would show the top attacker IPs, the data-center cities and countries where the attack was observed, the IP version distribution, the ASNs that were used by the attackers and the ports. The query is done at the account level, meaning it would provide a report for all of your IP ranges. In order to narrow the report down to a specific destination IP or port range, you can simply add additional filters. The same filters also exist in the UI.

{
  viewer {
    accounts(filter: { accountTag: $accountTag }) {
      topNPorts: ipFlows1mGroups(
        limit: 5
        filter: $portFilter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: sourcePort
          ipProtocol
          __typename
        }
        __typename
      }
      topNASN: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: sourceIPAsn
          description: sourceIPASNDescription
          __typename
        }
        __typename
      }
      topNIPs: ipFlows1mGroups(
        limit: 5
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: sourceIP
          __typename
        }
        __typename
      }
      topNColos: ipFlows1mGroups(
        limit: 10
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: coloCity
          coloCode
          __typename
        }
        __typename
      }
      topNCountries: ipFlows1mGroups(
        limit: 10
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: coloCountry
          __typename
        }
        __typename
      }
      topNIPVersions: ipFlows1mGroups(
        limit: 2
        filter: $filter
        orderBy: [sum_packets_DESC]
      ) {
        sum {
          count: packets
          __typename
        }
        dimensions {
          metric: ipVersion
          __typename
        }
        __typename
      }
      __typename
    }
    __typename
  }
}
Attack Report Query Example

After running the query using Altair GraphQL Client, the response is returned in a JSON format:

Announcing Network Analytics

What Do Customers Want?

As part of our product definition and design research stages, we interviewed internal customer-facing teams including Customer Support, Solution Engineering and more. I consider these stakeholders as super-user-aggregators because they’re customer-facing teams and are constantly engaging and helping our users. After the internal research phase, we expanded externally to customers and prospects; particularly network and security engineers and leaders. We wanted to know how they expect the dashboard to fit in their work-flow, what are their use cases and how we can tailor the dashboard to their needs. Long story short, we identified two main use cases: Incident Response and Reporting. Let’s go into each of these use cases in more detail.

Incident Response

I started off by asking them a simple question – “what do you do when you’re paged?” We wanted to better understand their incident response process; specifically, how they’d expect to use this dashboard when responding to an incident and what are the metrics that matter to them to help them make quick calculated decisions.

You’ve Just Been Paged

Let’s say that you’re a security operations engineer. It’s Black Friday. You’re on call. You’ve just been paged. Traffic levels to one of your data-centers has exceeded a safe threshold. Boom. What do you do? Responding quickly and resolving the issue as soon as possible is key.

If your workflows are similar to our customers’, then your objective is to resolve the page as soon as possible. However, before you can resolve it, you need to determine if there is any action that you need to take. For instance, is this a legitimate rise in traffic from excited Black Friday shoppers, perhaps a new game release or maybe an attack that hasn’t been mitigated? Do you need to shift traffic to another data-center or are levels still stable? Our customers tell us that these are the metrics that matter the most:

  1. Top Destination IP and port – helps understand what services are being impacted
  2. Top source IPs, port, ASN, data-center and data-center Country – helps identify the source of the traffic
  3. Real-time packet and bit rates – helps understand the traffic levels
  4. Protocol distribution – helps understand what type of traffic is abnormal
  5. TCP flag distribution – an abnormal distribution could indicate an attack
  6. Attack Log – shows what types of traffic is being dropped/rate-limited
Announcing Network Analytics
Customizable DDoS Attack Log

As network and transport layer attacks can be highly distributed and the packet attributes can be easily spoofed, it’s usually not practical to block IPs. Instead, the dashboard enables you to quickly identify patterns such as an increased frequency of a specific TCP flag or increased traffic from a specific country. Identifying these patterns brings you one step closer to resolving the issue. After you’ve identified the patterns, packet-level filtering can be applied to drop or rate-limit the malicious traffic. If the attack was automatically mitigated by Cloudflare’s systems, you’ll be able to see it immediately along with the attack attributes in the activity log. By filtering by the Attack ID, the entire dashboard becomes your attack report.

Announcing Network Analytics
Packet/Bit Distribution by Source & Destination
Announcing Network Analytics
TCP Flag Distribution

Reporting

During our interviews with security and network engineers, we also asked them what metrics and insights they need when creating reports for their managers, C-levels, colleagues and providing evidence to law-enforcement agencies. After all, processing data and creating reports can consume over a third (36%) of a security team’s time (~3 hours a day) and is also one of the most frequent DDoS asks by our customers.

Announcing Network Analytics
Add filters, select-time range, print and share

On top of all of these customizable insights, we wanted to also provide a one-line summary that would reflect your recent activity. The one-liner is dynamic and changes based on your activity. It tells you whether you’re currently under attack, and how many attacks were blocked. If your CISO is asking for a security update, you can simply copy-paste it and convey the efficiency of the service:

Announcing Network Analytics
Dynamic Summary

Our customers say that they want to reflect the value of Cloudflare to their managers and peers:

  1. How much potential downtime and bandwidth did Cloudflare spare me?
  2. What are my top attacked IPs and ports?
  3. Where are the attacks coming from? What types and what are the trends?

The Secret To Creating Good Reports

What does everyone love? Cool Maps! The key to a good report is adding a map; showing where the attack came from. But given that packet attributes can be easily spoofed, including the source IP, it won’t do us any good to plot a map based on the locations of the source IPs. It would result in a spoofed source country and is therefore useless data. Instead, we decided to show the geographic distribution of packets and bits based on the Cloudflare data-center in which they were ingested. As opposed to legacy scrubbing center solutions with limited network infrastructures, Cloudflare has data-centers in more than 200 cities around the world. This enables us to provide precise geographic distribution with high confidence, making your reports accurate.

Announcing Network Analytics
Packet/Bit Distribution by geography: Data-center City & Country

Tailored For You

One of the main challenges both our customers and we struggle with is how to process and generate actionable insights from all of the data points. This is especially critical when responding to an incident. Under this assumption, we built this dashboard with the purpose of speeding up your reporting and investigation processes. By tailoring it to your needs, we hope to make you more efficient and make the most out of Cloudflare’s services. Got any feedback or questions? Post them below in the comments section.

If you’re an existing Magic Transit or BYOIP customer, then the dashboard is already available to you. Not a customer yet? Click here to learn more.

Magic Transit: Network functions at Cloudflare scale

Post Syndicated from Nick Wondra original https://blog.cloudflare.com/magic-transit-network-functions/

Magic Transit: Network functions at Cloudflare scale

Today we announced Cloudflare Magic Transit, which makes Cloudflare’s network available to any IP traffic on the Internet. Up until now, Cloudflare has primarily operated proxy services: our servers terminate HTTP, TCP, and UDP sessions with Internet users and pass that data through new sessions they create with origin servers. With Magic Transit, we are now also operating at the IP layer: in addition to terminating sessions, our servers are applying a suite of network functions (DoS mitigation, firewalling, routing, and so on) on a packet-by-packet basis.

Over the past nine years, we’ve built a robust, scalable global network that currently spans 193 cities in over 90 countries and is ever growing. All Cloudflare customers benefit from this scale thanks to two important techniques. The first is anycast networking. Cloudflare was an early adopter of anycast, using this routing technique to distribute Internet traffic across our data centers. It means that any data center can handle any customer’s traffic, and we can spin up new data centers without needing to acquire and provision new IP addresses. The second technique is homogeneous server architecture. Every server in each of our edge data centers is capable of running every task. We build our servers on commodity hardware, making it easy to quickly increase our processing capacity by adding new servers to existing data centers. Having no specialty hardware to depend on has also led us to develop an expertise in pushing the limits of what’s possible in networking using modern Linux kernel techniques.

Magic Transit is built on the same network using the same techniques, meaning our customers can now run their network functions at Cloudflare scale. Our fast, secure, reliable global edge becomes our customers’ edge. To explore how this works, let’s follow the journey of a packet from a user on the Internet to a Magic Transit customer’s network.

Putting our DoS mitigation to work… for you!

In the announcement blog post we describe an example deployment for Acme Corp. Let’s continue with this example here. When Acme brings their IP prefix 203.0.113.0/24 to Cloudflare, we start announcing that prefix to our transit providers, peers, and to Internet exchanges in each of our data centers around the globe. Additionally, Acme stops announcing the prefix to their own ISPs. This means that any IP packet on the Internet with a destination address within Acme’s prefix is delivered to a nearby Cloudflare data center, not to Acme’s router.

Let’s say I want to access Acme’s FTP server on 203.0.113.100 from my computer in Cloudflare’s office in Champaign, IL. My computer generates a TCP SYN packet with destination address 203.0.113.100 and sends it out to the Internet. Thanks to anycast, that packet ends up at Cloudflare’s data center in Chicago, which is the closest data center (in terms of Internet routing distance) to Champaign. The packet arrives on the data center’s router, which uses ECMP (Equal Cost Multi-Path) routing to select which server should handle the packet and dispatches the packet to the selected server.

Once at the server, the packet flows through our XDP- and iptables-based DoS detection and mitigation functions. If this TCP SYN packet were determined to be part of an attack, it would be dropped and that would be the end of it. Fortunately for me, the packet is permitted to pass.

So far, this looks exactly like any other traffic on Cloudflare’s network. Because of our expertise in running a global anycast network we’re able to attract Magic Transit customer traffic to every data center and apply the same DoS mitigation solution that has been protecting Cloudflare for years. Our DoS solution has handled some of the largest attacks ever recorded, including a 942Gbps SYN flood in 2018. Below is a screenshot of a recent SYN flood of 300M packets per second. Our architecture lets us scale to stop the largest attacks.

Magic Transit: Network functions at Cloudflare scale

Network namespaces for isolation and control

The above looked identical to how all other Cloudflare traffic is processed, but this is where the similarities end. For our other services, the TCP SYN packet would now be dispatched to a local proxy process (e.g. our nginx-based HTTP/S stack). For Magic Transit, we instead want to dynamically provision and apply customer-defined network functions like firewalls and routing. We needed a way to quickly spin up and configure these network functions while also providing inter-network isolation. For that, we turned to network namespaces.

Namespaces are a collection of Linux kernel features for creating lightweight virtual instances of system resources that can be shared among a group of processes. Namespaces are a fundamental building block for containerization in Linux. Notably, Docker is built on Linux namespaces. A network namespace is an isolated instance of the Linux network stack, including its own network interfaces (with their own eBPF hooks), routing tables, netfilter configuration, and so on. Network namespaces give us a low-cost mechanism to rapidly apply customer-defined network configurations in isolation, all with built-in Linux kernel features so there’s no performance hit from userspace packet forwarding or proxying.

When a new customer starts using Magic Transit, we create a brand new network namespace for that customer on every server across our edge network (did I mention that every server can run every task?). We built a daemon that runs on our servers and is responsible for managing these network namespaces and their configurations. This daemon is constantly reading configuration updates from Quicksilver, our globally distributed key-value store, and applying customer-defined configurations for firewalls, routing, etc, inside the customer’s namespace. For example, if Acme wants to provision a firewall rule to allow FTP traffic (TCP ports 20 and 21) to 203.0.113.100, that configuration is propagated globally through Quicksilver and the Magic Transit daemon applies the firewall rule by adding an nftables rule to the Acme customer namespace:

# Apply nftables rule inside Acme’s namespace
$ sudo ip netns exec acme_namespace nft add rule inet filter prerouting ip daddr 203.0.113.100 tcp dport 20-21 accept

Getting the customer’s traffic to their network namespace requires a little routing configuration in the default network namespace. When a network namespace is created, a pair of virtual ethernet (veth) interfaces is also created: one in the default namespace and one in the newly created namespace. This interface pair creates a “virtual wire” for delivering network traffic into and out of the new network namespace. In the default network namespace, we maintain a routing table that forwards Magic Transit customer IP prefixes to the veths corresponding to those customers’ namespaces. We use iptables to mark the packets that are destined for Magic Transit customer prefixes, and we have a routing rule that specifies that these specially marked packets should use the Magic Transit routing table.

(Why go to the trouble of marking packets in iptables and maintaining a separate routing table? Isolation. By keeping Magic Transit routing configurations separate we reduce the risk of accidentally modifying the default routing table in a way that affects how non-Magic Transit traffic flows through our edge.)

Magic Transit: Network functions at Cloudflare scale

Network namespaces provide a lightweight environment where a Magic Transit customer can run and manage network functions in isolation, letting us put full control in the customer’s hands.

GRE + anycast = magic

After passing through the edge network functions, the TCP SYN packet is finally ready to be delivered back to the customer’s network infrastructure. Because Acme Corp. does not have a network footprint in a colocation facility with Cloudflare, we need to deliver their network traffic over the public Internet.

This poses a problem. The destination address of the TCP SYN packet is 203.0.113.100, but the only network announcing the IP prefix 203.0.113.0/24 on the Internet is Cloudflare. This means that we can’t simply forward this packet out to the Internet—it will boomerang right back to us! In order to deliver this packet to Acme we need to use a technique called tunneling.

Tunneling is a method of carrying traffic from one network over another network. In our case, it involves encapsulating Acme’s IP packets inside of IP packets that can be delivered to Acme’s router over the Internet. There are a number of common tunneling protocols, but Generic Routing Encapsulation (GRE) is often used for its simplicity and widespread vendor support.

GRE tunnel endpoints are configured both on Cloudflare’s servers (inside of Acme’s network namespace) and on Acme’s router. Cloudflare servers then encapsulate IP packets destined for 203.0.113.0/24 inside of IP packets destined for a publicly-routable IP address for Acme’s router, which decapsulates the packets and emits them into Acme’s internal network.

Magic Transit: Network functions at Cloudflare scale

Now, I’ve omitted an important detail in the diagram above: the IP address of Cloudflare’s side of the GRE tunnel. Configuring a GRE tunnel requires specifying an IP address for each side, and the outer IP header for packets sent over the tunnel must use these specific addresses. But Cloudflare has thousands of servers, each of which may need to deliver packets to the customer through a tunnel. So how many Cloudflare IP addresses (and GRE tunnels) does the customer need to talk to? The answer: just one, thanks to the magic of anycast.

Cloudflare uses anycast IP addresses for our GRE tunnel endpoints, meaning that any server in any data center is capable of encapsulating and decapsulating packets for the same GRE tunnel. How is this possible? Isn’t a tunnel a point-to-point link? The GRE protocol itself is stateless—each packet is processed independently and without requiring any negotiation or coordination between tunnel endpoints. While the tunnel is technically bound to an IP address it need not be bound to a specific device. Any device that can strip off the outer headers and then route the inner packet can handle any GRE packet sent over the tunnel. Actually, in the context of anycast the term “tunnel” is misleading since it implies a link between two fixed points. With Cloudflare’s Anycast GRE, a single “tunnel” gives you a conduit to every server in every data center on Cloudflare’s global edge.

Magic Transit: Network functions at Cloudflare scale

One very powerful consequence of Anycast GRE is that it eliminates single points of failure. Traditionally, GRE-over-Internet can be problematic because an Internet outage between the two GRE endpoints fully breaks the “tunnel”. This means reliable data delivery requires going through the headache of setting up and maintaining redundant GRE tunnels terminating at different physical sites and rerouting traffic when one of the tunnels breaks. But because Cloudflare is encapsulating and delivering customer traffic from every server in every data center, there is no single “tunnel” to break. This means Magic Transit customers can enjoy the redundancy and reliability of terminating tunnels at multiple physical sites while only setting up and maintaining a single GRE endpoint, making their jobs simpler.

Our scale is now your scale

Magic Transit is a powerful new way to deploy network functions at scale. We’re not just giving you a virtual instance, we’re giving you a global virtual edge. Magic Transit takes the hardware appliances you would typically rack in your on-prem network and distributes them across every server in every data center in Cloudflare’s network. This gives you access to our global anycast network, our fleet of servers capable of running your tasks, and our engineering expertise building fast, reliable, secure networks. Our scale is now your scale.

The Network is the Computer: A Conversation with John Gage

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/john-gage/

The Network is the Computer: A Conversation with John Gage

The Network is the Computer: A Conversation with John Gage

To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below.


[00:00:13]

John Graham-Cumming: I’m talking to John Gage who was what, the 21st employee of Sun Microsystems, which is what Wikipedia claims and it also claims that you created this phrase “The Network is the Computer,” and that’s actually one of the things I want to talk about with you a little bit because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it or what Sun meant by it at the time?

[00:00:40]

John Gage: Well, in 2019, recalling what it meant in 1982 or 83’ will be colored by all our experience since then but at the time it seemed so obvious that when we introduced the first scientific workstations, they were not very powerful computers. The first Suns had a giant screen and they were on the Internet but they were designed as a complementary component to supercomputers. Bill Joy and I had a series of diagrams for talks we’d give, and Bill had the bi-modal, the two node picture. The serious computing occurred on the giant machines where you could fly into the heart of a black hole and the human interface was the workstation across the network. So each had to complement the other, each built on the strengths of the other, and each enhanced the other because to deal in those days with a supercomputer was very ugly. And to run all your very large computations, you could run them on a Sun because we had virtual memory and series of such advanced things but not fast. So the speed of scientific understanding is deeply affected by the tools the scientist has — is it a microscope, is it an optical telescope, is it a view into the heart of a star by running a simulation on a supercomputer? You need to have the loop with the human and the science constantly interacting and constantly modifying each other, and that’s what the network is for, to tie those different nodes together in as seamless a way as possible. Then, the instant anyone that’s ever created a programming language says, “so if I have to create a syntax of this where I’m trying to let you express, do this, how about the delay on the network, the latency”? Does your phrase “The Network is the Computer” really capture this hundreds, thousands, tens of thousands, millions perhaps at that time, now billions and billions and billions today, all these devices interacting and exchanging state with latency, with delay. It’s sort of an oversimplification, and that we would point out, but it’s just network is the computer. Four words, you know, what we tried to do is give a metaphor that allows you to explore it in your mind and think of new things to do and be inspired.

[00:03:35]

Graham-Cumming: And then by a sort of strange sequence of events, that was a trademark of Sun. It got abandoned. And now Cloudflare has swooped in and trademarked it again. So now it’s our trademark which sort of brings us full circle, I suppose.

[00:03:51]

Gage: Well, trademarks are dealing with the real world, but the inspiration of Cloudflare is to do exactly what Bill Joy and I were talking about in 1982. It’s to build an environment in which every participant globally can share with security, and we were not as strong. Bill wrote most of the code of TCP/IP implemented by every other computer vendor, and still these questions of latency, these questions of distributed denial of service which was, how do you block that? I was so happy to see that Cloudflare invests real money and real people in addressing those kinds of critical problems, which are at the core, what will destroy the Internet.

[00:14:48]

Graham-Cumming: Yes, I agree. I mean, it is a significant investment to actually deal with it and what I think people don’t appreciate about the DDoS attack situation is that they are going on all the time and it’s just a continuous, you know, just depends who the target is. It’s funny you mentioned TCP/IP because about 10 years after, so in about ‘92, my first real job, I had to write a TCP/IP stack for an obscure network card. And this was prior to the Internet really being available everywhere. And so I didn’t realize I could go and get the BSD implementation and recompile it. So I did it from scratch from the RFCs.

[00:05:23]

Gage: You did!

[00:05:25]

Graham-Cumming: And the thing I recommend here is that nobody ever does that because, you know, the real world, real code that really interacts is really hard when you’re trying to work it with other things, so.

[00:05:36]

Gage: Do you still, John, do you have that code?

[00:05:42]

Graham-Cumming: I wonder. I have the binary for it.

[00:05:46]

Gage: Do hunt for it, because our story was at the time DARPA, the Defense Advanced Research Projects Agency, that had funded networking initiatives around the world. I just had a discussion yesterday with Norway and they were one of the first entities to implement using essentially Bill Joy’s code, but to be placed on the ARPANET. And a challenge went out, and at that time the slightly older generation, the Bolt Beranek and Newman Group, Vint Cerf, Bob Con, those names, as Vint Cerf was a grad student at UCLA where he had built one of the four first Internet sites and the DARPA offices were in Arlington, Virginia, they had massive investments in detection of nuclear underground tests, so seismological data, and the moment we made the very first Suns, I shipped them to DARPA, we got the network up and began serving seismic data globally. Really lovely visualization of events. If you’re trying to detect something, those things go off and then there’s a distinctive signature, a collapse of the underground cavern after. So DARPA had tried to implement, as you did, from the spec, from the RFC, the components, and Vint had designed a lot of this, all the acknowledgement codes and so forth that you had to implement in TCP/IP. So Bill, as a graduate student at Berkeley, we had a meeting in Arlington at DARPA headquarters where BBN and AT&T Bell Labs and a number of other people were in the room. Their code didn’t work, this graduate student from Berkeley named Bill Joy, his code did work, and when Bob Kahn and Vint Cerf asked Bill, “Well, so how did you do it?” What he said was exactly what you just said, he said, “I just read the spec and wrote the code.”

[00:08:12]

Graham-Cumming: I do remember very distinctly because the company I was working at didn’t have a TCP/IP stack and we didn’t have any IP machines, right, we were doing actually stuff that was all IBM networking, SMA stuff. Somehow we bought what was at that point a HP machine, it was an Apollo workstation and a Sun workstation. I had them on Ethernet and talking to each other. And I do distinctly remember the first time a ping packet came back from that Sun box, saying, yes I managed to send you an IP packet, you managed to send me ICMP response and that was pretty magical. And then I got to TCP and that was hard.

[00:08:55]

Gage: That was hard. Yeah. When you get down to the details, the spec can be wrong. I mean, it will want you to do something that’s a stupid thing to do. So Bill has such good taste in these things. It would be interesting to do a kind of a diff across the various implementations of the stack. Years and years later we had maybe 50 companies all assemble in a room, only engineers, throw out all the marketing people and all the Ps and VPs and every company in this room—IBM, Hewlett-Packard—oh my God, Hewlett-Packard, fix your TCP—and we just kept going until everybody could work with everybody else in sort of a pact. We’re not going to reveal, Honeywell, that you guys were great with earlier absolute assembly code, determinate time control stuff but you have no clue about how packets work, we’ll help you, so that all of us can make every machine interoperate, which yielded the network show, Interop. Every year we would go put a bunch of fiber inside whatever, you know, Geneva, or pick some, Las Vegas, some big venue.

[00:10:30]

Graham-Cumming: I used to go to Vegas all the time and that was my great introduction to Vegas was going there for Interop, year after year.

[00:10:35]

Gage: Oh, you did! Oh, great.

[00:10:36]

Graham-Cumming: Yes, yes, yes.

[00:10:39]

Gage: You know in a way, what you’re doing with, for example, just last week with the Verizon problem, everybody implementing what you’re doing now that is not open about their mistakes and what they’ve learned and is not sharing this, it’s a problem. And your global presence to me is another absolutely critical thing. We had about, I forget, 600 engineers in Beijing at the East Gate of Tsinghua a lot of networking expertise and lots of those people are at Tencent and Huawei and those network providers throughout the rest of the world, politics comes and goes but the engineering has to be done in a way that protects us. And so these conversations globally are critical.

[00:11:33]

Graham-Cumming: Yes, that’s one of the things that’s fascinating actually about doing real things on the real Internet is there is a global community of people making computers talk to each other and you know, that it’s a tremendously complicated thing to actually make that work, and you do it across countries, across languages. But you end up actually making them work, and that’s the Internet we’re sitting on, that you and I are talking on right now that is based on those conversations around the world.

[00:12:01]

Gage: And only by doing it do you understand more deeply how to do it. It’s very difficult in the abstract to say what should happen as we begin to spread. As Sun grew, every major city in Africa had installations and for network access, you were totally dependent on an often very corrupt national telco or the complications dealing with these people just to make your packet smooth. And as it turned out, many of the intelligence and military entities in all of these countries had very little understanding of any of this. That’s changed to some degree. But the dangerous sides of the Internet. Total surveillance, IPv6, complete control of exact identity of origins of packets. We implemented, let’s see, you had an early Sun. We probably completed our IPv6 implementation, was it still fluid in the 90s, but I remember 10 years after we finished a complete implementation of IPv6, the U.S. was still IPv4, it’s still IPv4.

[00:13:25]

Graham-Cumming: It still is, it still is. Pretty much. Except for the mobile carriers right now. I think in general the mobile phone operators are the ones who’ve gone more into IPv6 than anybody else.

[00:13:37]

Gage: It was remarkable in China. We used to have a conference. We’d bring a thousand Chinese universities into a room. Professor Wu from Tsinghua who built the Chinese Education and Research Network, CERNET. And now a thousand universities have a building on campus doing Internet research. We would get up and show this map of China and he kept his head down politically, but he managed at the point when there was a big fight between the Minister of Telecom and the Minister of Railways. The Minister of Railways said, look, I have continuity throughout China because I have railines. I’ve just made a partnership with the People’s Liberation Army, and they are essentially slave labor, and they’re going to dig the ditches, and I’m going to run fiber alongside the railways and I don’t care what you, the Minister of Telecommunications, has to say about it, because I own the territory. And that created a separate pathway for the backbone IPv6 network in China. Cheap, cheap, cheap, get everybody doing things.

[00:14:45]

Graham-Cumming: Yes, now of course in China that’s resulted in an interesting situation where you have China Telecom and China Unicom, who sort of cooperate with each other but they’re almost rivals which makes IP packets quite difficult to route inside China.

[00:14:58]

Gage: Yes exactly. At one point I think we had four hunks of China. Everyone was geographically divided. You know there were meetings going on, I remember the moment they merged the telecom ministry with the electronics ministry and since we were working with both of them, I walk in a room and there’s a third group, people I didn’t know, it turns out that’s the People’s Liberation Army.

[00:15:32]

Graham-Cumming: Yes, they’re part of the team. So okay, going back to this “Network is the Computer” notion. So you were talking about the initial things that you were doing around that, why is it that it’s okay that Cloudflare has gone out and trademarked that phrase now, because you seem to think that we’ve got a leg to stand on, I guess.

[00:15:56]

Gage: Frankly, I’d only vaguely heard of Cloudflare. I’ve been working in areas, I’ve got a project in the middle of Nairobi in the slum where I’ve spent the last 15 years or so learning a lot about clean water and sewage treatment because we have almost 400,000 people in a very small area, biggest slum in East Africa. How can you introduce sanitary water and clean sewage treatment into a very, an often corrupt, a very difficult environment, and so that’s been a fascination of mine and I’ve been spending a lot of time. What’s a computer person know about fluid dynamics and pathogens? There’s a lot to learn. So as you guys grew so rapidly, I vaguely knew of you but until I started reading your blog about post-quantum crypto and how do we devise a network in these resilient denial of service attacks and all these areas where you’re a growing company, it’s very hard to take time to do serious advanced research-level work on distributed computing and distributed security, and yet you guys are doing it. When Bill created Java, the subsequent step from Java for billions and billions of devices to share resources and share computations was something we call Genie which is a framework for validation of who you are, movement of code from device to device in a secure way, total memory control so that someone is not capable of taking over memory in your device as we’ve seen with Spectre and the failures of these billions of Intel chips out there that all have a flaw on take all branches parallel compute implementations. So the very hardware you’re using can be insecure so your operating systems are insecure, the hardware is insecure, and yet you’re trying to build on top with fallible pieces in infallible systems. And you’re in the middle of this, John, which I’m so impressed by.

[00:18:13]

Graham-Cumming: And Jini sort of lives on as called Apache River now. It moved away from Sun and into an Apache project.

[00:18:21]

Gage: Yes, very few people seem to realize that the name Apache is a poetic phrasing of “a patchy system.” We patch everything because everything is broken. We moved a lot of it, Brian Behlendorf and the Apache group. Well, many of the innovations at Sun, Java is one, file systems that are far more secure and far more resilient than older file systems, the SPARC  implementation, I think the SPARC processor, even though you’re using the new ARM processors, but Fujitsu, I still think keeps the SPARC architecture as the world’s fastest microprocessor.  

[00:19:16]

Graham-Cumming: Right. Yes. Being British of course, ARM is a great British success. So I’m honor-bound to use that particular architecture. Clearly.

[00:19:25]

Gage: Oh, absolutely. And the power. That was the one always in a list of what our engineering goals are. We wanted to make, we were building supercomputers, we were building very large file servers for the telcos and the banks and the intelligence agencies and all these different people, but we always wanted to make a low power and it just fell off the list of what you could accomplish and the ARM chips, their ratios of wattage to packets treated are—you have a great metric on your website someplace about measuring these things at a very low level—that’s key.

[00:20:13]

Graham-Cumming: Yes, and we had Sophie Wilson, who of course is one of the founders of ARM and actually worked on the original chip, tell this wonderful story at our Internet Summit about how the first chip they hooked up was operating fine until they realized they hadn’t hooked the power up and they were asked to. It was so low power that it was able to use the power that was coming in over the logic lines to actually power the whole chip. And they said to me, wait a minute, we haven’t plugged the power in but the thing is running, which was really, I mean that was an amazing achievement to have done that.

[00:20:50]

Gage: That’s amazing. We open sourced SPARC, the instruction set, so that anybody doing crypto that also had Fab capabilities could implement detection of ones and zeroes, sheep and goats, or other kinds of algorithms that are necessary for very high speed crypto. And that’s another aspect that I’m so impressed by Cloudflare. Cloudflare is paying attention at a machine instruction level because you’re implementing with your own hardware packages in what, 180 cities? You’re moving logistically a package into Ulan Bator, or into Mombasa and you’re coming up live.

[00:21:38]

Graham-Cumming: And we need that to be inexpensive and fast because we’re promising people that we will make their Internet properties faster and secure at the same time and that’s one of the interesting challenges which is not trading those two things off. Which means your crypto better be fast, for example, and that requires a lot of fiddling around at the hardware level and understanding it. In our case because we’re using Intel, really what Intel chips are doing at the low level.

[00:22:10]

Gage: Intel did implement a couple of things in one or another of the more recent chips that were very useful for crypto. We had a group of the SPARC engineers, probably 30, at a dinner five or six months ago discussing, yes, we set the world standard for parallel execution branching optimizations for pipelines and chips, and when the overall design is not matched by an implementation that pays attention to protecting the memory, it’s a fundamental, exploitable flaw. So a lot of discussion about this. Selecting precisely which instructions are the most important, the risk analysis with the ability to make a chip specifically to implement a particular algorithm, there’s a lot more to go. We have multiples of performance ahead of us for specific algorithms based on a more fluid way to add instructions that are necessary into a specific piece of hardware. And then we jump to quantum. Oh my.

[00:23:32]

Graham-Cumming: Yes. To talk about that a little bit, the ever-increasing speed of processors and the things we can do; Do you think we actually need that given that we’re now living in this incredibly distributed world where we are actually now running very distributed algorithms and do we really need beefier machines?

[00:23:49]

Gage: At this moment, in a way, it’s you making fun of Bill Joy for only wanting a megabit in Aspen. When Steve Jobs started NeXT, sadly his hardware was just terrible, so we sent a group over to boost NeXT. In fact we sort of secretly slipped him $30 million to keep him afloat. And I’d say, “Jobs, if you really understood something about hardware, it would really be useful here.” So one of the main team members that we sent over to NeXT came to live in Aspen and ended up networking the entire valley. At a point, megabit for what you needed to do, seemed reasonable, so at this moment, as things become alive by the introduction of a little bit of intelligence in them, some little flickering chip that’s able to execute an algorithm, many tasks don’t require. If you really want to factor things fast, quantum, quantum. Which will destroy our existing crypto systems. But if you are just bringing the billions of places where a little bit of knowledge can alter locally a little bit of performance, we could do very well with the compute power that we have right now. But making it live on the network, securely, that’s the key part. The attacks that are going on, simple errors as you had yesterday, are simple errors. In a way, across Cloudflare’s network, you’re watching the challenges of the 21st century take place: attacks, obscure, unknown exploits of devices in the power and water control systems. And so, you are in exactly the right spot to not get much sleep and feel a heavy responsibility.

[00:26:20]

Graham-Cumming: Well it certainly felt like it yesterday when we were offline for 27 minutes, and that’s when we suddenly discovered, we sort of know how many customers we have, and then we really discover when they start phoning us. Our support line had his own DDoS basically where it didn’t work anymore because so many people signed in. But yes, I think that it’s interesting your point about a little bit extra on a device somewhere can do something quite magical and then you link it up to the network and you can do a lot. What we think is going on partly is some things around AI, where large amounts of machine learning are happening on big beefy machines, perhaps in the cloud, perhaps groups of machines, and then devices are doing their own little bits of inference or recognizing faces and stuff like that. And that seems to be an interesting future where we have these devices that are actually intelligent in our pockets.

[00:27:17]

Gage: Oh, I think that’s exactly right. There’s so much power in your pocket. I’m spending a lot of time trying to catch up that little bit of mathematics that you thought you understood so many years ago and it turns out, oh my, I need a little bit of work here. And I’ve been reading Michael Jordan’s papers and watching his talks and he’s the most cited computer scientist in machine learning and he will always say, “Be very careful about the use of the phrase, ‘Artificial Intelligence’.” Maybe it’s a metaphor like “The Network is the Computer.” But, we’re doing gradient descent optimization. Is the slope going up, or is the slope going down? That’s not smart. It’s useful and the real time language translation and a lot of incredible work can occur when you’re doing phrases. There’s a lot of great pattern work you can do, but he’s out in space essentially combining differentiation and integration in a form of integral. And off we go. Are your hessians rippling in the wind? And what’s the shape of this slope? And is this actually the fastest path from here to there to constantly go downhill. Maybe it’s sometimes going uphill and going over and then downhill that’s faster. So there’s just a huge amount of new mathematics coming in this territory and each time, as we move from 2G to 3G to 4G to 5G, many people don’t appreciate that the compression algorithms changed between 2G, 3G, 4G and 5G and as a result, so much more can move into your mobile device for the same amount of power. 10 or 20 times more for the same about of power. And mathematics leads to insights and applications of it. And you have a working group in that area, I think. I tried to probe around to see if you’re hiring.

[00:30:00]

Graham-Cumming: Well you could always just come around to just ask us because we’ll probably tell you because we tend to be fairly transparent. But yes, I mean compression is definitely an area where we are interested in doing things. One of the things I first worked on at Cloudflare was a thing that did differential compression based on the insight that web pages don’t actually change that much when you hit ‘refresh’. And so it turns out that if you if you compress based on the delta from the last thing you served to someone you can actually send many orders of magnitude less data and so there’s lots of interesting things you can do with that kind of insight to save a tremendous amount of bandwidth. And so yeah, definitely compression is interesting, crypto is interesting to us. We’ve actually open sourced some of our compression improvements in zlib which was very popular compression algorithm and now it’s been picked up. It turns out that in neuroscience, because there’s a tremendous amount of data which needs compression and there are pipelines used in neuroscience where actually having better compression algorithms makes you work a lot faster. So it’s fascinating to see the sort of overspill of things we’re doing into other areas where I know nothing about what goes on inside the brain.

[00:31:15]

Gage: Well isn’t that fascinating, John. I mean here you are, the CTO of Cloudflare working on a problem that deeply affects the Internet, enabling a lot more to move across the Internet in less time with less power, and suddenly it turns into a tool for brain modeling and neuroscientists. This is the benefit. There’s a terrific initiative. I’m at Berkeley. The Jupiter notebooks created by Fernando Perez, this environment in which you can write text and code and share things. That environment, taken up by machine learning. I think it’s a major change. And the implementation of diagrams that are causal. These forms of analysis of what caused what. These are useful across every discipline and for you to model traffic and see patterns emerge and find webpages and see the delta has changed and then intelligently change the pattern of traffic in response to it, it’s all pretty much the same thing here.

[00:32:53]

Graham-Cumming: Yes and then as a mathematician, when I see things that are the same thing, I can’t help wondering what the real deep structure is underneath. There must be another layer another layer down or something. So as you know it’s this thing. There’s some other deeper layer below all this stuff.  

[00:33:12]

Gage: I think this is just endlessly fascinating. So my only recommendations to Cloudflare: first, double what you’re doing. That’s so hard because as you go from 10 people to 100 people to 1,000 people to 10,000 people, it’s a different world. You are a prime example, you are global. Suddenly you’re able to deal with local authorities in 60-70 countries and deal with some of the world’s most interesting terrain and with network connectivity and moving data, surveillance, and some security of the foundation infrastructure of all countries. You couldn’t be engaged in more exciting things.

[00:34:10]

Graham-Cumming: It’s true. I mean one of the most interesting things to me is that I have grown up with the Internet when I you know I got an email address using actually the crazy JANET scheme in the UK where the DNS names were backwards. I was in Oxford and they gave me an email address and it was I think it was JGC at uk dot ac dot ox dot prg and that then at some point it flipped around and it went to DNS looked like it had won. For a long time my address was the wrong way around. I think that’s a typically British decision to be slightly different to everybody else.

[00:35:08]

Gage: Well, Oxford’s always had that style, that we’re going to do things differently. There’s an Oxford Center for the 21st century that was created by the money from a wonderful guy who had donated maybe $100 million. And they just branched out into every possible research area. But when you went to meetings, you would enter a building that was built at the time of the Raj. It was the India temple of colonialism.

[00:35:57]

Graham-Cumming: There’s quite a few of those in the UK. Are you thinking of the Martin School? James Martin. And he gave a lot of money to Oxford. Well the funny thing about that was the programming research group. The one thing they didn’t teach us really as an undergraduate was how to program which was one of the most fascinating things they have because that was a bit getting your hands dirty so you needed to let all the theory. So we learnt all the theory we did a little bit of functional programming and that was the extent of it which set me really up badly for a career in an industry. My first job I had to pretend I knew how to program and see and learn very quickly.

[00:36:42]

Gage: Oh my. Well now you’ve been writing code in Go.

[00:36:47]

Graham-Cumming: Yes. Well the thing about Go, the other Oxford thing of course is Tony Hoare, who is a professor of computer science there. He had come up with this thing called CSP (Communicating Sequential Processes) so that was a whole theory around how you do parallel execution. And so of course everybody used his formalism and I did in my doctoral thesis and so when Go came along and they said oh this how Go works, I said, well clearly that’s CSP and I know how to do this. So I can do it again.

[00:37:23]

Gage: Tony Hoare occasionally would issue a statement about something and it was always a moment. So few people seem to realize the birth of so much of what we took in the 60s, 70s, 80s, in Silicon Valley and Berkeley, derived from the Manchester Group, the virtual memory work, these innovations. Today, Whit Diffie. He used to love these Bletchley stories, they’re so far advanced. That generation has died off.

[00:38:37]

Graham-Cumming: There’s a very peculiar thing in computer science and the real application of computing which is that we both somehow sit on this great knowledge of the past of computing and at the same time we seem to willfully forget it and reinvent everything every few years. We go through these cycles where it’s like, let’s do centralized computing, now distributed computing. No, let’s have desktop PCs, now let’s have the cloud. We seem to have this collective amnesia and then on occasion people go, “Oh, Leslie Lamport wrote this thing in 1976 about this problem”. What other subject do we willfully forget the past and then have to go and doing archaeology to discover again?

[00:39:17]

Gage: As a sociological phenomenon it means that the older crowd in a company are depressing because they’ll say, “Oh we tried that and it didn’t work”. Over the years as Sun grew from 15 people or so and ended up being like 45,000 people before we were sold off to Oracle and then everybody dumped out because Oracle didn’t know too much about computing. So Ivan Sutherland, Whit Diffie. Ivan actually stayed on. He may actually still have an Oracle email. Almost all of the research groups, certainly the chip group went off to Intel, Fujitsu, Microsoft. It’s funny to think now that Microsoft’s run by a Sun person.

[00:40:19]

Graham-Cumming: Well that’s the same thing. Everyone’s forgotten that Microsoft was the evil empire not that long ago. And so now it’s not. Right now it’s cool again.

[00:40:28]

Gage: Well, all of the embedded stuff from Microsoft is still that legacy that Bill Gates who’s now doing wonderful things with the Gates Foundation. But the embedded insecurity of the global networks is due to, in large part, the insecurities, that horrible engineering of Microsoft embedded everywhere. You go anywhere in China to some old industrial facility and there is some old not updated junky PC running totally insecure software. And it’s controlling the grid. It’s discouraging. It’s like a lot of the SCADA systems.

[00:41:14]

Graham-Cumming: I’m completely terrified of SCADA systems.

[00:41:20]

Gage: The simplest exploits. I mean, it’s nothing even complicated. There are a series of emerging journalists today that are paying attention to cybersecurity and people have come out with books even very recently. Well, now because we’re in this China, US, Iran nightmare, a United States presidential directive taking the cybersecurity crowd and saying, oops, now you’re an offensive force. Which means we got some 20-year-old lieutenant somewhere who suddenly might just for fun turn off Tehran’s water supply or something. This is scary because the SCADA systems are embedded everywhere, and they’re, I don’t know, would you say totally insecure? Just the simple things, just simple exploits. One of the journalists described, I guess it was the Russians who took a bunch of small USB sticks and at a shopping center near a military base just gave them away. And people put them into their PCs inside SIPRNet, inside the secure U.S. Department of Defense network. Instantly the network was taken over just by inserting a USB device to something on the net. And there you are, John, protecting against this.

[00:43:00]

Graham-Cumming: Trying hard to protect against these things, yes absolutely. It’s very interesting because you mentioned before how rapidly Cloudflare had grown over the last few years. And of course Sun also really got going pretty rapidly, didn’t it?

[00:43:00]

Gage: Well, yes. The first year we were just some students from Berkeley, hardware from Stanford, Andy Bechtolsheim, software from Berkeley, Berkeley Unix BSD, Bill Joy. Combine the two, and 10 of us or so, and we were, I think the first year was 12 million booked, the second year was 50 or 60 million booked, and the third year was 150 or so million booked and then we hit 500 million and then we hit a billion. And now, it’s selling boxes, we were a manufacturing company so that’s different from software or services, but we also needed lots of people and so we instantly raided the immense benefit of variety of people in the San Francisco Bay Area, with Berkeley and Stanford. We had students in computer science, and mechanical engineering, and physics, and mathematics from every country in the world and we recruited from every country in the world. So a great part of Sun’s growth came, as you are, expanding internationally, and at one point I think we ran most of the telcos of the world, we ran China Mobile. 900 million subscribers on China Mobile, all Sun stuff in the back. Throughout Africa, every telco was running Sun and Cisco until Huawei knocked Cisco out. It was an amazing time.

[00:44:55]

Graham-Cumming: You ran the machine that ran Latek, that let me get my doctoral thesis done.

[00:45:01]

Gage: You know that’s how I got into it, actually. I was in econometrics and mathematics at Berkeley, and I walk down a hallway and outside a room was that funny smell from photographic paper from something, and there was perfectly typeset mathematics. Troff and nroff, all those old UNIX utilities for the Bell Technical Journal, and I open the door and I’ve got to get in there. There’s two hundred people sitting in front of these beehive-like little terminals all typing away on a UNIX system. And I want to get an account and I walk down the hall and there’s this skinny guy who types about 200 words a minute named Bill Joy. And I said, I need an account, I’ve got to type set integral signs, and he said, what’s your name. I tell him my name, John Gage, and he goes voop, and I’ve never seen anybody type as fast as him in my life. This is a new world, here.

[00:45:58]

Graham-Cumming: So he was rude then?

[00:46:01]

Gage: Yeah he was, he was. Well, it’s interesting since the arrival of a device at Berkeley to complement the arrival of an MIT professor who had implemented in LISP, mathematical, not typesetting of mathematics, but actual maxima. To get Professor fetman, maxima god from MIT, to come to Berkeley and live a UNIX environment, we had to put a LISP up outside on the PDP. So Bill took that machine which had virtual memory and implemented the environment for significant computational mathematics. And Steve Wolfram took that CalTech, and Princeton Institute for Advanced Studies, and now we have Mathematica. So in a way, all of Sun and the UNIX world derived from attempting to do executable mathematics.

[00:47:17]

Graham-Cumming: Which in some ways is what computers are doing. I think one of the things that people don’t really appreciate is the extent to which all numbers underneath.

[00:47:28]

Gage: Well that’s just this discrete versus continuous problem that Michael Jordan is attempting to address. To my current total puzzlement and complete ignorance, is what in the world is symplectic integration? And how do Lyapunov functions work? Oh, no clue.

[00:47:50]

Graham-Cumming: Are we going to do a second podcast on that? Are you going to come back and teach us?

[00:47:55]

Gage: Try it. We’re on, you’re on, you’re on. Absolutely. But you’ve got to run a company.

[00:48:00]

Graham-Cumming: Well I’ve got some things to do. Yeah. But you can go do that and come tell us about it.

[00:48:05]

Gage: All right, Great John. Well it was terrific to talk to you.

[00:48:08]

Graham-Cumming: So yes it was wonderful speaking to you as well. Thank you for helping me dig up memories of when I was first fooling around with Sun Systems and, you know, some of the early days and of course “The Network is the Computer,” I’m not sure I fully yet understand quite the metaphor or even if maybe I do somehow deeply in my soul get it, but we’re going to try and make it a reality, whatever it is.

[00:48:30]

Gage: Well, I count it as a complete success, because you count as one of our successes because you‘re doing what you’re doing, therefore the phrase, “The Network is the Computer,” resides in your brain and when you get up in the morning and decide what to do, a little bit nudges you toward making the network work.

[00:48:51]

Graham-Cumming: I think that’s probably true. And there’s the dog, the dog is saying you’ve been yakking for an hour and now we better stop. So listen, thank you so much for taking the time. It was wonderful talking to you. You have a good day. Thank you very much.


Interested in hearing more? Listen to my conversations with Ray Rothrock and Greg Papadopoulos of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Ray Rothrock

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/ray-rothrock/

The Network is the Computer: A Conversation with Ray Rothrock

The Network is the Computer: A Conversation with Ray Rothrock

Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below.

[00:00:07]

John Graham-Cumming: Here I am very lucky to get to talk with Ray Rothrock who was I think one of the first investors in Cloudflare, a Series A investor and got the company a little bit of money to get going, but if we dial back a few earlier years than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked, “The Network is the Computer”. And that was a Sun trademark, wasn’t it?

[00:00:43]

Ray Rothrock: It was, yes.

[00:00:46]

Graham-Cumming: I talked to John Gage and I asked him about this as well and I asked him to explain to me what it meant. And I’m going to ask you the same thing because I remember walking around the Valley thinking, that sounds cool; I’m not sure I totally understand it. So perhaps you can tell me, was I right that it was cool, and what does it mean?

[00:01:06]

Rothrock: Well it certainly was cool and it was extraordinarily unique at the time. Just some quick background. In those early days when I was there, the whole concept of networking computers was brand new. Our competitor Apollo had a proprietary network but Sun chose to go with TCP/IP which was a standard at the time but a brand new standard that very few people know about right. So when we started connecting computers and doing some intensive computing which is what I was responsible for—CAD/CAM in those days was extremely intensive whether it was electrical CAD/camera, or mechanical CAD/CAM, or even simulation solid design modeling and things—having a little extra power from other computers was a big deal. And so this concept of “The Network is the Computer” essentially said that you had one window into the network through your desktop computer in those days—there was no mobile computing at that time, this was like 84’, 85’, 86’ I think. And so if you had the appropriate software you could use other people’s computers (for CPU power) and so you could do very hard problems at that single computer could not do because you could offload some of that CPU to the other computers. Now that was very nerdy, very engineering intensive, and not many people did it. We’d go to the SIGGRAPH, which was a huge graphics show in those days and we would demonstrate ten Sun computers for example, doing some graphic rendering of a 3D wireframe that had been created in the CAD/CAM software of some sort. And it was, it was hard, and that was in the mechanical side. On the electrical side, Berkeley had some software that was called Magic—it’s still around and is a very popular EDA software that’s been incorporated in those concepts. But to imagine calculating the paths in a very complicated PCB or a very complicated chip, one computer couldn’t do it, but Sun had the fundamental technology. So from my seat at Sun at the time, I had access to what could be infinite computing power, even though I had a single application running, and that was a big selling point for me when I was trying to convince EDA and MDA companies to put their software on the Sun. That was my job.

[00:03:38]

Graham-Cumming: And hearing it now, it doesn’t sound very revolutionary, because of course we’re all doing that now. I mean I get my phone out of my pocket and connect to goodness knows what computing power which does image recognition and spots faces and I can do all sorts of things. But walk me through what it felt like at the time.

[00:03:56]

Rothrock: Just doing a Google search, I mean, how many data stores are being spun up for that? At the time it was incredible, because you could actually do side by side comparisons. We created some demonstrations, where one computer might take ten hours to do a calculation, two computers might take three hours, five computers might take 30 minutes. So with this demo, you could turn on computers and we would go out on the TCP/IP network to look for an available CPU that could give me some time. Let’s go back even further. Probably 15 years before that, we had time sharing. So you had a terminal into a big mainframe and did all this swapping in and out of stuff to give you a time slice computing. We were doing the exact same thing except we were CPU slicing, not just time slicing. That’s pretty nerdy, but that’s what we did. And I had to work with the engineering department, with all these great engineers in those days, to make this work for a demo. It was so unique, you know, their eyes would get big. You remember Novell…

[00:05:37]

Graham-Cumming: I was literally just thinking about Novell because I actually worked on IPX and SPX networking stuff at the time. I was going to ask you actually, to what extent do you think TCP/IP was a very important part of this revolution?

[00:05:55]

Rothrock: It was huge. It was fundamentally huge because it was a standard, so it was available and if you implemented it, you didn’t have to pay for it. When Bob Metcalfe did Ethernet, it was on top of the TCP stack. Sun, in my memory, and I could be wrong, was the first company to put a TCP/IP stack on the computer. And so you just plugged in the back, an RJ45 into this TCP/IP network with a switch or a router on it and you were golden. They made it so simple and so cheap that you just did it. And of course if you give an engineer that kind of freedom and it opens up. By the way, as the marketing guy at Sun, this was my first non-engineering job. I came from a very technical world of nuclear physics into Sun. And so it was stunning, just stunning.

[00:06:59]

Graham-Cumming: It’s interesting that you mentioned Novell and then you mentioned Apollo before that and obviously IBM had SNA networking and there were attempts to do all those networking things. It’s interesting that these open standards have really enabled the explosion of everything else we’ve seen and with everything that’s going on in the Internet.

[00:07:23]

Rothrock: Sun was open, so to speak, but this concept of open source now that just dominates the conversation. As a venture capitalist, every deal I ever invested in had open source of some sort in it. There was a while when it was very problematic in an M&A event, but the world’s gotten used to it. So open, is very powerful. It’s like freedom. It’s like liberty. Like today, July 4th, it’s a big deal.

[00:07:52]

Graham-Cumming: Yes, absolutely. It’s just interesting to see it explode today because I spent a lot of my career looking at so many different networking protocols. The thing that really surprises me, or perhaps shouldn’t surprise me when you’ve got these open things, is that you harness so many people’s intelligence that you just end up with something that’s just better. It seems simple.

[00:08:15]

Rothrock: It seems simple. I think part of the magic of Sun is that they made it easy. Easy is the most powerful thing you can do in computing. Computing can be so nerdy and so difficult. But if you just make it easy, and Cloudflare has done a great job with that at that; they did it with their DNS service, they did it with all the stuff we worked on back when I was on the board and actively involved in the company. You’ve got to make it easy. I mean, I remember when Matthew and Lee worked like 20 hours a day on how to switch your DNS from whoever your provider was to Cloudflare. That was supposed to be one click, done. A to B. And that DNA was part of the magic. And whether we agree that Sun did it that way, to me at least, Sun did it that way as well. So it’s huge, a huge lift.

[00:09:08]

Graham-Cumming: It’s funny you talk about that because at the time, how that actually worked is that we just asked people to give us their username and password. And we logged in and did it for them. Early on, Matthew asked me if I’d be interested in joining Cloudflare when it was brand new and because of other reasons I’d moved back to the UK and I wasn’t ready to change jobs and I’d just taken another job. And I remember thinking, this thing is crazy this Cloudflare thing. Who’s going to hand over their DNS and their traffic to these four or five people above a nail salon in Palo Alto? And Matthew’s response was, “They’re giving us their passwords, let alone their traffic.” Because they were so desperate for it.

[00:09:54]

Rothrock: It tells you a lot about Matthew and you know as an attorney, I mean he was very sensitive to that and believes that one of the one of the founding principles is trust. His view was that, if I ever lose the customer’s trust, Cloudflare is toast. And so everything focused around that key value. And he was right.

[00:10:18]

Graham-Cumming: And you must have, at Sun, been involved with some high performance computing things that involved sensitive customers doing cryptography and things like that. So again trust is another theme that runs through there as well.

[00:10:33]

Rothrock: Yeah, very true. As the marketing guy of CAD/CAM, I was in the field two-thirds of the time, showing customers what was possible with them. My job was to get third party software onto the Sun box and then to turn that into a presentation to a customer. So I visited many government customers, many aerospace, power, all these very high falutin sort of behind the firewall kinds of guys in those days. So yes, trust was huge. It would come up: “Okay, so I’m using your CPU, how is it that you can’t use mine. And how do you convince me that you’ve not violated something.” In those days it was a whole different conversation that it is today but it was nonetheless just as important. In fact I remember I spent quite a bit of time at NCSA at the University of Illinois Urbana-Champaign. Larry Smarr was the head of NCSA. We spent a lot of time with Larry. I think John was there with me. John Gage and Vinod and some others but it was a big deal taking about high performance computing because that’s what they were doing and doing it with Sun.

[00:11:50]

Graham-Cumming: So just to dial forward, so you’re at Venrock and you decide to invest in Cloudflare. What was it that made you think that this was worth investing in? Presumably you saw some things that were in some of Sun’s vision. Because Sun had a very wide-ranging visions about what was going to be possible with computing.

[00:12:11]

Rothrock: Yeah. Let me sort of touch on a few points probably. Certainly Sun was my first computer company I worked for after I got out of the nuclear business and the philosophy of the company was very powerful. Not only we had this cool 19 inch black and white giant Macintosh essentially although the Mac wasn’t even born yet, but it had this ease of use that was powerful and had this open, I mean it was we preached that all the time and we made that possible. And Cloudflare—the related philosophy of Matthew and Michelle’s genius—was they wanted to make security and distribution of data as free and easy as possible for the long tail. That was the first thinking because you didn’t have access if you were in the long tail you were a small company you or you’re just going to get whipped around by the big boys. And so there was a bit of, “We’re here to help you, we’re going to do it.” It’s a good thing that the long tail get mobilized if you will or emboldened to use the Internet like the big boys do. And that was part of the attractiveness. I didn’t say, “Boy, Matthew, this sounds like Sun,” but the concept of open and liberating which is what they were trying to do with this long tail DNS and CDN stuff was very compelling and seemed easy. But nothing ever is. But they made it look easy.

[00:13:52]

Graham-Cumming: Yeah, it never is. One of the parallels that I’ve noticed is that I think early on at Sun, a lot of Sun equipment went to companies that later became big companies. So some of these small firms that were using crazy work stations ended up becoming some of the big names in the Valley. To your point about the long tail, they were being ignored and couldn’t buy from IBM even if they wanted to.

[00:14:25]

Rothrock: They couldn’t afford SNA and they couldn’t do lots of things. So Sun was an enabler for these companies with cool ideas for products and software to use Sun as the underpinning. workstations were all the rage, because PCs were very limited in those days. Very very limited, they were all Intel based. Sun was 68000-based originally and then it was their own stuff, SPARC. You know in the beginning it was a cheap microprocessor from Motorola.

[00:15:04]

Graham-Cumming: What was the growth like at Sun? Because it was very fast, right?

[00:15:09]

Rothrock: Oh yes, it was extraordinarily fast. I think I was employee 130 or something like that. I left Sun in 1986 to go to business school and they gave me a leave of absence. Carol Bartz was my boss at that moment. The company was like at 2000 people just two and a half years later. So it was growing like a weed. I measured my success by how thick the catalyst—that was our catalog name and our program—how thick and how quickly I could add bonafide software developers to our catalog. We published on one sheet of paper front to back. When I first got there, our catalyst catalogue was a sheet of paper, and when I left, it was a book. It was about three-quarters of an inch thick. My group grew from me to 30 people in about a year and a half. It was extraordinary growth. We went public during that time, had a lot of capital and a lot of buzz. That openness, that our competition was all proprietary just like you were citing there, John. IBM and Apollo were all proprietary networks. You could buy a NIC card and stick it into your PC and talk to a Sun. And vise versa. And you couldn’t do that with IBM or Apollo. Do you remember those?

[00:16:48]

Graham-Cumming: I do because I was talking to John Gage. In my first job out of college, I wrote a TCP/IP stack from scratch, for a manufacturer of network cards. The test of this stack was I had an HP Apollo box and I had a Sun workstation and there was a sort of magical, can I talk to these devices? And can I ping them? And then that was already magical the first ping as it went across the network. And then, can I Telnet to one of these? So you know, getting the networking actually running was sort of the key thing. How important was networking for Sun in the early days? Was it always there?

[00:17:35]

Rothrock: Yeah, it was there from the beginning, the idea of having a network capability. When I got there it was network; the machine wasn’t standalone at all. We sort of mimicked the mainframe world where we had green screens hooked into a Sun in a department for example. And there was time sharing. But as soon as you got a Sun on your desk, which was rare because we were shipping as many as we could build, it was fantastic. I was sharing information with engineering and we were working back and forth on stuff. But I think it was fundamental: you have a microprocessor, you’ve got a big screen, you’ve got a graphic UI, and you have a network that hooks into the greater universe. In those days, to send an all-Sun email around the world, modems spun up everywhere. The network wasn’t what it is now.

[00:18:35]

Graham-Cumming: I remember in about 89’, I was at a conference and Whit Diffie was there. I asked him what he was doing. He was in a little computer room. I was trying to typeset something. And he said, “I’m telnetting into a machine which is in San Diego.” It was the first time I’d seen this and I stepped over and he was like, “look at this.” And he’s hitting the keyboard and the keys are getting echoed back. And I thought, oh my goodness, this is incredible. It’s right across the Atlantic and across the country as well.

[00:19:10]

Rothrock: I think, and this is just me talking having lived the last years and with all the investing and stuff I did, but you know it enabled the Internet to come about, the TCP/IP standard. You may recall that Microsoft tried to modify the TCP/IP stack slightly, and the world rejected it, because it was just too powerful, too pervasive. And then along comes HTTP and all the other protocols that followed. Telnetting, FTPing, all that file transfer stuff, we were doing that left, right, and center back in the 80s. I mean you know Cloudflare just took all this stuff and made it better, easier, and literally lower friction. That was the core investment thesis at the time and it just exploded. Much like when Sun adopted TCP/IP, it just exploded. You were there when it happened. My little company that I’m the CEO of now, we use Cloudflare services. First thing I did when I got there was switched to Cloudflare.

[00:20:18]

Graham-Cumming: And that was one of the things when I joined, we really wanted people get to a point where if you’re putting something on the web, you just say, well I’m going to put Cloudflare or a thing like Cloudflare just on it. Because it protects it, it makes it faster, etc. And of course now what we’ve done is we’ve given people compute facility. Right now you can write code and run it in our in our machines worldwide which is another whole thing.

[00:20:43]

Rothrock: And that is “The Network is the Computer”. The other thing that Sun was pitching then was a paperless office. I remember we had posters of paper flying out of a computer window on a Sun workstation and I don’t think we’ve gotten there yet. But certainly, the network is the computer.

[00:21:04]

Graham-Cumming: It was probably the case that the paperless office was one of those things that was about to happen for quite a long time.

[00:21:14]

Rothrock: It’s still about to happen if you ask me. I think e-commerce and the sort of the digital transformation has driven it harder than just networking. You know, the fact that we can now sign legal documents over the Internet without paper and things like that. People had to adopt. People have to trust. People have to adopt these standards and accept them. And lo and behold we are because we made it easy, we made it cheap, and we made it trustworthy.

[00:21:42]

Graham-Cumming: If you dial back through Sun, what was the hardest thing? I’m asking because I’m at a 1,000-person company and it feels hard some days, so I’m curious. What do I need to start worrying about?

[00:22:03]

Rothrock: Well yeah, at 1,000 people, I think that’s when John came into the company and sort of organized marketing. I would say, holding engineering to schedules; that was hard. That was hard because we were pushing the envelope our graphics was going from black and white to color. The networking stuff the performance of all the chips into the boards and just the performance was a big deal. And I remember, for me personally, I would go to a trade show. I’d go to Boston to the Association of Mechanical Engineers with the team there and would show up at these workstations and of course the engineers want to show off the latest. So I would be bringing with me tapes that we had of the latest operating system. But getting the engineers to be ready for a tradeshow was very hard because they were always experimenting. I don’t believe the word “code freeze” meant much to them, frankly, but we would we would be downloading the software and building a trade show thing that had to run for three days on the latest and greatest and we knew our competitor would be there right across the aisle from us sort of showing their hot stuff. And working with Eric Schmidt in those days, you know, Eric you just got to be done on this date. But trade shows were wonderful. They focused the company’s endpoints if you will. And marketing and sales drove Sun; Scott McNealy’s culture there was big on that. But we had to show. It’s different today than it was then, I don’t know about the Cloudflare competition, but back then, there were a dozen workstation companies and we were fighting for mindshare and market share every day. So you didn’t dare sort of leave your best jewels at home. You brought them with you. I will give John Gage high, high marks. He showed me how to dance through a reboot in case the code crashed and he’s marvelous and I learned how to work that stuff and to survive.

[00:24:25]

Rothrock: Can I tell you one sort of sales story?  

[00:24:28]

Graham-Cumming: Yes, I’m very interested in hearing the non-technical stories. As an engineer, I can hear engineering stories all the time, but I’m curious what it was like being in sales and marketing in such an engineering heavy company as Sun.

[00:24:48]

Rothrock: Yeah. Well it was challenging of course. One of the strategies that Sun had in those days was to get anyone who was building their own computer. This was Computer Vision and Data General and all those guys to adopt the Sun as their hardware platform and then they could put on whatever they wanted. So because I was one of the demo gods, my job was to go along with the sales guys when they wanted to try to convince somebody. So one of the companies we went after was Data General (DG) in Massachusetts. And so I worked for weeks on getting this whole demo suite running MDA, EDA, word processing, I had everything. And this was a big, big, big deal. And I mean like hundreds of millions of dollars of revenue. And so I went out a couple of days early and we were going to put up a bunch of Suns and I had a demo room at DG. So all the gear showed up and I got there at like 5:30 in the morning and started downloading everything, downloading software, making it dance. And at about 8:00 a.m. in the morning the CEO of Data General walks in. I didn’t know who he was but it turned out to be Ed de Castro. And he introduces himself and I didn’t know who he was and he said, “What are you doing?” And I explained, “I’m from Sun, I’m getting ready for a big demo. We’ve got a big executive presentation. Mr. McNealy will be here shortly, etc.” And he said, “Well, show me what you’ve got.” So I’m sort of still in the middle of downloading this software and I start making this thing dance. I’ve got these machines talking to each other and showing all kinds of cool stuff. And he left. And the meeting was about 10 or 11 in the morning. And so when the executive team from Sun showed up they said, “Well, how’s it going?” I said, “Well I gave a demo to a guy,” and they asked, “Who’s the guy,” and I said, “It was Ed de Castro.” And they went, “Oh my God, that was the CEO.” Well, we got the deal. I thought Ed had a little tactic there to come in early, see what he could see, maybe get the true skinny on this thing and see what’s real. I carried the day. But anyway, I got a nice little bonus for that. But Vinod and I would drop into Lockheed down in Southern California. They wanted to put Suns on P-3 airplanes and we’d go down there with an engineer and we’d figure out how to make it. Those were just incredible times. You may remember back in the 80s everyone dressed up except on Fridays. It was dress-down Fridays. And one day I dressed down and Carol Bartz, my boss, saw me wearing blue jeans and just an open collared shirt and she said, “Rothrock, you go home and put on a suit! You never know when a customer is going to walk in the front door.” She was quite right. Kodak shows up. Kodak made a big investment in Sun when it was still private. And I gave that demo and then AT&T, and then interestingly Vice President Bush back in the Reagan administration came to Sun to see the manufacturing and I gave the demo to the Vice President with Scott and Andy and Bill and Vinod standing there.

[00:28:15]

Graham-Cumming: Do you remember what he saw?

[00:28:18]

Rothrock: It was my standard two minute Sun demo that I can give in my sleep. We were on the manufacturing floor. We picked up a machine and I created a demo for it and my executive team was there. We have a picture of it somewhere, but it was fun. As John Gage would say, he’d say, “Ray, your job is to make the computer dance.” So I did.  

[00:28:44]

Graham-Cumming: And one of the other things I wanted to ask you about is at some point Sun was almost Amazon Web Services, wasn’t it. There was a rent-a-computer service, right?  

[00:28:53]

Rothrock: I don’t know. I don’t remember the rent-a-computer service. I remember we went after the PC business aggressively and went after the data centers which were brand new in those days pretty aggressively, but I don’t remember the rent-a-computer business that much. It wasn’t in my domain.

[00:29:14]

Graham-Cumming: So what are you up to these days?

[00:29:18]

Rothrock: I’m still investing. I do a lot of security investing. I did 15 deals while I was at Venrock. Cloudflare was the last one I did, which turned out really well of course. More to come, I hope. And I’m CEO of one of Venrock’s portfolio companies that had a little trouble a few years back but I fixed that and it’s moving up nicely now. But I’ve started thinking about more of a science base. I’m on the board of the Carnegie Institute of Science. I’m on the board of MIT and I just joined the board of the Nuclear Threat Initiative in Washington which is run by Secretary Ernie Moniz, former secretary of energy. So I’m doing stuff like that. John would be pleased with how well that played through. But I’ll tell you it is this these fundamental principles, just tying it all back to Sun and Cloudflare, and this sort of open, cheap, easy, enabling humans to do things without too much friction, that is exciting. I mean, look at your phone. Steve Jobs was the master of design to make this thing as sweet as it is.

[00:30:37]

Graham-Cumming: Yes, and as addictive.

[00:30:39]

Rothrock: Absolutely, right. I haven’t been to a presentation from Cloudflare in two years, but every time I see an announcement like the DNS service, I immediately switched all my DNS here at the house to 1.1.1.1. Stuff like that. Because I know it’s good and I know it’s trustworthy, and it’s got that philosophy built in the DNA.

[00:31:09]

Graham-Cumming: Yes definitely. Taking it back to what we talked about at the beginning, it’s definitely the trustworthiness is something that Cloudflare has cared about from the beginning and continues to care about. We’re sort of the guardians of the traffic that passes through it.

[00:31:25]

Rothrock: Back when the Internet started happening and when Sun was doing Java, I mean, all those things in the 90s, I was of course at Venrock, but I was still pretty connected to [Edward] Zander and [Scott] McNealy. We were hoping that it would be liberating, that it would create a world which was much more free and open to conversation and we’ve seen the dark side of some of that. But I continue to believe that transparency and openness is a good thing and we should never shut it down. I don’t mean to get it all waxing philosophical here but way more good comes from being open and transparent than bad.

[00:32:07]

Graham-Cumming: Listen it’s July 4th. It’s evening here in London. We can be waxing philosophical as much as we like. Well listen, thank you for taking the time to chat with me. Are there any other reminiscences of Sun that you think the public needs to know in this oral history of “The Network is the Computer.”

[00:32:28]

Rothrock: Well you know the only thing I’d say is having landed in the Silicon Valley in 1981 and getting on with Sun, I can say this given my age and longevity here, everything is built on somebody else’s great ideas. And starting with TCP/IP and then we went to this HTML protocol and browsers, it’s just layer on layer on layer on layer and so Cloudflare is just one of the latest to climb on the shoulders of those giants who put it all together. I mean, we don’t even think about the physical network anymore. But it is there and thank goodness companies like Cloudflare keep providing that fundamental service on which we can build interesting, cool, exciting, and mind-changing things. And without a Cloudflare, without Sun, without Apollo, without all those guys back in the day, it would be different. The world would just be so, so different. I did the New York Times crossword puzzle. I could not do it without Google because I have access to information I would not have unless I went to the library. It’s exponential and it just gets better. Thanks to Michelle and Matthew and Lee for starting Cloudflare and allowing Venrock to invest in it.

[00:34:01]

Graham-Cumming: Well thank you for being an investor. I mean, it helped us get off the ground and get things moving. I very much agree with you about the standing on the shoulders of giants because people don’t appreciate the extent to which so much of this fundamental work that we did was done in the 70s and 80s.

[00:34:19]

Rothrock: Yea, it’s just like the automobile and the airplane. We reminisce about the history but boy, there were a lot of giants in those industries as well. And computing is just the latest.

[00:34:32]

Graham-Cumming: Yep, absolutely. Well, Ray, thank you. Have a good afternoon.


Interested in hearing more? Listen to my conversations with John Gage and Greg Papadopoulos of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store. Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Greg Papadopoulos

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/greg-papadopoulos/

The Network is the Computer: A Conversation with Greg Papadopoulos

The Network is the Computer: A Conversation with Greg Papadopoulos

I spoke with Greg Papadopoulos, former CTO of Sun Microsystems, to discuss the origins and meaning of The Network is the Computer®, as well as Cloudflare’s role in the evolution of the phrase. During our conversation, we considered the inevitability of latency, the slowness of the speed of light, and the future of Cloudflare’s newly acquired trademark. Listen to our conversation here and read the full transcript below.


[00:00:08]

John Graham-Cumming: Thank you so much for taking the time to chat with me. I’ve got Greg Papadopoulos who was CTO of Sun and is currently a venture capitalist. Tell us about “The Network is the Computer.”

[00:00:22]

Greg Papadopoulos: Well, from certainly a Sun perspective, the very first Sun-1 was connected via Internet protocols and at that time there was a big war about what should win from a networking point of view. And there was a dedication there that everything that we made was going to interoperate on the network over open standards, and from day one in the company, it was always that thought. It’s really about the collection of these machines and how they interact with one another, and of course that puts the network in the middle of it. And then it becomes hard to, you know, where’s the line? But it is one of those things that I think even if you ask most people at Sun, you go, “Okay explain to me ‘The Network is the Computer.’” It would get rather meta. People would see that phrase and sort of react to it in their own way. But it would always come back to something similar to what I had said I think in the earlier days.

[00:01:37]

Graham-Cumming: I remember it very well because it was obviously plastered everywhere in Silicon Valley for a while. And it sounded incredibly cool but I was never quite sure what it meant. It sounded like it was one of those things that was super deep but I couldn’t dig deep enough. But it sort of seems like this whole vision has come true because if you dial back to I think it’s 2006, you wrote a blog post about how the world was only going to need five or seven or some small number of computers. And that was also linked to this as well, wasn’t it?

[00:02:05]

Papadopoulos: Yeah, I think as things began to evolve into what we would call cloud computing today, but that you could put substantial resources on the other side of the network and from the end user’s perspective and those could be as effective or more effective than something you’d have in front of you. And so this idea that you really could provide these larger scale computing services in early days — you know, grid was the term used before cloud — but if you follow that logic, and you watch what was happening to the improvements of the network. Dave Patterson at Cal was very fond of saying in that era and in the 90s, networks are getting to the place where the desk connected to another machine is transparent to you. I mean it could be your own, in fact, somebody else’s memory may in fact be closer to you than your own disk. And that’s a pretty interesting thought. And so where we ended up going was really a complete realization that these things we would call servers were actually just components of this network computer. And so it was very mysterious, “The Network is the Computer,” and it actually grew into itself in this way. And I’ll say looking at Cloudflare, you see this next level of scale happening. It’s not just, what are those things that you build inside a data center, how do you connect to it, but in fact, it’s the network that is the computer that is the network.

[00:04:26]

Graham-Cumming: It’s interesting though that there have been these waves of centralization and then push the computing power to the edge and the PCs at some point and then Larry Ellison came along and he was going to have this networked computer thing, and it sort of seems to swing back and forth, so where do you think we are in this swinging?

[00:04:44]

Papadopoulos: You know, I don’t think so much swinging. I think it’s a spiral upwards and we come to a place and we look down and it looks familiar. You know, where you’ll say, oh I see, here’s a 3270 connected to a mainframe. Well, that looks like a browser connected to a web server. And you know, here’s the device, it’s connected to the web service. And they look similar but there are some very important differences as we’re traversing this helix of sorts. And if you look back, for example the 3270, it was inextricably bound to a single server that was hosted. And now our devices have really the ability to connect to any other computer on the network. And so then I think we’re seeing something that looks like a pendulum there, it’s really a refactoring question on what software belongs where and how hard is it to maintain where it is, and naturally I think that the Internet protocol clearly is a peer to peer protocol, so it doesn’t take sides on this. And so that we end up in one state, with more on the client or less on the client. I think it really has to do with how well we’ve figured out distributed computing and how well we can deliver code in a management-free way. And that’s a longer conversation.

[00:06:35]

Graham-Cumming: Well, it’s an interesting conversation. One thing is what you talked about with Sun Grid which then we end up with Amazon Web Services and things like that, is that there was sort of the device, be it your handheld or your laptop talking to some cloud computing, and then what Cloudflare has done with this Workers product to say, well, actually I think there’s three places where code could exist. There’s something you can put inside the network.

[00:07:02]

Papadopoulos: Yes. And by extension that could grow to another layer too. And it goes back to, I think it’s Dave Clark who I first remember saying you can get all the bandwidth you want, that’s money, but you can’t reduce latency. That’s God, right. And so I think there are certainly things and as I see the Workers architecture, there are two things going on. There’s clearly something to be said about latency there, and having distributed points of presence and getting closer to the clients. And there’s IBM with interaction there too, but it is also something that is around management of software and how we should be thinking in delivery of applications, which ultimately I believe, in the limit, become more distributed-looking than they are now. It’s just that it’s really hard to write distributed applications in kind of the general way we think about it.

[00:08:18]

Graham-Cumming: Yes, that’s one of these things isn’t it, it is exceedingly hard to actually write these things which is why I think we’re going through a bit of a transition right now where people are trying to figure out where that code should actually execute and what should execute where.

[00:08:31]

Papadopoulos: Yeah. You had graciously pointed out this blog from a dozen years ago on, hey this is inevitable that we’re going to have this concentration of computing, for a lot of economic reasons as anything else. But it’s both a hammer and a nail. You know, cloud stuff in some ways is unnatural in that why should we expect computing to get concentrated like it is. If you really look into it more deeply, I think it has to do with management and control and capital cycles and really things that are kind of on the economic and the administrative side of things, are not about what’s truth and beauty and the destination for where applications should be.

[00:09:27]

Graham-Cumming: And I think you also see some companies are now starting to wrestle with the economics of the cloud where they realize that they are kind of locked into their cloud provider and are paying rent kind of thing; it becomes entirely economic at that point.

[00:09:41]

Papadopoulos: Well it does, and you know, this was also something I was pretty vocal about, although I got misinterpreted for a while there as being, you know, anti-cloud or something which I’m not, I think I’m pragmatic about it. One of the dangers is certainly as people yield particularly to SaaS products, that in fact, your data in many ways, unless you have explicit contracts and abilities to disgorge that data from that service, that data becomes more and more captive. And that’s the part that I think is actually the real question here, which is like, what’s the switching cost from one service to another, from one cloud to another.

[00:10:35]

Graham-Cumming: Yes, absolutely. That’s one of the things that we faced, one of the reasons why we worked on this thing called the Bandwidth Alliance, which is one of the ways in which stuff gets locked into clouds is the egress fee is so large that you don’t want to get your data out.

[00:10:50]

Papadopoulos: Exactly. And then there is always the, you know, well we have these particular features in our particular cloud that are very seductive to developers and you write to them and it’s kind of hard to undo, you know, just the physics of moving things around. So what you all have been doing there is I think necessary and quite progressive. But we can do more.

[00:11:17]

Graham-Cumming: Yes definitely. Just to go back to the thought about latency and bandwidth, I have a jokey pair of slides where I show the average broadband network you can buy over time and it going up, and then the change in the speed of light over the same period, which of course is entirely flat, zero progress in the speed of light. Looking back through your biography, you wrote thinking machines and I assume that fighting latency at a much shorter distance of cabling must have been interesting in those machines because of the speeds at which they were operating.

[00:11:54]

Papadopoulos: Yes, it surprises most people when you say it, but you know, computer architects complain that the speed of light is really slow. And you know, Grace Hopper who is really one of the founders, the pioneers of modern programming languages and COBOL. I think she was a vice admiral. And she would walk around with a wire that was a foot long and say, “this is a nanosecond”. And that seemed pretty short for a while but, you know a nanosecond is an eternity these days.

[00:12:40]

Graham-Cumming: Yes, it’s an eternity. People don’t quite appreciate it if they’re not thinking about it, how long it is. I had someone who was new to the computing world learning about it, come to me with a book which was talking about fiber optics, and in the book it said there is a laser that flashes on and off a billion times a second to send data down the fiber optic. And he came to me and said, “This can’t possibly be true; it’s just too fast.”

[00:13:09]

Papadopoulos: No, it’s too slow!

[00:013:12]

Graham-Cumming: Right? And I thought, well that’s slow. And then I stepped back and thought, you know, to the average person, that is a ridiculous statement, that somehow we humans have managed to control time at this ridiculously small level. And then we keep pushing and pushing and pushing it and people don’t appreciate how fast and actually how slow the light is, really.

[00:13:33]

Papadopoulos: Yeah. And I think if it actually comes down to it, if you want to get into a very pure reckoning of this is latency is the only thing that matters. And one can look at bandwidth as a component of latency, so you can see bandwidth as a serialization delay and that kind of goes back to Clark thing, you know that, yeah I can buy that, I can’t bribe God on the other side so you know I’m fundamentally left with this problem that we have. Thank you, Albert Einstein, right? It’s kind of hopeless to think about sending information faster than that.

[00:14:09]

Graham-Cumming: Yeah exactly. There’s information limits, which is driving why we have such powerful phones, because in fact the latency to the human is very low if you have it in your hand.

[00:14:23]

Papadopoulos: Yes, absolutely. This is where the edge architecture and the Worker structure that you guys are working on, and I think that’s where it becomes really interesting too because it gives me — you talked about earlier, well we’re now introducing this new tier — but it gives me a really closer place from a latency point of view to have some intimate relationship with a device, and at the same time be well-connected to the network.

[00:14:55]

Graham-Cumming: Right. And I think the other thing that is interesting about that is that your device fundamentally is an insecure thing, so you know if you put code on that thing, you can’t put secrets in it, like a cryptographic secrets, because the end user has access to them. Normally you would keep that in the server somewhere, but then the other funny thing is if you have this intermediary tier which is both secure and low latency to the end user, you suddenly have a different world in which you can put secrets, you can put code that is privileged, but it can interact with the user very very rapidly because the low latency.

[00:15:30]

Papadopoulos: Yeah. And that essence of where’s my trust domain. Now I’ve seen all kinds of like, oh my gosh, I cannot believe somebody is doing it, like putting their S3 credentials, putting it down on a device and having it talk, you know, the log in for a database or something. You must be kidding. I mean that trust proxy point at low latency is a really key thing.

[00:16:02]

Graham-Cumming: Yes, I think it’s just people need to start thinking about that architecture. Is there a sort of parallel with things that were going on with very high-performance computing with sort of the massively parallel stuff and what’s happening today? What lessons can we take from work done in the 70s and 80s and apply it to the Internet of today?

[00:16:24]

Papadopoulos: Well, we talked about this sort of, there are a couple of fundamental issues here. And one we’ve been speaking about is latency. The other one is synchronization, and this comes up in a bunch of different ways. You know, whether it’s when one looks at the cap theorem kinds of things that Eric Brewer has been famous for, can I get consistency and availability and survive partitionability, all that, at the same time. And so you end up in this kind of place of—goes back to maybe Einstein a bit—but you know, knowing when things have happened and when state has been actually changed or committed is a pretty profound problem.

[00:17:15]

Graham-Cumming: It is, and what order things have happened.

[00:17:18]

Papadopoulos: Yes. And that order is going to be relative to an observer here as well. And so if you’re insisting on some total ordering then you’re insisting on slowing things down as well. And that really is fundamental. We were pushing into that in the massively parallel stuff and you’ll see that at Internet scale. You know there’s another thing, if I could. This is one of my greatest “aha”s about networks and it’s due to a fellow at Sun, Rob Gingell, who actually ended up being chief engineer at Sun and was one of the real pioneers of the software development framework that brought Solaris forward. But Rob would talk about this thing that I label as network entropy. It’s basically what happens when you connect systems to networks, what do networks kind of do to those systems? And this is a little bit of a philosophical question; it’s not a physical one. And Rob observed that over time networks have this property of wanting to decompose things into constituent parts, have those parts get specialized and then reintegrated. And so let me make that less abstract. So in the early days of connecting systems to networks, one of the natural observations were, well why don’t we take the storage out of those desktop systems or server systems and put them on the other side of at least a local network and into a file server or storage server. And so you could see that computer sort of get pulled apart between its computing and its storage piece. And then that storage piece, you know in Rob’s step, that would go on and get specialized. So we had whole companies start like Network Appliances, Pure Storage, EMC. And so, you know like big pieces of industry or look the original routers were RADb you know running on workstations and you know Cisco went and took that and made that into something and so you now see this effect happen at the next scale. One of the things that really got me excited when I first saw Cloudflare a decade ago was, wow okay in those early days, well we can take a component like a network firewall and that can get pulled away and created as its own network entity and specialized. And I think one of the things, at least from my history of Cloudflare, one of the most profound things was, particularly as you guys went in and separated off these functions early on, the fear of people was this is going to introduce latency, and in fact things got faster. Figure that.

[00:20:51]

Graham-Cumming: Part of that of course is caching and then there’s dealing with the speed of light by being close to people. But also if you say your company makes things faster and you do all these different things including security, you are forced to optimize the whole thing to live up to the claim. Whereas if you try and chain things together, nobody’s really responsible for that overall latency budget. It becomes natural that you have to do it.

[00:21:18]

Papadopoulos: Yes. And you all have done it brilliantly, you know, to sort of Gingell’s view. Okay so this piece got decomposed and now specialized, meaning optimized like heck, because that’s what you do. And so you can see that over and over again and you see it in terms of even Twilio or something. You know, here’s a messaging service. I’m just pulling my applications apart, letting people specialize. But the final piece, and this is really the punchline. The final piece is, Rob will talk about it, the value is in the reintegration of it. And so you know what are those unifying forces that are creating, if you will, the operating system for “The Network is the Computer.” You were asking about the massively parallel scale. Well, we had an operating system we wrote for this. As you get up to the higher scale, you get into these more distributed circumstances where the complexity goes up by some important number of orders of magnitude, and now what’s that reintegration? And so I come back and I look at what Cloudflare is doing here. You’re entering into that phase now of actually being that re-integrator, almost that operating system for the computer that is the network.

[00:23:06]

Graham-Cumming: I think that’s right. We often talk about actually being an operating system on the Internet, so very similar kind of thoughts.

[00:23:14]

Papadopoulos: Yes. And you know as we were talking earlier about how developers make sense of this pendulum or cycle or whatever it is. Having this idea of an operating system or of a place where I can have ground truths and trust and sort of fixed points in this are terribly important.

[00:23:44]

Graham-Cumming: Absolutely. So do you have any final thoughts on, what, it must be 30 years on from when “The Network is the Computer” was a Sun trademark. Now it’s a Cloudflare trademark. What’s the future going to look of that slogan and who’s going to trademark it in 30 years time now?

[00:24:03]

Papadopoulos: Well, it could be interplanetary at that point.

[00:24:13]

Graham-Cumming: Well, if you talk about the latency problems of going interplanetary, we definitely have to solve the latency.

[00:24:18]

Papadopoulos: Yeah. People do understand that. They go, wow it’s like seven minutes within here and Mars, hitting close approach.

[00:24:28]

Graham-Cumming: The earthly equivalent of that is New Zealand. If you speak to people from New Zealand and they come on holiday to Europe or they move to the US, they suddenly say that the Internet works so much better here. And it’s just that it’s closer. Now the Australians have figured this out because Australia is actually drifting northwards so they’re actually going to get within. That’s going to fix it for them but New Zealand is stuck.

[00:24:56]

Papadopoulos: I do ask my physicist friends for one of two things. You know, either give me a faster speed of light — so far they have not delivered — or another dimension I can cut through. Maybe we’ll keep working on the latter.

[00:25:16]

Graham-Cumming: All right. Well listen Greg, thank you for the conversation. Thank you for thinking about this stuff many many years ago. I think we’re getting there slowly on some of this work. And yeah, good talking to you.

[00:25:27]

Papadopoulos: Well, you too. And thank you for carrying the torch forward. I think everyone from Sun who listens to this, and John, and everybody should feel really proud about what part they played in the evolution of this great invention.

[00:25:48]

Graham-Cumming: It’s certainly the case that a tremendous amount of work was done at Sun that was really fundamental and, you know, perhaps some of that was ahead of its time but here we are.

[00:25:57]

Papadopoulos: Thank you.

[00:25:58]

Graham-Cumming: Thank you very much.

[00:25:59]

Papadopoulos: Cheers.


Interested in hearing more? Listen to my conversations with John Gage and Ray Rothrock of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/the-network-is-the-computer/

The Network is the Computer

The Network is the Computer

We recently registered the trademark for The Network is the Computer®, to encompass how Cloudflare is utilizing its network to pave the way for the future of the Internet.

The phrase was first coined in 1984 by John Gage, the 21st employee of Sun Microsystems, where he was credited with building Sun’s vision around “The Network is the Computer.” When Sun was acquired in 2010, the trademark was not renewed, but the vision remained.

Take it from him:

“When we built Sun Microsystems, every computer we made had the network at its core. But we could only imagine, over thirty years ago, today’s billions of networked devices, from the smallest camera or light bulb to the largest supercomputer, sharing their packets across Cloudflare’s distributed global network.

We based our vision of an interconnected world on open and shared standards. Cloudflare extends this dedication to new levels by openly sharing designs for security and resilience in the post-quantum computer world.

Most importantly, Cloudflare is committed to immediate, open, transparent accountability for network performance. I’m a dedicated reader of their technical blog, as the network becomes central to our security infrastructure and the global economy, demanding even more powerful technical innovation.”

Cloudflare’s massive network, which spans more than 180 cities in 80 countries, enables the company to deliver its suite of security, performance, and reliability products, including its serverless edge computing offerings.

In March of 2018, we launched our serverless solution Cloudflare Workers, to allow anyone to deploy code at the edge of our network. We also recently announced advancements to Cloudflare Workers in June of 2019 to give application developers the ability to do away with cloud regions, VMs, servers, containers, load balancers—all they need to do is write the code, and we do the rest. With each of Cloudflare’s data centers acting as a highly scalable application origin to which users are automatically routed via our Anycast network, code is run within milliseconds of users worldwide.

In honor of registering Sun’s former trademark, I spoke with John Gage, Greg Papadopoulos, former CTO of Sun Microsystems, and Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to learn more about the history of the phrase and what it means for the future:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

EC2 Instance Update – M5 Instances with Local NVMe Storage (M5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-m5-instances-with-local-nvme-storage-m5d/

Earlier this month we launched the C5 Instances with Local NVMe Storage and I told you that we would be doing the same for additional instance types in the near future!

Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:

Instance NamevCPUsRAMLocal StorageEBS-Optimized BandwidthNetwork Bandwidth
m5d.large28 GiB1 x 75 GB NVMe SSDUp to 2.120 GbpsUp to 10 Gbps
m5d.xlarge416 GiB1 x 150 GB NVMe SSDUp to 2.120 GbpsUp to 10 Gbps
m5d.2xlarge832 GiB1 x 300 GB NVMe SSDUp to 2.120 GbpsUp to 10 Gbps
m5d.4xlarge1664 GiB1 x 600 GB NVMe SSD2.210 GbpsUp to 10 Gbps
m5d.12xlarge48192 GiB2 x 900 GB NVMe SSD5.0 Gbps10 Gbps
m5d.24xlarge96384 GiB4 x 900 GB NVMe SSD10.0 Gbps25 Gbps

The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.

Jeff;

 

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PTGet Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PTEpisode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute

June 25, 2018 | 01:00 PM – 01:45 PM PTAccelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PTEnsuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers
June 25, 2018 | 09:00 AM – 09:45 AM PTRunning Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PTOracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.
DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PTSet Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PTDe-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PTLaunch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new

 

AWS Environments

June 21, 2018 | 11:00 AM – 11:45 AM PTLeading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PTEnabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PTFireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT

June 27, 2018 | 11:00 AM – 11:45 AM PTAWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PTIntegrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PTBuilding Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PTOptimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile
June 25, 2018 | 11:00 AM – 11:45 AM PTDrive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PTUnderstanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PTUsing Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PTProductionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PTDeep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PTChanging the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PTBig Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Storing Encrypted Credentials In Git

Post Syndicated from Bozho original https://techblog.bozho.net/storing-encrypted-credentials-in-git/

We all know that we should not commit any passwords or keys to the repo with our code (no matter if public or private). Yet, thousands of production passwords can be found on GitHub (and probably thousands more in internal company repositories). Some have tried to fix that by removing the passwords (once they learned it’s not a good idea to store them publicly), but passwords have remained in the git history.

Knowing what not to do is the first and very important step. But how do we store production credentials. Database credentials, system secrets (e.g. for HMACs), access keys for 3rd party services like payment providers or social networks. There doesn’t seem to be an agreed upon solution.

I’ve previously argued with the 12-factor app recommendation to use environment variables – if you have a few that might be okay, but when the number of variables grow (as in any real application), it becomes impractical. And you can set environment variables via a bash script, but you’d have to store it somewhere. And in fact, even separate environment variables should be stored somewhere.

This somewhere could be a local directory (risky), a shared storage, e.g. FTP or S3 bucket with limited access, or a separate git repository. I think I prefer the git repository as it allows versioning (Note: S3 also does, but is provider-specific). So you can store all your environment-specific properties files with all their credentials and environment-specific configurations in a git repo with limited access (only Ops people). And that’s not bad, as long as it’s not the same repo as the source code.

Such a repo would look like this:

project
└─── production
|   |   application.properites
|   |   keystore.jks
└─── staging
|   |   application.properites
|   |   keystore.jks
└─── on-premise-client1
|   |   application.properites
|   |   keystore.jks
└─── on-premise-client2
|   |   application.properites
|   |   keystore.jks

Since many companies are using GitHub or BitBucket for their repositories, storing production credentials on a public provider may still be risky. That’s why it’s a good idea to encrypt the files in the repository. A good way to do it is via git-crypt. It is “transparent” encryption because it supports diff and encryption and decryption on the fly. Once you set it up, you continue working with the repo as if it’s not encrypted. There’s even a fork that works on Windows.

You simply run git-crypt init (after you’ve put the git-crypt binary on your OS Path), which generates a key. Then you specify your .gitattributes, e.g. like that:

secretfile filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
*.properties filter=git-crypt diff=git-crypt
*.jks filter=git-crypt diff=git-crypt

And you’re done. Well, almost. If this is a fresh repo, everything is good. If it is an existing repo, you’d have to clean up your history which contains the unencrypted files. Following these steps will get you there, with one addition – before calling git commit, you should call git-crypt status -f so that the existing files are actually encrypted.

You’re almost done. We should somehow share and backup the keys. For the sharing part, it’s not a big issue to have a team of 2-3 Ops people share the same key, but you could also use the GPG option of git-crypt (as documented in the README). What’s left is to backup your secret key (that’s generated in the .git/git-crypt directory). You can store it (password-protected) in some other storage, be it a company shared folder, Dropbox/Google Drive, or even your email. Just make sure your computer is not the only place where it’s present and that it’s protected. I don’t think key rotation is necessary, but you can devise some rotation procedure.

git-crypt authors claim to shine when it comes to encrypting just a few files in an otherwise public repo. And recommend looking at git-remote-gcrypt. But as often there are non-sensitive parts of environment-specific configurations, you may not want to encrypt everything. And I think it’s perfectly fine to use git-crypt even in a separate repo scenario. And even though encryption is an okay approach to protect credentials in your source code repo, it’s still not necessarily a good idea to have the environment configurations in the same repo. Especially given that different people/teams manage these credentials. Even in small companies, maybe not all members have production access.

The outstanding questions in this case is – how do you sync the properties with code changes. Sometimes the code adds new properties that should be reflected in the environment configurations. There are two scenarios here – first, properties that could vary across environments, but can have default values (e.g. scheduled job periods), and second, properties that require explicit configuration (e.g. database credentials). The former can have the default values bundled in the code repo and therefore in the release artifact, allowing external files to override them. The latter should be announced to the people who do the deployment so that they can set the proper values.

The whole process of having versioned environment-speific configurations is actually quite simple and logical, even with the encryption added to the picture. And I think it’s a good security practice we should try to follow.

The post Storing Encrypted Credentials In Git appeared first on Bozho's tech blog.

DNS over HTTPS in Firefox

Post Syndicated from corbet original https://lwn.net/Articles/756262/rss

The Mozilla blog has an
article
describing the addition of DNS over HTTPS (DoH) as an optional
feature in the Firefox browser. “DoH support has been added to
Firefox 62 to improve the way Firefox interacts with DNS. DoH uses
encrypted networking to obtain DNS information from a server that is
configured within Firefox. This means that DNS requests sent to the DoH
cloud server are encrypted while old style DNS requests are not
protected.
” The configured server is hosted by Cloudflare, which
has posted this
privacy agreement
about the service.

Protecting coral reefs with Nemo-Pi, the underwater monitor

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/coral-reefs-nemo-pi/

The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.

Nemo-Pi — Save Nemo

Save Nemo

The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving saver for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.

reef damaged by anchor
boat anchored at buoy

In addition, they provide dos and don’ts for how to behave on a reef dive.

The Nemo-Pi

To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.

Nemo-Pi schematic — Nemo-Pi — Save Nemo

This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.

Inside the Nemo-Pi device — Save Nemo
Inside the Nemo-Pi device — Save Nemo
Inside the Nemo-Pi device — Save Nemo

The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.

web dashboard — Nemo-Pi — Save Nemo

The web dashboard showing live Nemo-Pi data

Long-term goals

Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.

Coral reefs with fishes

A healthy coral reef

Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.

Bleached coral

A bleached coral reef

Vote now for Save Nemo

If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:

  1. Head to the voting web page
  2. Click “Abstimmen” in the footer of the page to vote
  3. Click “JA” in the footer to confirm

Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.

The post Protecting coral reefs with Nemo-Pi, the underwater monitor appeared first on Raspberry Pi.

Amazon SageMaker Updates – Tokyo Region, CloudFormation, Chainer, and GreenGrass ML

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/sagemaker-tokyo-summit-2018/

Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and Tensorflow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.

Amazon SageMaker Chainer Estimator


Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. A lot of developers enjoy the Chainer scheme since it allows them to write their networks with native python constructs and tools.

Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet users have to implement a train function with a particular signature. With Chainer your scripts can be a little bit more portable as you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in a if __name__ == '__main__': guard and invoke it locally or on sagemaker.


import argparse
import os

if __name__ =='__main__':

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=64)
    parser.add_argument('--learning-rate', type=float, default=0.05)

    # Data, model, and output directories
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

    args, _ = parser.parse_known_args()

    # ... load from args.train and args.test, train a model, write model to args.model_dir.

Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters will get passed in to the script as CLI commands and the environment variables above will be autopopulated. When we call fit the input channels we pass will be populated in the SM_CHANNEL_* environment variables.


from sagemaker.chainer.estimator import Chainer
# Create my estimator
chainer_estimator = Chainer(
    entry_point='example.py',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})

# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
    instance_type="ml.m4.xlarge",
    initial_instance_count=1
)

Now, instead of bringing your own docker container for training and hosting with Chainer, you can just maintain your script. You can see the full sagemaker-chainer-containers on github. One of my favorite features of the new container is built-in chainermn for easy multi-node distribution of your chainer training jobs.

There’s a lot more documentation and information available in both the README and the example notebooks.

AWS GreenGrass ML with Chainer

AWS GreenGrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson, TX2, and Raspberry Pi. So, now GreenGrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker then easily deploy it to any GreenGrass-enabled device using GreenGrass ML.

JAWS UG

I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.

Randall

1834: The First Cyberattack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/1834_the_first_.html

Tom Standage has a great story of the first cyberattack against a telegraph network.

The Blanc brothers traded government bonds at the exchange in the city of Bordeaux, where information about market movements took several days to arrive from Paris by mail coach. Accordingly, traders who could get the information more quickly could make money by anticipating these movements. Some tried using messengers and carrier pigeons, but the Blanc brothers found a way to use the telegraph line instead. They bribed the telegraph operator in the city of Tours to introduce deliberate errors into routine government messages being sent over the network.

The telegraph’s encoding system included a “backspace” symbol that instructed the transcriber to ignore the previous character. The addition of a spurious character indicating the direction of the previous day’s market movement, followed by a backspace, meant the text of the message being sent was unaffected when it was written out for delivery at the end of the line. But this extra character could be seen by another accomplice: a former telegraph operator who observed the telegraph tower outside Bordeaux with a telescope, and then passed on the news to the Blancs. The scam was only uncovered in 1836, when the crooked operator in Tours fell ill and revealed all to a friend, who he hoped would take his place. The Blanc brothers were put on trial, though they could not be convicted because there was no law against misuse of data networks. But the Blancs’ pioneering misuse of the French network qualifies as the world’s first cyber-attack.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!

Randall