Tag Archives: serverless

New – Step Functions Support for Dynamic Parallelism

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-step-functions-support-for-dynamic-parallelism/

Microservices make applications easier to scale and faster to develop, but coordinating the components of a distributed application can be a daunting task. AWS Step Functions is a fully managed service that makes coordinating tasks easier by letting you design and run workflows that are made of steps, each step receiving as input the output of the previous step. For example, Novartis Institutes for Biomedical Research is using Step Functions to empower scientists to run image analysis without depending on cluster experts.

Step Functions added some very interesting capabilities recently, such as callback patterns, to simplify the integration of human activities and third-party services, and nested workflows, to assemble together modular, reusable workflows. Today, we are adding support for dynamic parallelism within a workflow!

How Dynamic Parallelism Works
States machines are defined using the Amazon States Language, a JSON-based, structured language. The Parallel state can be used to execute in parallel a fixed number of branches defined in the state machine. Now, Step Functions supports a new Map state type for dynamic parallelism.

To configure a Map state, you define an Iterator, which is a complete sub-workflow. When a Step Functions execution enters a Map state, it will iterate over a JSON array in the state input. For each item, the Map state will execute one sub-workflow, potentially in parallel. When all sub-workflow executions complete, the Map state will return an array containing the output for each item processed by the Iterator.

You can configure an upper bound on how many concurrent sub-workflows Map executes by adding the MaxConcurrency field. The default value is 0, which places no limit on parallelism and iterations are invoked as concurrently as possible. A MaxConcurrency value of 1 has the effect to invoke the Iterator one element at a time, in the order of their appearance in the input state, and will not start an iteration until the previous iteration has completed execution.

One way to use the new Map state is to leverage fan-out or scatter-gather messaging patterns in your workflows:

  • Fan-out is a applied when delivering a message to multiple destinations, and can be useful in workflows such as order processing or batch data processing. For example, you can retrieve arrays of messages from Amazon SQS and Map will send each message to a separate AWS Lambda function.
  • Scatter-gather broadcasts a single message to multiple destinations (scatter) and then aggregates the responses back for the next steps (gather). This can be useful in file processing and test automation. For example, you can transcode ten 500 MB media files in parallel and then join to create a 5 GB file.

Like Parallel and Task states, Map supports Retry and Catch fields to handle service and custom exceptions. You can also apply Retry and Catch to states inside your Iterator to handle exceptions. If any Iterator execution fails, because of an unhandled error or by transitioning to a Fail state, the entire Map state is considered to have failed and all its iterations are stopped. If the error is not handled by the Map state itself, Step Functions stops the workflow execution with an error.

Using the Map State
Let’s build a workflow to process an order and, by using the Map state, to work on the items in the order in parallel. The tasks executed as part of this workflow are all Lambda functions, but with Step Functions you can use other AWS service integrations and have code running on EC2 instances, containers, or on-premises infrastructure.

Here’s our sample order, expressed as a JSON document, for a few books, plus some coffee to drink while reading them. The order has a detail section, where there is a list of items that are part of the order.

{
  "orderId": "12345678",
  "orderDate": "20190820101213",
  "detail": {
    "customerId": "1234",
    "deliveryAddress": "123, Seattle, WA",
    "deliverySpeed": "1-day",
    "paymentMethod": "aCreditCard",
    "items": [
      {
        "productName": "Agile Software Development",
        "category": "book",
        "price": 60.0,
        "quantity": 1
      },
      {
        "productName": "Domain-Driven Design",
        "category": "book",
        "price": 32.0,
        "quantity": 1
      },
      {
        "productName": "The Mythical Man Month",
        "category": "book",
        "price": 18.0,
        "quantity": 1
      },
      {
        "productName": "The Art of Computer Programming",
        "category": "book",
        "price": 180.0,
        "quantity": 1
      },
      {
        "productName": "Ground Coffee, Dark Roast",
        "category": "grocery",
        "price": 8.0,
        "quantity": 6
      }
    ]
  }
}

To process this order, I am using a state machine defining how the different tasks should be executed. The Step Functions console creates a visual representation of the workflow I am building:

  • First, I validate and check the payment.
  • Then, I process the items in the order, potentially in parallel, to check their availability, prepare for delivery, and start the delivery process.
  • At the end, a summary of the order is sent to the customer.
  • In case the payment check fails, I intercept that, for example to send a notification to the customer.

 

Here is the same state machine definition expressed as a JSON document. The ProcessAllItems state is using Map to process items in the order in parallel. In this case, I limit concurrency to 3 using the MaxConcurrency field. Inside the Iterator, I can put a sub-workflow of arbitrary complexity. In this case, I have three steps, to CheckAvailability, PrepareForDelivery, and StartDelivery of the item. Each of this step can Retry and Catch errors to make the sub-workflow execution more reliable, for example in case of integrations with external services.

{
  "StartAt": "ValidatePayment",
  "States": {
    "ValidatePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:123456789012:function:validatePayment",
      "Next": "CheckPayment"
    },
    "CheckPayment": {
      "Type": "Choice",
      "Choices": [
        {
          "Not": {
            "Variable": "$.payment",
            "StringEquals": "Ok"
          },
          "Next": "PaymentFailed"
        }
      ],
      "Default": "ProcessAllItems"
    },
    "PaymentFailed": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:123456789012:function:paymentFailed",
      "End": true
    },
    "ProcessAllItems": {
      "Type": "Map",
      "InputPath": "$.detail",
      "ItemsPath": "$.items",
      "MaxConcurrency": 3,
      "Iterator": {
        "StartAt": "CheckAvailability",
        "States": {
          "CheckAvailability": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:123456789012:function:checkAvailability",
            "Retry": [
              {
                "ErrorEquals": [
                  "TimeOut"
                ],
                "IntervalSeconds": 1,
                "BackoffRate": 2,
                "MaxAttempts": 3
              }
            ],
            "Next": "PrepareForDelivery"
          },
          "PrepareForDelivery": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:123456789012:function:prepareForDelivery",
            "Next": "StartDelivery"
          },
          "StartDelivery": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-west-2:123456789012:function:startDelivery",
            "End": true
          }
        }
      },
      "ResultPath": "$.detail.processedItems",
      "Next": "SendOrderSummary"
    },
    "SendOrderSummary": {
      "Type": "Task",
      "InputPath": "$.detail.processedItems",
      "Resource": "arn:aws:lambda:us-west-2:123456789012:function:sendOrderSummary",
      "ResultPath": "$.detail.summary",
      "End": true
    }
  }
}

The Lambda functions used by this workflow are not aware of the overall structure of the order JSON document. They just need to know the part of the input state they are going to process. This is a best practice to make those functions easily reusable in multiple workflows. The state machine definition is manipulating the path used for the input and the output of the functions using JsonPath syntax via the InputPathItemsPathResultPath, and OutputPath fields:

  • InputPath is used to filter the data in the input state, for example to only pass the detail of the order to the Iterator.
  • ItemsPath is specific to the Map state and is used to identify where, in the input, the array field to process is found, for example to process the items inside the detail of the order.
  • ResultPath makes it possible to add the output of a task to the input state, and not overwrite it completely, for example to add a summary to the detail of the order.
  • I am not using OutputPath this time, but it could be useful to filter out unwanted information and pass only the portion of JSON that you care about to the next state. For example, to send as output only the detail of the order.

Optionally, the Parameters field may be used to customize the raw input used for each iteration. For example, the deliveryAddress is in the detail of the order, but not in each item. To have the Iterator have an index of the items, and access the deliveryAddress, I can add this to a Map state:

"Parameters": {
  "index.$": "$$.Map.Item.Index",
  "item.$": "$$.Map.Item.Value",
  "deliveryAddress.$": "$.deliveryAddress"
}

Available Now
This new feature is available today in all regions where Step Functions is offered. Dynamic parallelism was probably the most requested features for Step Functions. It unblocks the implementation of new use cases and can help optimize existing ones. Let us know what are you going to use it for!

How We Design Features for Wrangler, the Cloudflare Workers CLI

Post Syndicated from Ashley M Lewis original https://blog.cloudflare.com/how-we-design-features-for-wrangler/

How We Design Features for Wrangler, the Cloudflare Workers CLI

How We Design Features for Wrangler, the Cloudflare Workers CLI

The most recent update to Wrangler, version 1.3.1, introduces important new features for developers building Cloudflare Workers — from built-in deployment environments to first class support for Workers KV. Wrangler is Cloudflare’s first officially supported CLI. Branching into this field of software has been a novel experience for us engineers and product folks on the Cloudflare Workers team.

As part of the 1.3.1 release, the folks on the Workers Developer Experience team dove into the thought process that goes into building out features for a CLI and thinking like users. Because while we wish building a CLI were as easy as our teammate Avery tweeted…


… it brings design challenges that many of us have never encountered. To overcome these challenges successfully requires deep empathy for users across the entire team, as well as the ability to address ambiguous questions related to how developers write Workers.

Wrangler, meet Workers KV

Our new KV functionality introduced a host of new features, from creating KV namespaces to bulk uploading key-value pairs for use within a Worker. This new functionality primarily consisted of logic for interacting with the Workers KV API, meaning that the technical work under “the hood” was relatively straightforward. Figuring out how to cleanly represent these new features to Wrangler users, however, became the fundamental question of this release.

Designing the invocations for new KV functionality unsurprisingly required multiple iterations, and taught us a lot about usability along the way!

Attempt 1

For our initial pass, the path originally seemed so obvious. (Narrator: It really, really wasn’t). We hypothesized that having Wrangler support familiar commands — like ls and rm — would be a reasonable mapping of familiar command line tools to Workers KV, and ended up with the following set of invocations below:

# creates a new KV Namespace
$ wrangler kv add myNamespace									
	
# sets a string key that doesn't expire
$ wrangler kv set myKey=”someStringValue”

# sets many keys
$ wrangler kv set myKey=”someStringValue” myKey2=”someStringValue2” ...

# sets a volatile (expiring) key that expires in 60 s
$ wrangler kv set myVolatileKey=path/to/value --ttl 60s

# deletes three keys
$ wrangler kv rm myNamespace myKey1 myKey2 myKey3

# lists all your namespaces
$ wrangler kv ls

# lists all the keys for a namespace
$ wrangler kv ls myNamespace

# removes all keys from a namespace, then removes the namespace		
$ wrangler kv rm -r myNamespace

While these commands invoked familiar shell utilities, they made interacting with your KV namespace a lot more like interacting with a filesystem than a key value store. The juxtaposition of a well-known command like ls with a non-command, set, was confusing. Additionally, mapping preexisting command line tools to KV actions was not a good 1-1 mapping (especially for rm -r; there is no need to recursively delete a KV namespace like a directory if you can just delete the namespace!)

This draft also surfaced use cases we needed to support: namely, we needed support for actions like easy bulk uploads from a file. This draft required users to enter every KV pair in the command line instead of reading from a file of key-value pairs; this was also a non-starter.

Finally, these KV subcommands caused confusion about actions to different resources. For example, the command for listing your Workers KV namespaces looked a lot like the command for listing keys within a namespace.

Going forward, we needed to meet these newly identified needs.

Attempt 2

Our next attempt shed the shell utilities in favor of simple, declarative subcommands like create, list, and delete. It also addressed the need for easy-to-use bulk uploads by allowing users to pass a JSON file of keys and values to Wrangler.

# create a namespace
$ wrangler kv create namespace <title>

# delete a namespace
$ wrangler kv delete namespace <namespace-id>

# list namespaces
$ wrangler kv list namespace

# write key-value pairs to a namespace, with an optional expiration flag
$ wrangler kv write key <namespace-id> <key> <value> --ttl 60s

# delete a key from a namespace
$ wrangler kv delete key <namespace-id> <key>

# list all keys in a namespace
$ wrangler kv list key <namespace-id>

# write bulk kv pairs. can be json file or directory; if dir keys will be file paths from root, value will be contents of files
$ wrangler kv write bulk ./path/to/assets

# delete bulk pairs; same input functionality as above
$ wrangler kv delete bulk ./path/to/assets

Given the breadth of new functionality we planned to introduce, we also built out a taxonomy of new subcommands to ensure that invocations for different resources — namespaces, keys, and bulk sets of key-value pairs — were consistent:

How We Design Features for Wrangler, the Cloudflare Workers CLI

Designing invocations with taxonomies became a crucial part of our development process going forward, and gave us a clear look at the “big picture” of our new KV features.

This approach was closer to what we wanted. It offered bulk put and bulk delete operations that would read multiple key-value pairs from a JSON file. After specifying an action subcommand (e.g. delete), users now explicitly stated which resource an action applied to (namespace , key, bulk) and reduced confusion about which action applied to which KV component.

This draft, however, was still not as explicit as we wanted it to be. The distinction between operations on namespaces versus keys was not as obvious as we wanted, and we still feared the possibility of different delete operations accidentally producing unwanted deletes (a possibly disastrous outcome!)

Attempt 3

We really wanted to help differentiate where in the hierarchy of structs a user was operating at any given time. Were they operating on namespaces, keys, or bulk sets of keys in a given operation, and how could we make that as clear as possible? We looked around, comparing the ways CLIs from kubectl to Heroku’s handled commands affecting different objects. We landed on a pleasing pattern inspired by Heroku’s CLI: colon-delimited command namespacing:

plugins:install PLUGIN    # installs a plugin into the CLI
plugins:link [PATH]       # links a local plugin to the CLI for development
plugins:uninstall PLUGIN  # uninstalls or unlinks a plugin
plugins:update            # updates installed plugins

So we adopted kv:namespace, kv:key, and kv:bulk to semantically separate our commands:

# namespace commands operate on namespaces
$ wrangler kv:namespace create <title> [--env]
$ wrangler kv:namespace delete <binding> [--env]
$ wrangler kv:namespace rename <binding> <new-title> [--env]
$ wrangler kv:namespace list [--env]
# key commands operate on individual keys
$ wrangler kv:key write <binding> <key>=<value> [--env | --ttl | --exp]
$ wrangler kv:key delete <binding> <key> [--env]
$ wrangler kv:key list <binding> [--env]
# bulk commands take a user-generated JSON file as an argument
$ wrangler kv:bulk write <binding> ./path/to/data.json [--env]
$ wrangler kv:bulk delete <binding> ./path/to/data.json [--env]

And ultimately ended up with this topology:

How We Design Features for Wrangler, the Cloudflare Workers CLI

We were even closer to our desired usage pattern; the object acted upon was explicit to users, and the action applied to the object was also clear.

There was one usage issue left. Supplying namespace-ids–a field that specifies which Workers KV namespace to perform an action to–required users to get their clunky KV namespace-id (a string like 06779da6940b431db6e566b4846d64db) and provide it in the command-line under the namespace-id option. This namespace-id value is what our Workers KV API expects in requests, but would be cumbersome for users to dig up and provide, let alone frequently use.

The solution we came to takes advantage of the wrangler.toml present in every Wrangler-generated Worker. To publish a Worker that uses a Workers KV store, the following field is needed in the Worker’s wrangler.toml:

kv-namespaces = [
	{ binding = "TEST_NAMESPACE", id = "06779da6940b431db6e566b4846d64db" }
]

This field specifies a Workers KV namespace that is bound to the name TEST_NAMESPACE, such that a Worker script can access it with logic like:

TEST_NAMESPACE.get(“my_key”);

We also decided to take advantage of this wrangler.toml field to allow users to specify a KV binding name instead of a KV namespace id. Upon providing a KV binding name, Wrangler could look up the associated id in wrangler.toml and use that for Workers KV API calls.

Wrangler users performing actions on KV namespaces could simply provide --binding TEST_NAMESPACE for their KV calls let Wrangler retrieve its ID from wrangler.toml. Users can still specify --namespace-id directly if they do not have namespaces specified in their wrangler.toml.

Finally, we reached our happy point: Wrangler’s new KV subcommands were explicit, offered functionality for both individual and bulk actions with Workers KV, and felt ergonomic for Wrangler users to integrate into their day-to-day operations.

Lessons Learned

Throughout this design process, we identified the following takeaways to carry into future Wrangler work:

  1. Taxonomies of your CLI’s subcommands and invocations are a great way to ensure consistency and clarity. CLI users tend to anticipate similar semantics and workflows within a CLI, so visually documenting all paths for the CLI can greatly help with identifying where new work can be consistent with older semantics. Drawing out these taxonomies can also expose missing features that seem like a fundamental part of the “big picture” of a CLI’s functionality.
  2. Use other CLIs for inspiration and sanity checking. Drawing logic from popular CLIs helped us confirm our assumptions about what users like, and learn established patterns for complex CLI invocations.
  3. Avoid logic that requires passing in raw ID strings. Testing CLIs a lot means that remembering and re-pasting ID values gets very tedious very quickly. Emphasizing a set of purely human-readable CLI commands and arguments makes for a far more intuitive experience. When possible, taking advantage of configuration files (like we did with wrangler.toml) offers a straightforward way to provide mappings of human-readable names to complex IDs.

We’re excited to continue using these design principles we’ve learned and documented as we grow Wrangler into a one-stop Cloudflare Workers shop.

If you’d like to try out Wrangler, check it out on GitHub and let us know what you think! We would love your feedback.

How We Design Features for Wrangler, the Cloudflare Workers CLI

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-september-aws-online-tech-talks/

Learn about AWS Services & Solutions – September AWS Online Tech Talks

AWS Tech Talks

Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

 

Compute:

September 23, 2019 | 11:00 AM – 12:00 PM PTBuild Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.

September 26, 2019 | 1:00 PM – 2:00 PM PTSelf-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail.

October 3, 2019 | 11:00 AM – 12:00 PM PTLower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.

 

Containers:

September 26, 2019 | 11:00 AM – 12:00 PM PTDevelop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.

 

Data Lakes & Analytics:

September 26, 2019 | 9:00 AM – 10:00 AM PTBest Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.

 

Databases:

September 25, 2019 | 1:00 PM – 2:00 PM PTWhat’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.

October 3, 2019 | 9:00 AM – 10:00 AM PTBest Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.

 

DevOps:

October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.

 

Enterprise & Hybrid:

September 24, 2019 | 1:00 PM – 2:30 PM PT Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop.

October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.

 

IoT:

September 25, 2019 | 9:00 AM – 10:00 AM PTComplex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.

 

Machine Learning:

September 23, 2019 | 9:00 AM – 10:00 AM PTTraining Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.

September 30, 2019 | 11:00 AM – 12:00 PM PTUsing Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.

October 3, 2019 | 1:00 PM – 2:30 PM PTVirtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.

 

AWS Marketplace:

September 30, 2019 | 9:00 AM – 10:00 AM PTAdvancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.

 

Migration:

September 24, 2019 | 11:00 AM – 12:00 PM PTApplication Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.

 

Networking & Content Delivery:

September 25, 2019 | 11:00 AM – 12:00 PM PTBuilding Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation.

September 30, 2019 | 1:00 PM – 2:00 PM PTAWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and [email protected]? Get answers directly from our experts during AWS Office Hours.

 

Robotics:

October 1, 2019 | 11:00 AM – 12:00 PM PTRobots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.

 

Security, Identity & Compliance:

October 1, 2019 | 1:00 PM – 2:00 PM PTDeep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your windows workloads.

 

Serverless:

October 2, 2019 | 9:00 AM – 10:00 AM PTDeep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.

 

Storage:

September 24, 2019 | 9:00 AM – 10:00 AM PTOptimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources.

October 2, 2019 | 11:00 AM – 12:00 PM PTThe Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.

 

 

How Castle is Building Codeless Customer Account Protection

Post Syndicated from Guest Author original https://blog.cloudflare.com/castle-building-codeless-customer-account-protection/

How Castle is Building Codeless Customer Account Protection

How Castle is Building Codeless Customer Account Protection

This is a guest post by Johanna Larsson, of Castle, who designed and built the Castle Cloudflare app and the supporting infrastructure.

Strong security should be easy.

Asking your consumers again and again to take responsibility for their security through robust passwords and other security measures doesn’t work. The responsibility of security needs to shift from end users to the companies who serve them.

Castle is leading the way for companies to better protect their online accounts with millions of consumers being protected every day. Uniquely, Castle extends threat prevention and protection for both pre and post login ensuring you can keep friction low but security high. With realtime responses and automated workflows for account recovery, overwhelmed security teams are given a hand. However, when you’re that busy, sometimes deploying new solutions takes more time than you have. Reducing time to deployment was a priority so Castle turned to Cloudflare Workers.

User security and friction

When security is no longer optional and threats are not black or white, security teams are left with trying to determine how to allow end-user access and transaction completions when there are hints of risk, or when not all of the information is available. Keeping friction low is important to customer experience. Castle helps organizations be more dynamic and proactive by making continuous security decisions based on realtime risk and trust.

Some of the challenges with traditional solutions is that they are often just focused on protecting the app or they are only focused on point of access, protecting against bot access for example. Tools specifically designed for securing user accounts however are fundamentally focused on protecting the accounts of the end-users, whether they are being targeting by human or bots. Being able to understand end-user behaviors and their devices both pre and post login is therefore critical in being able to truly protect each users. The key to protecting users is being able to decipher between normal and anomalous activity on an individual account and device basis. You also need a playbook to respond to anomalies and attacks with dedicated flows, that allows your end users to interact directly and provide feedback around security events.

By understanding the end user and their good behaviors, devices, and transactions, it is possible to automatically respond to account threats in real-time based on risk level and policy. This approach not only reduces end-user friction but enables security teams to feel more confident that they won’t ever be blocking a legitimate login or transaction.

Castle processes tens of millions of events every day through its APIs, including contextual information like headers, IP, and device types. The more information that can be associated with a request the better. This allows us to better recognize abnormalities and protect the end user. Collection of this information is done in two ways. One is done on the web application’s backend side through our SDKs and the other is done on the client side using our mobile SDK or browser script. Our experience shows that any integration of a security service based on user behavior and anomaly detection can involve many different parties across an organization, and it affects multiple layers of the tech stack. On top of the security related roles, it’s not unusual to also have to coordinate between backend, devops, and frontend teams. The information related to an end user session is often spread widely over a code base.

The cost of security

One of the biggest challenges in implementing a user-facing security and risk management solution is the variety of people and teams it needs attention from, each with competing priorities. Security teams are often understaffed and overwhelmed making it difficult to take on new projects. At the same time, it consumes time from product and engineering personnel on the application side, who are responsible for UX flows and performing continuous authentication post-login.

We’ve been experimenting with approaches where we can extract that complexity from your application code base, while also reducing the effort of integrating. At Castle, we believe that strong security should be easy.

How Castle is Building Codeless Customer Account Protection

With Cloudflare we found a service that enables us to create a more friendly, simple, and in the end, safe integration process by placing the security layer directly between the end user and your application. Security-related logic shouldn’t pollute your app, but should reside in a separate service, or shield, that covers your app. When the two environments are kept separate, this reduces the time and cost of implementing complex systems making integration and maintenance less stressful and much easier.

Our integration with Cloudflare aims to solve this implementation challenge, delivering end-to-end account protection for your users, both pre and post login, with the click of a button.

The codeless integration

In our quest for a purely codeless integration, key features are required. When every customer application is different, this means every integration is different. We want to solve this problem for you once and for all. To do this, we needed to move the security work away from the implementation details so that we could instead focus on describing the key interactions with the end user, like logins or bank transactions. We also wanted to empower key decision makers to recognize and handle crucial interactions in their systems. Creating a single solution that could be customized to fit each specific use case was a priority.

Building on top of Cloudflare’s platform, we made use of three unique and powerful products: Workers, Apps for Workers, and Workers KV.

Thanks to Workers we have full access to the interactions between the end user and your application. With their impressive performance, we can confidently run inline of website requests without creating noticeable latency. We will never slow down your site. And in order to achieve the flexibility required to match your specific use case, we created an internal configuration format that fully describes the interactions of devices and servers across HTTP, including web and mobile app traffic. It is in this Worker where we’ve implemented an advanced routing engine to match and collect information about requests and responses to events, directly from the edge. It also fully handles injecting the Castle browser script — one less thing to worry about.

All of this logic is kept separate from your application code, and through the Cloudflare App Store we are able to distribute this Worker, giving you control over when and where it is enabled, as well as what configurations are used. There’s no need to copy/paste code or manage your own Workers.

In order to achieve the required speed while running in distributed edge locations, we needed a high performing low latency datastore, and we found one in the Cloudflare Workers KV Store. Cloudflare Apps are not able to access the KV Store directly, but we’ve solved this by exposing it through a separate Worker that the Castle App connects to. Because traffic between Workers never leaves the Cloudflare network, this is both secure and fast enough to match your requirements. The KV Store allows us to maintain end user sessions across the world, and also gives us a place to store and update the configurations and sessions that drive the Castle App.

In combining these products we have a complete and codeless integration that is fully configurable and that won’t slow you down.

How does it work?

The data flow is straightforward. After installing the Castle App, Cloudflare will route your traffic through the Castle App, which uses the Castle Data Store and our API to intelligently protect your end users. The impact to traffic latency is minimal because most work is done in the background, not blocking the requests. Let’s dig deeper into each technical feature:

Script injection

One of the tools we use to verify user identity is a browser script: Castle.js. It is responsible for gathering device information and UI interaction behavior, and although it is not required for our service to function, it helps improve our verdicts. This means it’s important that it is properly added to every page in your web application. The Castle App, running between the end user and your application, is able to unobtrusively add the script to each page as it is served. In order for the script to also track page interactions it needs to be able to connect them to your users, which is done through a call to our script and also works out of the box with the Cloudflare interaction. This removes 100% of the integration work from your frontend teams.

Collect contextual information

The second half of the information that forms the basis of our security analysis is the information related to the request itself, such as IP and headers, as well as timestamps. Gathering this information may seem straightforward, but our experience shows some recurring problems in traditional integrations. IP-addresses are easily lost behind reverse proxies, as they need to be maintained as separate headers, like `X-Forwarded-For`, and the internal format of headers differs from platform to platform. Headers in general might get cut off based on whitelisting. The Castle App sees the original request as it comes in, with no outside influence or platform differences, enabling it to reliably create the context of the request. This saves your infrastructure and backend engineers from huge efforts debugging edge cases.

Advanced routing engine

Finally, in order to reliably recognize important events, like login attempts, we’ve built a fully configurable routing engine. This is fast enough to run inline of your web application, and supports near real-time configuration updates. It is powerful enough to translate requests to actual events in your system, like logins, purchases, profile updates or transactions. Using information from the request, it is then able to send this information to Castle, where you are able to analyze, verify and take action on suspicious activity. What’s even better, is that at any point in the future if you want to Castle protect a new critical user event – such as a withdrawal or transfer event – all it takes is adding a record to the configuration file. You never have to touch application code in order to expand your Castle integration across sensitive events.

We’ve put together an example TypeScript snippet that naively implements the flow and features we’ve discussed. The details are glossed over so that we can focus on the functionality.

addEventListener(event => event.respondWith(handleEvent(event)));

const respondWith = async (event: CloudflareEvent) => {
  // You configure the application with your Castle API key
  const { apiKey } = INSTALL_OPTIONS;
  const { request } = event;

  // Configuration is fetched from the KV Store
  const configuration = await getConfiguration(apiKey);

  // The session is also retrieved from the KV Store
  const session = await getUserSession(request);

  // Pass the request through and get the response
  let response = await fetch(request);

  // Using the configuration we can recognize events by running
  // the request+response and configuration through our matching engine
  const securityEvent = getMatchingEvent(request, response, configuration);

  if (securityEvent) {
    // With direct access to the raw request, we can confidently build the context
    // including a device ID generated by the browser script, IP, and headers
    const requestContext = getRequestContext(request);

    // Collecting the relevant information, the data is passed to the Castle API
    event.waitUntil(sendToCastle(securityEvent, session, requestContext));
  }

  // Because we have access to the response HTML page we can safely inject the browser
  // script. If the response is not an HTML page it is passed through untouched.
  response = injectScript(response, session);

  return response;
};

We hope we have inspired you and demonstrated how Workers can provide speed and flexibility when implementing end to end account protection for your end users with Castle. If you are curious about our service, learn more here.

Top Resources for API Architects and Developers

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/top-resources-for-api-architects-and-developers/

We hope you’ve enjoyed reading our series on API architecture and development. We wrote about best practices for REST APIs with Amazon API Gateway  and GraphQL APIs with AWS AppSync. This post will cover the top resources that all API developers should be aware of.

Tech Talks, Webinars, and Twitch Live Stream

The technical staff at AWS have produced a variety of digital media that cover new service launches, best practices, and customer questions. Be sure to review these videos for tips and tricks on building APIs:

  • Happy Little APIs: This is a multi part series produced by our awesome Developer Advocate, Eric Johnson. He leads a series of talks that demonstrate how to build a real world API.
  • API Gateway’s WebSocket webinar: API Gateway now supports real time APIs with Websockets. This webinar covers how to use this feature and why you should let API Gateway manage your realtime APIs.
  • Best practices for building enterprise grade APIs: API Gateway reduces the time it takes to build and deploy REST development but there are strategies that can make development, security, and management easier.
  • An Intro to AWS AppSync and GraphQL: AppSync helps you build sophisticated data applications with realtime and offline capabilities.

Gain Experience With Hands-On Workshops and Examples

One of the easiest ways to get started with Serverless REST API development is to use the Serverless Application Model (SAM). SAM lets you run APIs and Lambda functions locally on your machine for easy development and testing.

For example, you can configure API Gateway as an Event source for Lambda with just a few lines of code:

Type: Api
Properties:
Path: /photos
Method: post

There are many great examples on our GitHub page to help you get started with Authorization (IAMCognito), Request, Response,  various policies , and CORS configurations for API Gateway.

If you’re working with GraphQL, you should review the Amplify Framework. This is an official AWS project that helps you quickly build Web Applications with built in AuthN and backend APIs using REST or GraphQL. With just a few lines of code, you can have Amplify add all required configurations for your GraphQL API. You have two options to integrate your application with an AppSync API:

  1. Directly using the Amplify GraphQL Client
  2. Using the AWS AppSync SDK

An excellent walk through of the Amplify toolkit is available here, including an example showing how to create a single page web app using ReactJS powered by an AppSync GraphQL API.

Finally, if you are interested in a full hands on experience, take a look at:

  • The Amazon API Gateway WildRydes workshop. This workshop teaches you how to build a functional single page web app with a REST backend, powered by API Gateway.
  • The AWS AppSync GraphQL Photo Workshop. This workshop teaches you how to use Amplify to quickly build a Photo sharing web app, powered by AppSync.

Useful Documentation

The official AWS documentation is the source of truth for architects and developers. Get started with the API Gateway developer guide. API Gateway is currently has two APIs (V1 and V2) for managing the service. Here is where you can view the SDK and CLI reference.

Get started with the AppSync developer guide, and review the AppSync management API.

Summary

As an API architect, your job is not only to design and implement the best API for your use case, but your job is also to figure out which type of API is most cost effective for your product. For example, an application with high request volume (“chatty“) may benefit from a GraphQL implementation instead of REST.

API Gateway currently charges $3.50 / million requests and provides a free tier of 1 Million requests per month. There is tiered pricing that will reduce your costs as request volume rises. AppSync currently charges $4.00 / million for Query and Mutation requests.

While AppSync pricing per request is slightly higher, keep in mind that the nature of GraphQL APIs typically result in significantly fewer overall request numbers.

Finally, we encourage you to join us in the coming weeks — we will be starting a series of posts covering messaging best practices.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Fast WordPress Sites with Bluehost & Cloudflare Workers

Post Syndicated from Peter Dumanian original https://blog.cloudflare.com/fast-wordpress-sites-with-bluehost-cloudflare-workers/

Fast WordPress Sites with Bluehost & Cloudflare Workers

Fast WordPress Sites with Bluehost & Cloudflare Workers

WordPress is the most popular CMS (content management system) in the world, powering over a third of the top 10 million websites, according to W3Techs.

WordPress is an open source software project that many website service providers host for end customers to enable them to build WordPress sites and serve that content to visitors over the Internet.  For hosting providers, one of the opportunities and challenges is to host one version of WordPress on their infrastructure that is high performing for all their customers without modifying the WordPress code on a per customer basis.

Hosting providers are increasingly turning to Cloudflare’s Serverless Workers Platform to deliver high performance to their end customers by fixing performance issues at the edge while avoiding modifying code on an individual site basis.

One innovative WordPress hosting provider that Cloudflare has been working with to do this is Bluehost, a recommended web host by WordPress.org. In collaboration with Bluehost, Cloudflare’s Workers have been able to achieve a 40% performance improvement for those sites running Workers. Bluehost started with Cloudflare Workers code for Fast Google Fonts which in-lines the browser-specific font CSS and re-hosts the font files through the page origin. This removes the multiple calls to load the CSS and the font file from Google and improves WordPress site response time.  Bluehost then went further and added additional performance enhancements that rehosted commonly run third party scripts and caches dynamic HTML on the edge in conjunction with Bluehost’s own plugin infrastructure.

Bluehost will offer Cloudflare Workers in early 2020. Once implemented, customers will see faster response times, which could result in more website visitors sticking with the site while it renders. Additional benefits could include improved ad dollars from a higher number of impressions and ecommerce revenue from more shoppers.

“We were so impressed to see a 40% performance improvement for websites leveraging Workers, and can’t wait to offer this to our customers in 2020. Our team is excited to partner with Cloudflare and continue to innovate with Workers for added benefits for our customers,” said Suhaib Zaheer, General Manager for Bluehost.

Stay tuned for more performance improvements with Cloudflare Workers!

Live Preview: Build and Test Workers Faster with Wrangler CLI 1.2.0

Post Syndicated from Matt Alonso original https://blog.cloudflare.com/live-preview-build-and-test-workers-faster-with-wrangler-cli-1-2-0/

Live Preview: Build and Test Workers Faster with Wrangler CLI 1.2.0

As part of my internship on the Workers Developer Experience team, I set out to polish the Wrangler CLI for Cloudflare Workers. If you’re not familiar with Workers, the premise is quite simple: Write a bit of Javascript that takes in an HTTP request, does some processing, and spits out a response. The magic lies in where your Workers scripts run: on Cloudflare’s edge network, which spans 193 cities in more than 90 countries. Workers can be used for nearly anything from configuring Cloudflare caching behavior to building entire serverless web applications. And, you don’t have to worry about operations at all.

I was excited to focus on Wrangler, because Wrangler aims to make developing and publishing Workers projects a pleasant experience for everyone, whether you’re a solo dev working on the next big thing, or an engineer at a Fortune 100 enterprise. The whole point of serverless is about reducing friction, and Wrangler reflects that ethos.

However, when I started at Cloudflare in early June, some parts of the development experience still needed some love. While working on a new WASM tutorial for the Workers documentation, I noticed a storm brewing in my browser…

Live Preview: Build and Test Workers Faster with Wrangler CLI 1.2.0

Wrangler lets you test your Workers project with a subcommand called wrangler preview, and every time I called it to test a new change it opened a new tab. Fast iteration is the most crucial part of a good developer experience, and while the preview was fast, things were getting messy. I was fighting my tooling, having to keep track of the latest preview tab every time I wanted to test a new change. I knew that if I was annoyed about this, others would be too.

So, I thought about what our customers wanted: similarity with tooling that they already used. I set out to create an experience inspired by `webpack-dev-server` and other similar watch-and-build tools, where you would have a single tab that would refresh live with your latest changes. However, I knew that getting changes into the Workers runtime to achieve this goal would be a tall order for week 2 of my internship, so I started thinking about solutions to send updates directly to the previewer.

Wrangler is written in Rust, so I was able to utilize the crates.io ecosystem while developing this feature. I used the notify crate, which provides a cross-platform abstraction layer over the various file system event APIs provided by major OSes. However, there are some gotchas when implementing a file watcher that triggers a build and upload: you can’t simply trigger a build after every filesystem event, as a single file save can emit several events in quick succession depending on which editor you use! To prevent wasteful builds, I implemented a cooldown period, which only triggers the build process when no new file system events have been detected for at least 2 seconds. Rust’s rich standard library makes implementing concurrent behaviors like this very elegant:

/* rx.recv_timeout returns Ok if there was an event on the rx channel
 * or Err if the cooldown period has passed. The while let Ok(_) syntax
 * will end the loop if the cooldown period has ended, or restart the cooldown period if there was an event on the rx channel
 */
while let Ok(_) = rx.recv_timeout(cooldown) {
  message::working("Detected change during cooldown...");
}

Another challenge was handling communication with the previewer. I settled on an unconventional application of WebSockets, creating one to localhost to allow for a browser application to communicate with the Wrangler CLI running on the local machine. I coordinated with the Workers UI team to get my WebSocket client added to the preview UI, and with the security team to pass a security review for the feature, to make sure script contents were properly protected from exposure.

This was the result:

Live Preview: Build and Test Workers Faster with Wrangler CLI 1.2.0

This is what Developer Experience is all about. You should feel like 💆🏻‍♀️💆🏽‍♂️ when using Wrangler, not like 😡. If this isn’t the case, we want to hear about it.

Live Preview was shipped in the 1.2.0 release of Wrangler, exposed under wrangler preview --watch. It works for all Wrangler projects, even ones that use WebAssembly.

And to the Workers Developer Experience team, Dubs, Ashley, Avery, Gabbi, Kristian, Sven, and Victoria: thank you. Y’all are motivated, talented, and I genuinely had fun every day this summer.

Building a GraphQL server on the edge with Cloudflare Workers

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/building-a-graphql-server-on-the-edge-with-cloudflare-workers/

Building a GraphQL server on the edge with Cloudflare Workers

Building a GraphQL server on the edge with Cloudflare Workers

Today, we’re open-sourcing an exciting project that showcases the strengths of our Cloudflare Workers platform: workers-graphql-server is a batteries-included Apollo GraphQL server, designed to get you up and running quickly with GraphQL.

Building a GraphQL server on the edge with Cloudflare Workers
Testing GraphQL queries in the GraphQL Playground

As a full-stack developer, I’m really excited about GraphQL. I love building user interfaces with React, but as a project gets more complex, it can become really difficult to manage how your data is managed inside of an application. GraphQL makes that really easy – instead of having to recall the REST URL structure of your backend API, or remember when your backend server doesn’t quite follow REST conventions – you just tell GraphQL what data you want, and it takes care of the rest.

Cloudflare Workers is uniquely suited as a platform to being an incredible place to host a GraphQL server. Because your code is running on Cloudflare’s servers around the world, the average latency for your requests is extremely low, and by using Wrangler, our open-source command line tool for building and managing Workers projects, you can deploy new versions of your GraphQL server around the world within seconds.

If you’d like to try the GraphQL server, check out a demo GraphQL playground, deployed on Workers.dev. This optional add-on to the GraphQL server allows you to experiment with GraphQL queries and mutations, giving you a super powerful way to understand how to interface with your data, without having to hop into a codebase.

If you’re ready to get started building your own GraphQL server with our new open-source project, we’ve added a new tutorial to our Workers documentation to help you get up and running – check it out here!

Finally, if you’re interested in how the project works, or want to help contribute – it’s open-source! We’d love to hear your feedback and see your contributions. Check out the project on GitHub.

ICYMI: Serverless Q2 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-q2-2019/

This post is courtesy of Moheeb Zara, Senior Developer Advocate – AWS Serverless

Welcome to the sixth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, checkout what happened last quarter here.

April - June 2019

Amazon EventBridge

Before we dive in to all that happened in Q2, we’re excited about this quarter’s launch of Amazon EventBridge, the serverless event bus that connects application data from your own apps, SaaS, and AWS-as-a-service. This allows you to create powerful event-driven serverless applications using a variety of event sources.

Our very own AWS Solutions Architect, Mike Deck, sat down with AWS Serverless Hero Jeremy Daly and recorded a podcast on Amazon EventBridge. It’s a worthy listen if you’re interested in exploring all the features offered by this launch.

Now, back to Q2, here’s what’s new.

AWS Lambda

Lambda Monitoring

Amazon CloudWatch Logs Insights now allows you to see statistics from recent invocations of your Lambda functions in the Lambda monitoring tab.

Additionally, as of June, you can monitor the [email protected] functions associated with your Amazon CloudFront distributions directly from your Amazon CloudFront console. This includes a revamped monitoring dashboard for CloudFront distributions and [email protected] functions.

AWS Step Functions

Step Functions

AWS Step Functions now supports workflow execution events, which help in the building and monitoring of even-driven serverless workflows. Automatic Execution event notifications can be delivered upon start/completion of CloudWatch Events/Amazon EventBridge. This allows services such as AWS Lambda, Amazon SNS, Amazon Kinesis, or AWS Step Functions to respond to these events.

Additionally you can use callback patterns to automate workflows for applications with human activities and custom integrations with third-party services. You create callback patterns in minutes with less code to write and maintain, run without servers and infrastructure to manage, and scale reliably.

Amazon API Gateway

API Gateway Tag Based Control

Amazon API Gateway now offers tag-based access control for WebSocket APIs using AWS Identity and Access Management (IAM) policies, allowing you to categorize API Gateway resources for WebSocket APIs by purpose, owner, or other criteria.  With the addition of tag-based access control to WebSocket resources, you can now give permissions to WebSocket resources at various levels by creating policies based on tags. For example, you can grant full access to admins to while limiting access to developers.

You can now enforce a minimum Transport Layer Security (TLS) version and cipher suites through a security policy for connecting to your Amazon API Gateway custom domain.

In addition, Amazon API Gateway now allows you to define VPC Endpoint policies, enabling you to specify which Private APIs a VPC Endpoint can connect to. This enables granular security control using VPC Endpoint policies.

AWS Amplify

Amplify CLI (part of the open source Amplify Framework) now includes support for adding and configuring AWS Lambda triggers for events when using Amazon Cognito, Amazon Simple Storage Service, and Amazon DynamoDB as event sources. This means you can setup custom authentication flows for mobile and web applications via the Amplify CLI and Amazon Cognito User Pool as an authentication provider.

Amplify Console

Amplify Console,  a Git-based workflow for continuous deployment and hosting for fullstack serverless web apps, launched several updates to the build service including SAM CLI and custom container support.

Amazon Kinesis

Amazon Kinesis Data Firehose can now utilize AWS PrivateLink to securely ingest data. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely over the Amazon network. When AWS PrivateLink is used with Amazon Kinesis Data Firehose, all traffic to a Kinesis Data Firehose from a VPC flows over a private connection.

You can now assign AWS resource tags to applications in Amazon Kinesis Data Analytics. These key/value tags can be used to organize and identify resources, create cost allocation reports, and control access to resources within Amazon Kinesis Data Analytics.

Amazon Kinesis Data Firehose is now available in the AWS GovCloud (US-East), Europe (Stockholm), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and EU (London) regions.

For a complete list of where Amazon Kinesis Data Analytics is available, please see the AWS Region Table.

AWS Cloud9

Cloud9 Quick Starts

Amazon Web Services (AWS) Cloud9 integrated development environment (IDE) now has a Quick Start which deploys in the AWS cloud in about 30 minutes. This enables organizations to provide developers a powerful cloud-based IDE that can edit, run, and debug code in the browser and allow easy sharing and collaboration.

AWS Cloud9 is also now available in the EU (Frankfurt) and Asia Pacific (Tokyo) regions. For a current list of supported regions, see AWS Regions and Endpoints in the AWS documentation.

Amazon DynamoDB

You can now tag Amazon DynamoDB tables when you create them. Tags are labels you can attach to AWS resources to make them easier to manage, search, and filter.  Tagging support has also been extended to the AWS GovCloud (US) Regions.

DynamoDBMapper now supports Amazon DynamoDB transactional API calls. This support is included within the AWS SDK for Java. These transactional APIs provide developers atomic, consistent, isolated, and durable (ACID) operations to help ensure data correctness.

Amazon DynamoDB now applies adaptive capacity in real time in response to changing application traffic patterns, which helps you maintain uninterrupted performance indefinitely, even for imbalanced workloads.

AWS Training and Certification has launched Amazon DynamoDB: Building NoSQL Database–Driven Applications, a new self-paced, digital course available exclusively on edX.

Amazon Aurora

Amazon Aurora Serverless MySQL 5.6 can now be accessed using the built-in Data API enabling you to access Aurora Serverless with web services-based applications, including AWS LambdaAWS AppSync, and AWS Cloud9. For more check out this post.

Sharing snapshots of Aurora Serverless DB clusters with other AWS accounts or publicly is now possible. We are also giving you the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.

You can now set the minimum capacity of your Aurora Serverless DB clusters to 1 Aurora Capacity Unit (ACU). With Aurora Serverless, you specify the minimum and maximum ACUs for your Aurora Serverless DB cluster instead of provisioning and managing database instances. Each ACU is a combination of processing and memory capacity. By setting the minimum capacity to 1 ACU, you can keep your Aurora Serverless DB cluster running at a lower cost.

AWS Serverless Application Repository

The AWS Serverless Application Repository is now available in 17 regions with the addition of the AWS GovCloud (US-West) region.

Region support includes Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo), US West (N. California, Oregon), and US East (N. Virginia, Ohio).

Amazon Cognito

Amazon Cognito has launched a new API – AdminSetUserPassword – for the Cognito User Pool service that provides a way for administrators to set temporary or permanent passwords for their end users. This functionality is available for end users even when their verified phone or email are unavailable.

Serverless Posts

April

May

June

Events

Events this quarter

Senior Developer Advocates for AWS Serverless spoke at several conferences this quarter. Here are some recordings worth watching!

Tech Talks

We hold several AWS Online Tech Talks covering serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the ones from Q2.

Twitch

Twitch Series

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

Build with Serverless on Twitch

Serverless expert and AWS Specialist Solutions architect, Heitor Lessa, has been hosting a weekly Twitch series since April. Join him and others as they build an end-to-end airline booking solution using serverless. The final episode airs on August 7th at Wednesday 8:00am PT.

Here’s a recap of the last quarter:

AWS re:Invent

AWS re:Invent 2019

AWS re:Invent 2019 is around the corner! From December 2 – 6 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. Be sure to take a look at the growing catalog of serverless sessions this year.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft

AWS Serverless Heroes

We urge you to explore the efforts of our AWS Serverless Heroes Community. This is a worldwide network of AWS Serverless experts with a diverse background of experience. For example, check out this post from last month where Marcia Villalba demonstrates how to set up unit tests for serverless applications.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

The Serverlist: Building out the SHAMstack

Post Syndicated from Connor Peshek original https://blog.cloudflare.com/serverlist-7th-edition/

The Serverlist: Building out the SHAMstack

Check out our seventh edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.



Building a Modern CI/CD Pipeline in the Serverless Era with GitOps

Post Syndicated from Shimon Tolts original https://aws.amazon.com/blogs/aws/building-a-modern-ci-cd-pipeline-in-the-serverless-era-with-gitops/

Guest post by AWS Community Hero Shimon Tolts, CTO and co-founder at Datree.io. He specializes in developer tools and infrastructure, running a company that is 100% serverless.

In recent years, there was a major transition in the way you build and ship software. This was mainly around microservices, splitting code into small components, using infrastructure as code, and using Git as the single source of truth that glues it all together.

In this post, I discuss the transition and the different steps of modern software development to showcase the possible solutions for the serverless world. In addition, I list useful tools that were designed for this era.

What is serverless?

Before I dive into the wonderful world of serverless development and tooling, here’s what I mean by serverless. The AWS website talks about four main benefits:

  • No server management.
  • Flexible scaling.
  • Pay for value.
  • Automated high availability.

To me, serverless is any infrastructure that you don’t have to manage and scale yourself.
At my company Datree.io, we run 95% of our workload on AWS Fargate and 5% on AWS Lambda. We are a serverless company; we have zero Amazon EC2 instances in our AWS account. For more information, see the following:

What is GitOps?

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
According to Luis Faceira, a CI/CD consultant, GitOps is a way of working. You might look at it as an approach in which everything starts and ends with Git. Here are some key concepts:

  • Git as the SINGLE source of truth of a system
  • Git as the SINGLE place where we operate (create, change and destroy) ALL environments
  • ALL changes are observable/verifiable.

How you built software before the cloud

Back in the waterfall pre-cloud era, you used to have separate teams for development, testing, security, operations, monitoring, and so on.

Nowadays, in most organizations, there is a transition to full developer autonomy and developers owning the entire production path. The developer is the King – or Queen 🙂

Those teams (Ops/Security/IT/etc) used to be gatekeepers to validate and control every developer change. Now they have become more of a satellite unit that drives policy and sets best practices and standards. They are no longer the production bottleneck, so they provide organization-wide platforms and enablement solutions.

Everything is codified

With the transition into full developer ownership of the entire pipeline, developers automated everything. We have more code than ever, and processes that used to be manual are now described in code.

This is a good transition, in my opinion. Here are some of the benefits:

  • Automation: By storing all things as code, everything can be automated, reused, and re-created in moments.
  • Immutable: If anything goes wrong, create it again from the stored configuration.
  • Versioning: Changes can be applied and reverted, and are tracked to a single user who made the change.

GitOps: Git has become the single source of truth

The second major transition is that now everything is in one place! Git is the place where all of the code is stored and where all operations are initiated. Whether it’s testing, building, packaging, or releasing, nowadays everything is triggered through pull requests.

This is amplified by the codification of everything.

GitOps Graphic

Useful tools in the serverless era

There are many useful tools in the market, here is a list of ones that were designed for serverless.

Logos

Code

Always store your code in a source control system. In recent years, more and more functions are codified, such as, BI, ops, security, and AI. For new developers, it is not always obvious that they should use source control for some functionality.

Build and test

The most common mistake I see is manually configuring build jobs in the GUI. This might be good for a small POC but it is not scalable. You should have your job codified and inside your Git repository. Here are some tools to help with building and testing:

Security and governance

When working in a serverless way, you end up having many Git repos. The number of code packages can be overwhelming. The demand for unified code standards remains as it was but now it is much harder to enforce it on top of your R&D org. Here are some tools that might help you with the challenge:

Bundle and release

Building a serverless application is connecting microservices into one unit. For example, you might be using Amazon API Gateway, AWS Lambda, and Amazon DynamoDB. Instead of configuring each one separately, you should use a bundler to hold the configuration in one place. That allows for easy versioning and replication of the app for several environments. Here are a couple of bundlers:

Package

When working with many different serverless components, you should create small packages of tools to be able to import across different Lambda functions. You can use a language-specific store like npm or RubyGems, or use a more holistic solution. Here are several package artifact stores that allow hosting for multiple programming languages:

Monitor

This part is especially tricky when working with serverless applications, as everything is split into small pieces. It’s important to use monitoring tools that support this mode of work. Here are some tools that can handle serverless:

Summary

The serverless era brings many transitions along with it like a codification of the entire pipeline and Git being the single source of truth. This doesn’t mean that the same problems that we use to have like security, logging and more disappeared, you should continue addressing them and leveraging tools that enable you to focus on your business.

Simple Two-way Messaging using the Amazon SQS Temporary Queue Client

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/simple-two-way-messaging-using-the-amazon-sqs-temporary-queue-client/

This post is contributed by Robin Salkeld, Sr. Software Development Engineer

Amazon SQS is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. Asynchronous workflows have always been the primary use case for SQS. Using queues ensures one component can keep running smoothly without losing data when another component is unavailable or slow.

We were surprised, then, to discover that many customers use SQS in synchronous workflows. For example, many applications use queues to communicate between frontends and backends when processing a login request from a user.

Why would anyone use SQS for this? The service stores messages for up to 14 days with high durability, but messages in a synchronous workflow often must be processed within a few minutes, or even seconds. Why not just set up an HTTPS endpoint?

The more we talked to customers, the more we understood. Here’s what we learned:

  • Creating a queue is often easier and faster than creating an HTTPS endpoint and the infrastructure necessary to ensure the endpoint’s scalability.
  • Queues are safe by default because they are locked down to the AWS account that created them. In addition, any DDoS attempt on your service is absorbed by SQS instead of loading down your own servers.
  • There is generally no need to create firewall rules for the communication between microservices if they use queues.
  • Although SQS provides durable storage (which isn’t necessary for short-lived messages), it is still a cost-effective solution for this use case. This is especially true when you consider all the messaging broker maintenance that is done for you.

However, setting up efficient two-way communication through one-way queues requires some non-trivial client-side code. In our previous two-part post series on implementing enterprise integration patterns with AWS messaging services, Point-to-point channels and Publish-subscribe channels, we discussed the Request-Response Messaging Pattern. In this pattern, each requester creates a temporary destination to receive each response message.

The simplest approach is to create a new queue for each response, but this is like building a road just so a single car can drive on it before tearing it down. Technically, this can work (and SQS can create and delete queues quickly), but we can definitely make it faster and cheaper.

To better support short-lived, lightweight messaging destinations, we are pleased to present the Amazon SQS Temporary Queue Client. This client makes it easy to create and delete many temporary messaging destinations without inflating your AWS bill.

Virtual queues

The key concept behind the client is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS and no costs associated with creating a virtual queue.

The Temporary Queue Client includes the AmazonSQSVirtualQueuesClient class for creating and managing virtual queues. This class implements the AmazonSQS interface and adds support for attributes related to virtual queues. You can create a virtual queue using this client by calling the CreateQueue API action and including the HostQueueURL queue attribute. This attribute specifies the existing SQS queue on which to host the virtual queue. The queue URL for a virtual queue is in the form <host queue URL>#<virtual queue name>. For example:

https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue#MyVirtualQueueName

When you call the SendMessage or SendMessageBatch API actions on AmazonSQSVirtualQueuesClient with a virtual queue URL, the client first extracts the virtual queue name. It then attaches this name as an additional message attribute to each message, and sends the messages to the host queue. When you call the ReceiveMessage API action on a virtual queue, the calling thread waits for messages to appear in the in-memory buffer for the virtual queue. Meanwhile, a background thread polls the host queue and dispatches messages to these buffers according to the additional message attribute.

This mechanism is similar to how the AmazonSQSBufferedAsyncClient prefetches messages, and the benefits are similar. A single call to SQS can provide messages for up to 10 virtual queues, reducing the API calls that you pay for by up to a factor of ten. Deleting a virtual queue simply removes the client-side resources used to implement them, again without making API calls to SQS.

The diagram below illustrates the end-to-end process for sending messages through virtual queues:

Sending messages through virtual queues

Virtual queues are similar to virtual machines. Just as a virtual machine provides the same experience as a physical machine, a virtual queue divides the resources of a single SQS queue into smaller logical queues. This is ideal for temporary queues, since they frequently only receive a handful of messages in their lifetime. Virtual queues are currently implemented entirely within the Temporary Queue Client, but additional support and optimizations might be added to SQS itself in the future.

In most cases, you don’t have to manage virtual queues yourself. The library also includes the AmazonSQSTemporaryQueuesClient class. This class automatically creates virtual queues when the CreateQueue API action is called and creates host queues on demand for all queues with the same queue attributes. To optimize existing application code that creates and deletes queues, you can use this class as a drop-in replacement implementation of the AmazonSQS interface.

The client also includes the AmazonSQSRequester and AmazonSQSResponder interfaces, which enable two-way communication through SQS queues. The following is an example of an RPC implementation for a web application’s login process.

/**
 * This class handles a user's login request on the client side.
 */
public class LoginClient {

    // The SQS queue to send the requests to.
    private final String requestQueueUrl;

    // The AmazonSQSRequester creates a temporary queue for each response.
    private final AmazonSQSRequester sqsRequester = AmazonSQSRequesterClientBuilder.defaultClient();

    private final LoginClient(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Send a login request to the server.
     */
    public String login(String body) throws TimeoutException {
        SendMessageRequest request = new SendMessageRequest()
                .withMessageBody(body)
                .withQueueUrl(requestQueueUrl);

        // This:
        //  - creates a temporary queue
        //  - attaches its URL as an attribute on the message
        //  - sends the message
        //  - receives the response from the temporary queue
        //  - deletes the temporary queue
        //  - returns the response
        //
        // If something goes wrong and the server's response never shows up, this method throws a
        // TimeoutException.
        Message response = sqsRequester.sendMessageAndGetResponse(request, 20, TimeUnit.SECONDS);
        
        return response.getBody();
    }
}

/**
 * This class processes users' login requests on the server side.
 */
public class LoginServer {

    // The SQS queue to poll for login requests.
    // Assume that on construction a thread is created to poll this queue and call
    // handleLoginRequest() below for each message.
    private final String requestQueueUrl;

    // The AmazonSQSResponder sends responses to the correct response destination.
    private final AmazonSQSResponder sqsResponder = AmazonSQSResponderClientBuilder.defaultClient();

    private final AmazonSQS(String requestQueueUrl) {
        this.requestQueueUrl = requestQueueUrl;
    }

    /**
     * Handle a login request sent from the client above.
     */
    public void handleLoginRequest(Message message) {
        // Assume doLogin does the actual work, and returns a serialized result
        String response = doLogin(message.getBody());

        // This extracts the URL of the temporary queue from the message attribute and sends the
        // response to that queue.
        sqsResponder.sendResponseMessage(MessageContent.fromMessage(message), new MessageContent(response));  
    }
}

Cleaning up

As with any other AWS SDK client, your code should call the shutdown() method when the temporary queue client is no longer needed. The AmazonSQSRequester interface also provides a shutdown() method, which shuts down its internal temporary queue client. This ensures that the in-memory resources needed for any virtual queues are reclaimed, and that the host queue that the client automatically created is also deleted automatically.

However, in the world of distributed systems things are a little more complex. Processes can run out of memory and crash, and hosts can reboot suddenly and unexpectedly. There are even cases where you don’t have the opportunity to run custom code on shutdown.

The Temporary Queue Client client addresses this issue as well. For each host queue with recent API calls, the client periodically uses the TagQueue API action to attach a fresh tag value that indicates the queue is still being used. The tagging process serves as a heartbeat to keep the queue alive. According to a configurable time period (by default, 5 minutes), a background thread uses the ListQueues API action to obtain the URLs of all queues with the configured prefix. Then, it deletes each queue that has not been tagged recently. The mechanism is similar to how the Amazon DynamoDB Lock Client expires stale lock leases.

If you use the AmazonSQSTemporaryQueuesClient directly, you can customize how long queues must be idle before they is deleted by configuring the IdleQueueRetentionPeriodSeconds queue attribute. The client supports setting this attribute on both host queues and virtual queues. For virtual queues, setting this attribute ensures that the in-memory resources do not become a memory leak in the presence of application bugs.

Any API call to a queue marks it as non-idle, including ReceiveMessage calls that don’t return any messages. The only reason to increase the retention period attribute is to give the client more time when it can’t send heartbeats—for example, due to garbage collection pauses or networking issues.

But what if you want to use this client in a fleet of a thousand EC2 instances? Won’t every client spend a lot of time checking every queue for idleness? Doesn’t that imply duplicate work that increases as the fleet is scaled up?

We thought of this too. The Temporary Queue Client creates a shared queue for all clients using the same queue prefix, and uses this queue as a work queue for the distributed task. Instead of every client calling the ListQueues API action every five minutes, a new seed message (which triggers the sweeping process) is sent to this queue every five minutes.

When one of the clients receives this message, it calls the ListQueues API action and sends each queue URL in the result as another kind of message to the same shared work queue. The work of actually checking each queue for idleness is distributed roughly evenly to the active clients, ensuring scalability. There is even a mechanism that works around the fact that the ListQueues API action currently only returns no more than 1,000 queue URLs at time.

Summary

We are excited about how the new Amazon SQS Temporary Queue Client makes more messaging patterns easier and cheaper for you. Download the code from GitHub, have a look at Temporary Queues in the Amazon SQS Developer Guide, try out the client, and let us know what you think!

Ten Things Serverless Architects Should Know

Post Syndicated from Justin Pirtle original https://aws.amazon.com/blogs/architecture/ten-things-serverless-architects-should-know/

Building on the first three parts of the AWS Lambda scaling and best practices series where you learned how to design serverless apps for massive scale, AWS Lambda’s different invocation models, and best practices for developing with AWS Lambda, we now invite you to take your serverless knowledge to the next level by reviewing the following 10 topics to deepen your serverless skills.

1: API and Microservices Design

With the move to microservices-based architectures, decomposing monothlic applications and de-coupling dependencies is more important than ever. Learn more about how to design and deploy your microservices with Amazon API Gateway:

Get hands-on experience building out a serverless API with API Gateway, AWS Lambda, and Amazon DynamoDB powering a serverless web application by completing the self-paced Wild Rydes web application workshop.

Figure 1: WildRydes serverless web application workshop

2: Event-driven Architectures and Asynchronous Messaging Patterns

When building event-driven architectures, whether you’re looking for simple queueing and message buffering or a more intricate event-based choreography pattern, it’s valuable to learn about the mechanisms to enable asynchronous messaging and integration. These are enabled primarily through the use of queues or streams as a message buffer and topics for pub/sub messaging. Understand when to use each and the unique advantages and features of all three:

Gets hands-on experience building a real-time data processing application using Amazon Kinesis Data Streams and AWS Lambda by completing the self-paced Wild Rydes data processing workshop.

3: Workflow Orchestration in a Distributed, Microservices Environment

In distributed microservices architectures, you must design coordinated transactions in different ways than traditional database-based ACID transactions, which are typically implemented using a monolithic relational database. Instead, you must implement coordinated sequenced invocations across services along with rollback and retry mechanisms. For workloads where there a significant orchestration logic is required and you want to use more of an orchestrator pattern than the event choreography pattern mentioned above, AWS Step Functions enables the building complex workflows and distributed transactions through integration with a variety of AWS services, including AWS Lambda. Learn about the options you have to build your business workflows and keep orchestration logic out of your AWS Lambda code:

Get hands-on experience building an image processing workflow using computer vision AI services with AWS Rekognition and AWS Step Functions to orchestrate all logic and steps with the self-paced Serverless image processing workflow workshop.

Figure 2: Several AWS Lambda functions managed by an AWS Step Functions state machine

4: Lambda Computing Environment and Programming Model

Though AWS Lambda is a service that is quick to get started, there is value in learning more about the AWS Lambda computing environment and how to take advantage of deeper performance and cost optimization strategies with the AWS Lambda runtime. Take your understanding and skills of AWS Lambda to the next level:

5: Serverless Deployment Automation and CI/CD Patterns

When dealing with a large number of microservices or smaller components—such as AWS Lambda functions all working together as part of a broader application—it’s critical to integrate automation and code management into your application early on to efficiently create, deploy, and version your serverless architectures. AWS offers several first-party deployment tools and frameworks for Serverless architectures, including the AWS Serverless Application Model (SAM), the AWS Cloud Development Kit (CDK), AWS Amplify, and AWS Chalice. Additionally, there are several third party deployment tools and frameworks available, such as the Serverless Framework, Claudia.js, Sparta, or Zappa. You can also build your own custom-built homegrown framework. The important thing is to ensure your automation strategy works for your use case and team, and supports your planned data source integrations and development workflow. Learn more about the available options:

Learn how to build a full CI/CD pipeline and other DevOps deployment automation with the following workshops:

6: Serverless Identity Management, Authentication, and Authorization

Modern application developers need to plan for and integrate identity management into their applications while implementing robust authentication and authorization functionality. With Amazon Cognito, you can deploy serverless identity management and secure sign-up and sign-in directly into your applications. Beyond authentication, Amazon API Gateway also allows developers to granularly manage authorization logic at the gateway layer and authorize requests directly, without exposing their using several types of native authorization.

Learn more about the options and benefits of each:

Get hands-on experience working with Amazon Cognito, AWS Identity and Access Management (IAM), and Amazon API Gateway with the Serverless Identity Management, Authentication, and Authorization Workshop.

Figure 3: Serverless Identity Management, Authentication, and Authorization Workshop

7: End-to-End Security Techniques

Beyond identity and authentication/authorization, there are many other areas to secure in a serverless application. These include:

  • Input and request validation
  • Dependency and vulnerability management
  • Secure secrets storage and retrieval
  • IAM execution roles and invocation policies
  • Data encryption at-rest/in-transit
  • Metering and throttling access
  • Regulatory compliance concerns

Thankfully, there are AWS offerings and integrations for each of these areas. Learn more about the options and benefits of each:

Get hands-on experience adding end-to-end security with the techniques mentioned above into a serverless application with the Serverless Security Workshop.

8: Application Observability with Comprehensive Logging, Metrics, and Tracing

Before taking your application to production, it’s critical that you ensure your application is fully observable, both at a microservice or component level, as well as overall through comprehensive logging, metrics at various granularity, and tracing to understand distributed system performance and end user experiences end-to-end. With many different components making up modern architectures, having centralized visibility into all of your key logs, metrics, and end-to-end traces will make it much easier to monitor and understand your end users’ experiences. Learn more about the options for observability of your AWS serverless application:

9. Ensuring Your Application is Well-Architected

Adding onto the considerations mentioned above, we suggest architecting your applications more holistically to the AWS Well-Architected framework. This framework includes the five key pillars: security, reliability, performance efficiency, cost optimization, and operational excellence. Additionally, there is a serverless-specific lens to the Well-Architected framework, which more specifically looks at key serverless scenarios/use cases such as RESTful microservices, Alexa skills, mobile backends, stream processing, and web applications, and how they can implement best practices to be Well-Architected. More information:

10. Continuing your Learning as Serverless Computing Continues to Evolve

As we’ve discussed, there are many opportunities to dive deeper into serverless architectures in a variety of areas. Though the resources shared above should be helpful in familiarizing yourself with key concepts and techniques, there’s nothing better than continued learning from others over time as new advancements come out and patterns evolve.

Finally, we encourage you to check back often as we’ll be continuing further blog post series on serverless architectures, with the next series focusing on API design patterns and best practices.

About the author

Justin PritleJustin Pirtle is a specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. He’s responsible for helping customers design, deploy, and scale serverless applications using services such as AWS Lambda, Amazon API Gateway, Amazon Cognito, and Amazon DynamoDB. He is a regular speaker at AWS conferences, including re:Invent, as well as other AWS events. Justin holds a bachelor’s degree in Management Information Systems from the University of Texas at Austin and a master’s degree in Software Engineering from Seattle University.

The Network is the Computer: A Conversation with John Gage

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/john-gage/

The Network is the Computer: A Conversation with John Gage

The Network is the Computer: A Conversation with John Gage

To learn more about the origins of The Network is the Computer®, I spoke with John Gage, the creator of the phrase and the 21st employee of Sun Microsystems. John had a key role in shaping the vision of Sun and had a lot to share about his vision for the future. Listen to our conversation here and read the full transcript below.


[00:00:13]

John Graham-Cumming: I’m talking to John Gage who was what, the 21st employee of Sun Microsystems, which is what Wikipedia claims and it also claims that you created this phrase “The Network is the Computer,” and that’s actually one of the things I want to talk about with you a little bit because I remember when I was in Silicon Valley seeing that slogan plastered about the place and not quite understanding what it meant. So do you want to tell me what you meant by it or what Sun meant by it at the time?

[00:00:40]

John Gage: Well, in 2019, recalling what it meant in 1982 or 83’ will be colored by all our experience since then but at the time it seemed so obvious that when we introduced the first scientific workstations, they were not very powerful computers. The first Suns had a giant screen and they were on the Internet but they were designed as a complementary component to supercomputers. Bill Joy and I had a series of diagrams for talks we’d give, and Bill had the bi-modal, the two node picture. The serious computing occurred on the giant machines where you could fly into the heart of a black hole and the human interface was the workstation across the network. So each had to complement the other, each built on the strengths of the other, and each enhanced the other because to deal in those days with a supercomputer was very ugly. And to run all your very large computations, you could run them on a Sun because we had virtual memory and series of such advanced things but not fast. So the speed of scientific understanding is deeply affected by the tools the scientist has — is it a microscope, is it an optical telescope, is it a view into the heart of a star by running a simulation on a supercomputer? You need to have the loop with the human and the science constantly interacting and constantly modifying each other, and that’s what the network is for, to tie those different nodes together in as seamless a way as possible. Then, the instant anyone that’s ever created a programming language says, “so if I have to create a syntax of this where I’m trying to let you express, do this, how about the delay on the network, the latency”? Does your phrase “The Network is the Computer” really capture this hundreds, thousands, tens of thousands, millions perhaps at that time, now billions and billions and billions today, all these devices interacting and exchanging state with latency, with delay. It’s sort of an oversimplification, and that we would point out, but it’s just network is the computer. Four words, you know, what we tried to do is give a metaphor that allows you to explore it in your mind and think of new things to do and be inspired.

[00:03:35]

Graham-Cumming: And then by a sort of strange sequence of events, that was a trademark of Sun. It got abandoned. And now Cloudflare has swooped in and trademarked it again. So now it’s our trademark which sort of brings us full circle, I suppose.

[00:03:51]

Gage: Well, trademarks are dealing with the real world, but the inspiration of Cloudflare is to do exactly what Bill Joy and I were talking about in 1982. It’s to build an environment in which every participant globally can share with security, and we were not as strong. Bill wrote most of the code of TCP/IP implemented by every other computer vendor, and still these questions of latency, these questions of distributed denial of service which was, how do you block that? I was so happy to see that Cloudflare invests real money and real people in addressing those kinds of critical problems, which are at the core, what will destroy the Internet.

[00:14:48]

Graham-Cumming: Yes, I agree. I mean, it is a significant investment to actually deal with it and what I think people don’t appreciate about the DDoS attack situation is that they are going on all the time and it’s just a continuous, you know, just depends who the target is. It’s funny you mentioned TCP/IP because about 10 years after, so in about ‘92, my first real job, I had to write a TCP/IP stack for an obscure network card. And this was prior to the Internet really being available everywhere. And so I didn’t realize I could go and get the BSD implementation and recompile it. So I did it from scratch from the RFCs.

[00:05:23]

Gage: You did!

[00:05:25]

Graham-Cumming: And the thing I recommend here is that nobody ever does that because, you know, the real world, real code that really interacts is really hard when you’re trying to work it with other things, so.

[00:05:36]

Gage: Do you still, John, do you have that code?

[00:05:42]

Graham-Cumming: I wonder. I have the binary for it.

[00:05:46]

Gage: Do hunt for it, because our story was at the time DARPA, the Defense Advanced Research Projects Agency, that had funded networking initiatives around the world. I just had a discussion yesterday with Norway and they were one of the first entities to implement using essentially Bill Joy’s code, but to be placed on the ARPANET. And a challenge went out, and at that time the slightly older generation, the Bolt Beranek and Newman Group, Vint Cerf, Bob Con, those names, as Vint Cerf was a grad student at UCLA where he had built one of the four first Internet sites and the DARPA offices were in Arlington, Virginia, they had massive investments in detection of nuclear underground tests, so seismological data, and the moment we made the very first Suns, I shipped them to DARPA, we got the network up and began serving seismic data globally. Really lovely visualization of events. If you’re trying to detect something, those things go off and then there’s a distinctive signature, a collapse of the underground cavern after. So DARPA had tried to implement, as you did, from the spec, from the RFC, the components, and Vint had designed a lot of this, all the acknowledgement codes and so forth that you had to implement in TCP/IP. So Bill, as a graduate student at Berkeley, we had a meeting in Arlington at DARPA headquarters where BBN and AT&T Bell Labs and a number of other people were in the room. Their code didn’t work, this graduate student from Berkeley named Bill Joy, his code did work, and when Bob Kahn and Vint Cerf asked Bill, “Well, so how did you do it?” What he said was exactly what you just said, he said, “I just read the spec and wrote the code.”

[00:08:12]

Graham-Cumming: I do remember very distinctly because the company I was working at didn’t have a TCP/IP stack and we didn’t have any IP machines, right, we were doing actually stuff that was all IBM networking, SMA stuff. Somehow we bought what was at that point a HP machine, it was an Apollo workstation and a Sun workstation. I had them on Ethernet and talking to each other. And I do distinctly remember the first time a ping packet came back from that Sun box, saying, yes I managed to send you an IP packet, you managed to send me ICMP response and that was pretty magical. And then I got to TCP and that was hard.

[00:08:55]

Gage: That was hard. Yeah. When you get down to the details, the spec can be wrong. I mean, it will want you to do something that’s a stupid thing to do. So Bill has such good taste in these things. It would be interesting to do a kind of a diff across the various implementations of the stack. Years and years later we had maybe 50 companies all assemble in a room, only engineers, throw out all the marketing people and all the Ps and VPs and every company in this room—IBM, Hewlett-Packard—oh my God, Hewlett-Packard, fix your TCP—and we just kept going until everybody could work with everybody else in sort of a pact. We’re not going to reveal, Honeywell, that you guys were great with earlier absolute assembly code, determinate time control stuff but you have no clue about how packets work, we’ll help you, so that all of us can make every machine interoperate, which yielded the network show, Interop. Every year we would go put a bunch of fiber inside whatever, you know, Geneva, or pick some, Las Vegas, some big venue.

[00:10:30]

Graham-Cumming: I used to go to Vegas all the time and that was my great introduction to Vegas was going there for Interop, year after year.

[00:10:35]

Gage: Oh, you did! Oh, great.

[00:10:36]

Graham-Cumming: Yes, yes, yes.

[00:10:39]

Gage: You know in a way, what you’re doing with, for example, just last week with the Verizon problem, everybody implementing what you’re doing now that is not open about their mistakes and what they’ve learned and is not sharing this, it’s a problem. And your global presence to me is another absolutely critical thing. We had about, I forget, 600 engineers in Beijing at the East Gate of Tsinghua a lot of networking expertise and lots of those people are at Tencent and Huawei and those network providers throughout the rest of the world, politics comes and goes but the engineering has to be done in a way that protects us. And so these conversations globally are critical.

[00:11:33]

Graham-Cumming: Yes, that’s one of the things that’s fascinating actually about doing real things on the real Internet is there is a global community of people making computers talk to each other and you know, that it’s a tremendously complicated thing to actually make that work, and you do it across countries, across languages. But you end up actually making them work, and that’s the Internet we’re sitting on, that you and I are talking on right now that is based on those conversations around the world.

[00:12:01]

Gage: And only by doing it do you understand more deeply how to do it. It’s very difficult in the abstract to say what should happen as we begin to spread. As Sun grew, every major city in Africa had installations and for network access, you were totally dependent on an often very corrupt national telco or the complications dealing with these people just to make your packet smooth. And as it turned out, many of the intelligence and military entities in all of these countries had very little understanding of any of this. That’s changed to some degree. But the dangerous sides of the Internet. Total surveillance, IPv6, complete control of exact identity of origins of packets. We implemented, let’s see, you had an early Sun. We probably completed our IPv6 implementation, was it still fluid in the 90s, but I remember 10 years after we finished a complete implementation of IPv6, the U.S. was still IPv4, it’s still IPv4.

[00:13:25]

Graham-Cumming: It still is, it still is. Pretty much. Except for the mobile carriers right now. I think in general the mobile phone operators are the ones who’ve gone more into IPv6 than anybody else.

[00:13:37]

Gage: It was remarkable in China. We used to have a conference. We’d bring a thousand Chinese universities into a room. Professor Wu from Tsinghua who built the Chinese Education and Research Network, CERNET. And now a thousand universities have a building on campus doing Internet research. We would get up and show this map of China and he kept his head down politically, but he managed at the point when there was a big fight between the Minister of Telecom and the Minister of Railways. The Minister of Railways said, look, I have continuity throughout China because I have railines. I’ve just made a partnership with the People’s Liberation Army, and they are essentially slave labor, and they’re going to dig the ditches, and I’m going to run fiber alongside the railways and I don’t care what you, the Minister of Telecommunications, has to say about it, because I own the territory. And that created a separate pathway for the backbone IPv6 network in China. Cheap, cheap, cheap, get everybody doing things.

[00:14:45]

Graham-Cumming: Yes, now of course in China that’s resulted in an interesting situation where you have China Telecom and China Unicom, who sort of cooperate with each other but they’re almost rivals which makes IP packets quite difficult to route inside China.

[00:14:58]

Gage: Yes exactly. At one point I think we had four hunks of China. Everyone was geographically divided. You know there were meetings going on, I remember the moment they merged the telecom ministry with the electronics ministry and since we were working with both of them, I walk in a room and there’s a third group, people I didn’t know, it turns out that’s the People’s Liberation Army.

[00:15:32]

Graham-Cumming: Yes, they’re part of the team. So okay, going back to this “Network is the Computer” notion. So you were talking about the initial things that you were doing around that, why is it that it’s okay that Cloudflare has gone out and trademarked that phrase now, because you seem to think that we’ve got a leg to stand on, I guess.

[00:15:56]

Gage: Frankly, I’d only vaguely heard of Cloudflare. I’ve been working in areas, I’ve got a project in the middle of Nairobi in the slum where I’ve spent the last 15 years or so learning a lot about clean water and sewage treatment because we have almost 400,000 people in a very small area, biggest slum in East Africa. How can you introduce sanitary water and clean sewage treatment into a very, an often corrupt, a very difficult environment, and so that’s been a fascination of mine and I’ve been spending a lot of time. What’s a computer person know about fluid dynamics and pathogens? There’s a lot to learn. So as you guys grew so rapidly, I vaguely knew of you but until I started reading your blog about post-quantum crypto and how do we devise a network in these resilient denial of service attacks and all these areas where you’re a growing company, it’s very hard to take time to do serious advanced research-level work on distributed computing and distributed security, and yet you guys are doing it. When Bill created Java, the subsequent step from Java for billions and billions of devices to share resources and share computations was something we call Genie which is a framework for validation of who you are, movement of code from device to device in a secure way, total memory control so that someone is not capable of taking over memory in your device as we’ve seen with Spectre and the failures of these billions of Intel chips out there that all have a flaw on take all branches parallel compute implementations. So the very hardware you’re using can be insecure so your operating systems are insecure, the hardware is insecure, and yet you’re trying to build on top with fallible pieces in infallible systems. And you’re in the middle of this, John, which I’m so impressed by.

[00:18:13]

Graham-Cumming: And Jini sort of lives on as called Apache River now. It moved away from Sun and into an Apache project.

[00:18:21]

Gage: Yes, very few people seem to realize that the name Apache is a poetic phrasing of “a patchy system.” We patch everything because everything is broken. We moved a lot of it, Brian Behlendorf and the Apache group. Well, many of the innovations at Sun, Java is one, file systems that are far more secure and far more resilient than older file systems, the SPARC  implementation, I think the SPARC processor, even though you’re using the new ARM processors, but Fujitsu, I still think keeps the SPARC architecture as the world’s fastest microprocessor.  

[00:19:16]

Graham-Cumming: Right. Yes. Being British of course, ARM is a great British success. So I’m honor-bound to use that particular architecture. Clearly.

[00:19:25]

Gage: Oh, absolutely. And the power. That was the one always in a list of what our engineering goals are. We wanted to make, we were building supercomputers, we were building very large file servers for the telcos and the banks and the intelligence agencies and all these different people, but we always wanted to make a low power and it just fell off the list of what you could accomplish and the ARM chips, their ratios of wattage to packets treated are—you have a great metric on your website someplace about measuring these things at a very low level—that’s key.

[00:20:13]

Graham-Cumming: Yes, and we had Sophie Wilson, who of course is one of the founders of ARM and actually worked on the original chip, tell this wonderful story at our Internet Summit about how the first chip they hooked up was operating fine until they realized they hadn’t hooked the power up and they were asked to. It was so low power that it was able to use the power that was coming in over the logic lines to actually power the whole chip. And they said to me, wait a minute, we haven’t plugged the power in but the thing is running, which was really, I mean that was an amazing achievement to have done that.

[00:20:50]

Gage: That’s amazing. We open sourced SPARC, the instruction set, so that anybody doing crypto that also had Fab capabilities could implement detection of ones and zeroes, sheep and goats, or other kinds of algorithms that are necessary for very high speed crypto. And that’s another aspect that I’m so impressed by Cloudflare. Cloudflare is paying attention at a machine instruction level because you’re implementing with your own hardware packages in what, 180 cities? You’re moving logistically a package into Ulan Bator, or into Mombasa and you’re coming up live.

[00:21:38]

Graham-Cumming: And we need that to be inexpensive and fast because we’re promising people that we will make their Internet properties faster and secure at the same time and that’s one of the interesting challenges which is not trading those two things off. Which means your crypto better be fast, for example, and that requires a lot of fiddling around at the hardware level and understanding it. In our case because we’re using Intel, really what Intel chips are doing at the low level.

[00:22:10]

Gage: Intel did implement a couple of things in one or another of the more recent chips that were very useful for crypto. We had a group of the SPARC engineers, probably 30, at a dinner five or six months ago discussing, yes, we set the world standard for parallel execution branching optimizations for pipelines and chips, and when the overall design is not matched by an implementation that pays attention to protecting the memory, it’s a fundamental, exploitable flaw. So a lot of discussion about this. Selecting precisely which instructions are the most important, the risk analysis with the ability to make a chip specifically to implement a particular algorithm, there’s a lot more to go. We have multiples of performance ahead of us for specific algorithms based on a more fluid way to add instructions that are necessary into a specific piece of hardware. And then we jump to quantum. Oh my.

[00:23:32]

Graham-Cumming: Yes. To talk about that a little bit, the ever-increasing speed of processors and the things we can do; Do you think we actually need that given that we’re now living in this incredibly distributed world where we are actually now running very distributed algorithms and do we really need beefier machines?

[00:23:49]

Gage: At this moment, in a way, it’s you making fun of Bill Joy for only wanting a megabit in Aspen. When Steve Jobs started NeXT, sadly his hardware was just terrible, so we sent a group over to boost NeXT. In fact we sort of secretly slipped him $30 million to keep him afloat. And I’d say, “Jobs, if you really understood something about hardware, it would really be useful here.” So one of the main team members that we sent over to NeXT came to live in Aspen and ended up networking the entire valley. At a point, megabit for what you needed to do, seemed reasonable, so at this moment, as things become alive by the introduction of a little bit of intelligence in them, some little flickering chip that’s able to execute an algorithm, many tasks don’t require. If you really want to factor things fast, quantum, quantum. Which will destroy our existing crypto systems. But if you are just bringing the billions of places where a little bit of knowledge can alter locally a little bit of performance, we could do very well with the compute power that we have right now. But making it live on the network, securely, that’s the key part. The attacks that are going on, simple errors as you had yesterday, are simple errors. In a way, across Cloudflare’s network, you’re watching the challenges of the 21st century take place: attacks, obscure, unknown exploits of devices in the power and water control systems. And so, you are in exactly the right spot to not get much sleep and feel a heavy responsibility.

[00:26:20]

Graham-Cumming: Well it certainly felt like it yesterday when we were offline for 27 minutes, and that’s when we suddenly discovered, we sort of know how many customers we have, and then we really discover when they start phoning us. Our support line had his own DDoS basically where it didn’t work anymore because so many people signed in. But yes, I think that it’s interesting your point about a little bit extra on a device somewhere can do something quite magical and then you link it up to the network and you can do a lot. What we think is going on partly is some things around AI, where large amounts of machine learning are happening on big beefy machines, perhaps in the cloud, perhaps groups of machines, and then devices are doing their own little bits of inference or recognizing faces and stuff like that. And that seems to be an interesting future where we have these devices that are actually intelligent in our pockets.

[00:27:17]

Gage: Oh, I think that’s exactly right. There’s so much power in your pocket. I’m spending a lot of time trying to catch up that little bit of mathematics that you thought you understood so many years ago and it turns out, oh my, I need a little bit of work here. And I’ve been reading Michael Jordan’s papers and watching his talks and he’s the most cited computer scientist in machine learning and he will always say, “Be very careful about the use of the phrase, ‘Artificial Intelligence’.” Maybe it’s a metaphor like “The Network is the Computer.” But, we’re doing gradient descent optimization. Is the slope going up, or is the slope going down? That’s not smart. It’s useful and the real time language translation and a lot of incredible work can occur when you’re doing phrases. There’s a lot of great pattern work you can do, but he’s out in space essentially combining differentiation and integration in a form of integral. And off we go. Are your hessians rippling in the wind? And what’s the shape of this slope? And is this actually the fastest path from here to there to constantly go downhill. Maybe it’s sometimes going uphill and going over and then downhill that’s faster. So there’s just a huge amount of new mathematics coming in this territory and each time, as we move from 2G to 3G to 4G to 5G, many people don’t appreciate that the compression algorithms changed between 2G, 3G, 4G and 5G and as a result, so much more can move into your mobile device for the same amount of power. 10 or 20 times more for the same about of power. And mathematics leads to insights and applications of it. And you have a working group in that area, I think. I tried to probe around to see if you’re hiring.

[00:30:00]

Graham-Cumming: Well you could always just come around to just ask us because we’ll probably tell you because we tend to be fairly transparent. But yes, I mean compression is definitely an area where we are interested in doing things. One of the things I first worked on at Cloudflare was a thing that did differential compression based on the insight that web pages don’t actually change that much when you hit ‘refresh’. And so it turns out that if you if you compress based on the delta from the last thing you served to someone you can actually send many orders of magnitude less data and so there’s lots of interesting things you can do with that kind of insight to save a tremendous amount of bandwidth. And so yeah, definitely compression is interesting, crypto is interesting to us. We’ve actually open sourced some of our compression improvements in zlib which was very popular compression algorithm and now it’s been picked up. It turns out that in neuroscience, because there’s a tremendous amount of data which needs compression and there are pipelines used in neuroscience where actually having better compression algorithms makes you work a lot faster. So it’s fascinating to see the sort of overspill of things we’re doing into other areas where I know nothing about what goes on inside the brain.

[00:31:15]

Gage: Well isn’t that fascinating, John. I mean here you are, the CTO of Cloudflare working on a problem that deeply affects the Internet, enabling a lot more to move across the Internet in less time with less power, and suddenly it turns into a tool for brain modeling and neuroscientists. This is the benefit. There’s a terrific initiative. I’m at Berkeley. The Jupiter notebooks created by Fernando Perez, this environment in which you can write text and code and share things. That environment, taken up by machine learning. I think it’s a major change. And the implementation of diagrams that are causal. These forms of analysis of what caused what. These are useful across every discipline and for you to model traffic and see patterns emerge and find webpages and see the delta has changed and then intelligently change the pattern of traffic in response to it, it’s all pretty much the same thing here.

[00:32:53]

Graham-Cumming: Yes and then as a mathematician, when I see things that are the same thing, I can’t help wondering what the real deep structure is underneath. There must be another layer another layer down or something. So as you know it’s this thing. There’s some other deeper layer below all this stuff.  

[00:33:12]

Gage: I think this is just endlessly fascinating. So my only recommendations to Cloudflare: first, double what you’re doing. That’s so hard because as you go from 10 people to 100 people to 1,000 people to 10,000 people, it’s a different world. You are a prime example, you are global. Suddenly you’re able to deal with local authorities in 60-70 countries and deal with some of the world’s most interesting terrain and with network connectivity and moving data, surveillance, and some security of the foundation infrastructure of all countries. You couldn’t be engaged in more exciting things.

[00:34:10]

Graham-Cumming: It’s true. I mean one of the most interesting things to me is that I have grown up with the Internet when I you know I got an email address using actually the crazy JANET scheme in the UK where the DNS names were backwards. I was in Oxford and they gave me an email address and it was I think it was JGC at uk dot ac dot ox dot prg and that then at some point it flipped around and it went to DNS looked like it had won. For a long time my address was the wrong way around. I think that’s a typically British decision to be slightly different to everybody else.

[00:35:08]

Gage: Well, Oxford’s always had that style, that we’re going to do things differently. There’s an Oxford Center for the 21st century that was created by the money from a wonderful guy who had donated maybe $100 million. And they just branched out into every possible research area. But when you went to meetings, you would enter a building that was built at the time of the Raj. It was the India temple of colonialism.

[00:35:57]

Graham-Cumming: There’s quite a few of those in the UK. Are you thinking of the Martin School? James Martin. And he gave a lot of money to Oxford. Well the funny thing about that was the programming research group. The one thing they didn’t teach us really as an undergraduate was how to program which was one of the most fascinating things they have because that was a bit getting your hands dirty so you needed to let all the theory. So we learnt all the theory we did a little bit of functional programming and that was the extent of it which set me really up badly for a career in an industry. My first job I had to pretend I knew how to program and see and learn very quickly.

[00:36:42]

Gage: Oh my. Well now you’ve been writing code in Go.

[00:36:47]

Graham-Cumming: Yes. Well the thing about Go, the other Oxford thing of course is Tony Hoare, who is a professor of computer science there. He had come up with this thing called CSP (Communicating Sequential Processes) so that was a whole theory around how you do parallel execution. And so of course everybody used his formalism and I did in my doctoral thesis and so when Go came along and they said oh this how Go works, I said, well clearly that’s CSP and I know how to do this. So I can do it again.

[00:37:23]

Gage: Tony Hoare occasionally would issue a statement about something and it was always a moment. So few people seem to realize the birth of so much of what we took in the 60s, 70s, 80s, in Silicon Valley and Berkeley, derived from the Manchester Group, the virtual memory work, these innovations. Today, Whit Diffie. He used to love these Bletchley stories, they’re so far advanced. That generation has died off.

[00:38:37]

Graham-Cumming: There’s a very peculiar thing in computer science and the real application of computing which is that we both somehow sit on this great knowledge of the past of computing and at the same time we seem to willfully forget it and reinvent everything every few years. We go through these cycles where it’s like, let’s do centralized computing, now distributed computing. No, let’s have desktop PCs, now let’s have the cloud. We seem to have this collective amnesia and then on occasion people go, “Oh, Leslie Lamport wrote this thing in 1976 about this problem”. What other subject do we willfully forget the past and then have to go and doing archaeology to discover again?

[00:39:17]

Gage: As a sociological phenomenon it means that the older crowd in a company are depressing because they’ll say, “Oh we tried that and it didn’t work”. Over the years as Sun grew from 15 people or so and ended up being like 45,000 people before we were sold off to Oracle and then everybody dumped out because Oracle didn’t know too much about computing. So Ivan Sutherland, Whit Diffie. Ivan actually stayed on. He may actually still have an Oracle email. Almost all of the research groups, certainly the chip group went off to Intel, Fujitsu, Microsoft. It’s funny to think now that Microsoft’s run by a Sun person.

[00:40:19]

Graham-Cumming: Well that’s the same thing. Everyone’s forgotten that Microsoft was the evil empire not that long ago. And so now it’s not. Right now it’s cool again.

[00:40:28]

Gage: Well, all of the embedded stuff from Microsoft is still that legacy that Bill Gates who’s now doing wonderful things with the Gates Foundation. But the embedded insecurity of the global networks is due to, in large part, the insecurities, that horrible engineering of Microsoft embedded everywhere. You go anywhere in China to some old industrial facility and there is some old not updated junky PC running totally insecure software. And it’s controlling the grid. It’s discouraging. It’s like a lot of the SCADA systems.

[00:41:14]

Graham-Cumming: I’m completely terrified of SCADA systems.

[00:41:20]

Gage: The simplest exploits. I mean, it’s nothing even complicated. There are a series of emerging journalists today that are paying attention to cybersecurity and people have come out with books even very recently. Well, now because we’re in this China, US, Iran nightmare, a United States presidential directive taking the cybersecurity crowd and saying, oops, now you’re an offensive force. Which means we got some 20-year-old lieutenant somewhere who suddenly might just for fun turn off Tehran’s water supply or something. This is scary because the SCADA systems are embedded everywhere, and they’re, I don’t know, would you say totally insecure? Just the simple things, just simple exploits. One of the journalists described, I guess it was the Russians who took a bunch of small USB sticks and at a shopping center near a military base just gave them away. And people put them into their PCs inside SIPRNet, inside the secure U.S. Department of Defense network. Instantly the network was taken over just by inserting a USB device to something on the net. And there you are, John, protecting against this.

[00:43:00]

Graham-Cumming: Trying hard to protect against these things, yes absolutely. It’s very interesting because you mentioned before how rapidly Cloudflare had grown over the last few years. And of course Sun also really got going pretty rapidly, didn’t it?

[00:43:00]

Gage: Well, yes. The first year we were just some students from Berkeley, hardware from Stanford, Andy Bechtolsheim, software from Berkeley, Berkeley Unix BSD, Bill Joy. Combine the two, and 10 of us or so, and we were, I think the first year was 12 million booked, the second year was 50 or 60 million booked, and the third year was 150 or so million booked and then we hit 500 million and then we hit a billion. And now, it’s selling boxes, we were a manufacturing company so that’s different from software or services, but we also needed lots of people and so we instantly raided the immense benefit of variety of people in the San Francisco Bay Area, with Berkeley and Stanford. We had students in computer science, and mechanical engineering, and physics, and mathematics from every country in the world and we recruited from every country in the world. So a great part of Sun’s growth came, as you are, expanding internationally, and at one point I think we ran most of the telcos of the world, we ran China Mobile. 900 million subscribers on China Mobile, all Sun stuff in the back. Throughout Africa, every telco was running Sun and Cisco until Huawei knocked Cisco out. It was an amazing time.

[00:44:55]

Graham-Cumming: You ran the machine that ran Latek, that let me get my doctoral thesis done.

[00:45:01]

Gage: You know that’s how I got into it, actually. I was in econometrics and mathematics at Berkeley, and I walk down a hallway and outside a room was that funny smell from photographic paper from something, and there was perfectly typeset mathematics. Troff and nroff, all those old UNIX utilities for the Bell Technical Journal, and I open the door and I’ve got to get in there. There’s two hundred people sitting in front of these beehive-like little terminals all typing away on a UNIX system. And I want to get an account and I walk down the hall and there’s this skinny guy who types about 200 words a minute named Bill Joy. And I said, I need an account, I’ve got to type set integral signs, and he said, what’s your name. I tell him my name, John Gage, and he goes voop, and I’ve never seen anybody type as fast as him in my life. This is a new world, here.

[00:45:58]

Graham-Cumming: So he was rude then?

[00:46:01]

Gage: Yeah he was, he was. Well, it’s interesting since the arrival of a device at Berkeley to complement the arrival of an MIT professor who had implemented in LISP, mathematical, not typesetting of mathematics, but actual maxima. To get Professor fetman, maxima god from MIT, to come to Berkeley and live a UNIX environment, we had to put a LISP up outside on the PDP. So Bill took that machine which had virtual memory and implemented the environment for significant computational mathematics. And Steve Wolfram took that CalTech, and Princeton Institute for Advanced Studies, and now we have Mathematica. So in a way, all of Sun and the UNIX world derived from attempting to do executable mathematics.

[00:47:17]

Graham-Cumming: Which in some ways is what computers are doing. I think one of the things that people don’t really appreciate is the extent to which all numbers underneath.

[00:47:28]

Gage: Well that’s just this discrete versus continuous problem that Michael Jordan is attempting to address. To my current total puzzlement and complete ignorance, is what in the world is symplectic integration? And how do Lyapunov functions work? Oh, no clue.

[00:47:50]

Graham-Cumming: Are we going to do a second podcast on that? Are you going to come back and teach us?

[00:47:55]

Gage: Try it. We’re on, you’re on, you’re on. Absolutely. But you’ve got to run a company.

[00:48:00]

Graham-Cumming: Well I’ve got some things to do. Yeah. But you can go do that and come tell us about it.

[00:48:05]

Gage: All right, Great John. Well it was terrific to talk to you.

[00:48:08]

Graham-Cumming: So yes it was wonderful speaking to you as well. Thank you for helping me dig up memories of when I was first fooling around with Sun Systems and, you know, some of the early days and of course “The Network is the Computer,” I’m not sure I fully yet understand quite the metaphor or even if maybe I do somehow deeply in my soul get it, but we’re going to try and make it a reality, whatever it is.

[00:48:30]

Gage: Well, I count it as a complete success, because you count as one of our successes because you‘re doing what you’re doing, therefore the phrase, “The Network is the Computer,” resides in your brain and when you get up in the morning and decide what to do, a little bit nudges you toward making the network work.

[00:48:51]

Graham-Cumming: I think that’s probably true. And there’s the dog, the dog is saying you’ve been yakking for an hour and now we better stop. So listen, thank you so much for taking the time. It was wonderful talking to you. You have a good day. Thank you very much.


Interested in hearing more? Listen to my conversations with Ray Rothrock and Greg Papadopoulos of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Ray Rothrock

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/ray-rothrock/

The Network is the Computer: A Conversation with Ray Rothrock

The Network is the Computer: A Conversation with Ray Rothrock

Last week I spoke with Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to discuss his time at Sun and how the Internet has evolved. In this conversation, Ray discusses the importance of trust as a principle, the growth of Sun in sales and marketing, and that time he gave Vice President Bush a Sun demo. Listen to our conversation here and read the full transcript below.

[00:00:07]

John Graham-Cumming: Here I am very lucky to get to talk with Ray Rothrock who was I think one of the first investors in Cloudflare, a Series A investor and got the company a little bit of money to get going, but if we dial back a few earlier years than that, he was also at Sun as the Director of CAD/CAM Marketing. There is a link between Sun and Cloudflare. At least one, but probably more than one, which is that Cloudflare has recently trademarked, “The Network is the Computer”. And that was a Sun trademark, wasn’t it?

[00:00:43]

Ray Rothrock: It was, yes.

[00:00:46]

Graham-Cumming: I talked to John Gage and I asked him about this as well and I asked him to explain to me what it meant. And I’m going to ask you the same thing because I remember walking around the Valley thinking, that sounds cool; I’m not sure I totally understand it. So perhaps you can tell me, was I right that it was cool, and what does it mean?

[00:01:06]

Rothrock: Well it certainly was cool and it was extraordinarily unique at the time. Just some quick background. In those early days when I was there, the whole concept of networking computers was brand new. Our competitor Apollo had a proprietary network but Sun chose to go with TCP/IP which was a standard at the time but a brand new standard that very few people know about right. So when we started connecting computers and doing some intensive computing which is what I was responsible for—CAD/CAM in those days was extremely intensive whether it was electrical CAD/camera, or mechanical CAD/CAM, or even simulation solid design modeling and things—having a little extra power from other computers was a big deal. And so this concept of “The Network is the Computer” essentially said that you had one window into the network through your desktop computer in those days—there was no mobile computing at that time, this was like 84’, 85’, 86’ I think. And so if you had the appropriate software you could use other people’s computers (for CPU power) and so you could do very hard problems at that single computer could not do because you could offload some of that CPU to the other computers. Now that was very nerdy, very engineering intensive, and not many people did it. We’d go to the SIGGRAPH, which was a huge graphics show in those days and we would demonstrate ten Sun computers for example, doing some graphic rendering of a 3D wireframe that had been created in the CAD/CAM software of some sort. And it was, it was hard, and that was in the mechanical side. On the electrical side, Berkeley had some software that was called Magic—it’s still around and is a very popular EDA software that’s been incorporated in those concepts. But to imagine calculating the paths in a very complicated PCB or a very complicated chip, one computer couldn’t do it, but Sun had the fundamental technology. So from my seat at Sun at the time, I had access to what could be infinite computing power, even though I had a single application running, and that was a big selling point for me when I was trying to convince EDA and MDA companies to put their software on the Sun. That was my job.

[00:03:38]

Graham-Cumming: And hearing it now, it doesn’t sound very revolutionary, because of course we’re all doing that now. I mean I get my phone out of my pocket and connect to goodness knows what computing power which does image recognition and spots faces and I can do all sorts of things. But walk me through what it felt like at the time.

[00:03:56]

Rothrock: Just doing a Google search, I mean, how many data stores are being spun up for that? At the time it was incredible, because you could actually do side by side comparisons. We created some demonstrations, where one computer might take ten hours to do a calculation, two computers might take three hours, five computers might take 30 minutes. So with this demo, you could turn on computers and we would go out on the TCP/IP network to look for an available CPU that could give me some time. Let’s go back even further. Probably 15 years before that, we had time sharing. So you had a terminal into a big mainframe and did all this swapping in and out of stuff to give you a time slice computing. We were doing the exact same thing except we were CPU slicing, not just time slicing. That’s pretty nerdy, but that’s what we did. And I had to work with the engineering department, with all these great engineers in those days, to make this work for a demo. It was so unique, you know, their eyes would get big. You remember Novell…

[00:05:37]

Graham-Cumming: I was literally just thinking about Novell because I actually worked on IPX and SPX networking stuff at the time. I was going to ask you actually, to what extent do you think TCP/IP was a very important part of this revolution?

[00:05:55]

Rothrock: It was huge. It was fundamentally huge because it was a standard, so it was available and if you implemented it, you didn’t have to pay for it. When Bob Metcalfe did Ethernet, it was on top of the TCP stack. Sun, in my memory, and I could be wrong, was the first company to put a TCP/IP stack on the computer. And so you just plugged in the back, an RJ45 into this TCP/IP network with a switch or a router on it and you were golden. They made it so simple and so cheap that you just did it. And of course if you give an engineer that kind of freedom and it opens up. By the way, as the marketing guy at Sun, this was my first non-engineering job. I came from a very technical world of nuclear physics into Sun. And so it was stunning, just stunning.

[00:06:59]

Graham-Cumming: It’s interesting that you mentioned Novell and then you mentioned Apollo before that and obviously IBM had SNA networking and there were attempts to do all those networking things. It’s interesting that these open standards have really enabled the explosion of everything else we’ve seen and with everything that’s going on in the Internet.

[00:07:23]

Rothrock: Sun was open, so to speak, but this concept of open source now that just dominates the conversation. As a venture capitalist, every deal I ever invested in had open source of some sort in it. There was a while when it was very problematic in an M&A event, but the world’s gotten used to it. So open, is very powerful. It’s like freedom. It’s like liberty. Like today, July 4th, it’s a big deal.

[00:07:52]

Graham-Cumming: Yes, absolutely. It’s just interesting to see it explode today because I spent a lot of my career looking at so many different networking protocols. The thing that really surprises me, or perhaps shouldn’t surprise me when you’ve got these open things, is that you harness so many people’s intelligence that you just end up with something that’s just better. It seems simple.

[00:08:15]

Rothrock: It seems simple. I think part of the magic of Sun is that they made it easy. Easy is the most powerful thing you can do in computing. Computing can be so nerdy and so difficult. But if you just make it easy, and Cloudflare has done a great job with that at that; they did it with their DNS service, they did it with all the stuff we worked on back when I was on the board and actively involved in the company. You’ve got to make it easy. I mean, I remember when Matthew and Lee worked like 20 hours a day on how to switch your DNS from whoever your provider was to Cloudflare. That was supposed to be one click, done. A to B. And that DNA was part of the magic. And whether we agree that Sun did it that way, to me at least, Sun did it that way as well. So it’s huge, a huge lift.

[00:09:08]

Graham-Cumming: It’s funny you talk about that because at the time, how that actually worked is that we just asked people to give us their username and password. And we logged in and did it for them. Early on, Matthew asked me if I’d be interested in joining Cloudflare when it was brand new and because of other reasons I’d moved back to the UK and I wasn’t ready to change jobs and I’d just taken another job. And I remember thinking, this thing is crazy this Cloudflare thing. Who’s going to hand over their DNS and their traffic to these four or five people above a nail salon in Palo Alto? And Matthew’s response was, “They’re giving us their passwords, let alone their traffic.” Because they were so desperate for it.

[00:09:54]

Rothrock: It tells you a lot about Matthew and you know as an attorney, I mean he was very sensitive to that and believes that one of the one of the founding principles is trust. His view was that, if I ever lose the customer’s trust, Cloudflare is toast. And so everything focused around that key value. And he was right.

[00:10:18]

Graham-Cumming: And you must have, at Sun, been involved with some high performance computing things that involved sensitive customers doing cryptography and things like that. So again trust is another theme that runs through there as well.

[00:10:33]

Rothrock: Yeah, very true. As the marketing guy of CAD/CAM, I was in the field two-thirds of the time, showing customers what was possible with them. My job was to get third party software onto the Sun box and then to turn that into a presentation to a customer. So I visited many government customers, many aerospace, power, all these very high falutin sort of behind the firewall kinds of guys in those days. So yes, trust was huge. It would come up: “Okay, so I’m using your CPU, how is it that you can’t use mine. And how do you convince me that you’ve not violated something.” In those days it was a whole different conversation that it is today but it was nonetheless just as important. In fact I remember I spent quite a bit of time at NCSA at the University of Illinois Urbana-Champaign. Larry Smarr was the head of NCSA. We spent a lot of time with Larry. I think John was there with me. John Gage and Vinod and some others but it was a big deal taking about high performance computing because that’s what they were doing and doing it with Sun.

[00:11:50]

Graham-Cumming: So just to dial forward, so you’re at Venrock and you decide to invest in Cloudflare. What was it that made you think that this was worth investing in? Presumably you saw some things that were in some of Sun’s vision. Because Sun had a very wide-ranging visions about what was going to be possible with computing.

[00:12:11]

Rothrock: Yeah. Let me sort of touch on a few points probably. Certainly Sun was my first computer company I worked for after I got out of the nuclear business and the philosophy of the company was very powerful. Not only we had this cool 19 inch black and white giant Macintosh essentially although the Mac wasn’t even born yet, but it had this ease of use that was powerful and had this open, I mean it was we preached that all the time and we made that possible. And Cloudflare—the related philosophy of Matthew and Michelle’s genius—was they wanted to make security and distribution of data as free and easy as possible for the long tail. That was the first thinking because you didn’t have access if you were in the long tail you were a small company you or you’re just going to get whipped around by the big boys. And so there was a bit of, “We’re here to help you, we’re going to do it.” It’s a good thing that the long tail get mobilized if you will or emboldened to use the Internet like the big boys do. And that was part of the attractiveness. I didn’t say, “Boy, Matthew, this sounds like Sun,” but the concept of open and liberating which is what they were trying to do with this long tail DNS and CDN stuff was very compelling and seemed easy. But nothing ever is. But they made it look easy.

[00:13:52]

Graham-Cumming: Yeah, it never is. One of the parallels that I’ve noticed is that I think early on at Sun, a lot of Sun equipment went to companies that later became big companies. So some of these small firms that were using crazy work stations ended up becoming some of the big names in the Valley. To your point about the long tail, they were being ignored and couldn’t buy from IBM even if they wanted to.

[00:14:25]

Rothrock: They couldn’t afford SNA and they couldn’t do lots of things. So Sun was an enabler for these companies with cool ideas for products and software to use Sun as the underpinning. workstations were all the rage, because PCs were very limited in those days. Very very limited, they were all Intel based. Sun was 68000-based originally and then it was their own stuff, SPARC. You know in the beginning it was a cheap microprocessor from Motorola.

[00:15:04]

Graham-Cumming: What was the growth like at Sun? Because it was very fast, right?

[00:15:09]

Rothrock: Oh yes, it was extraordinarily fast. I think I was employee 130 or something like that. I left Sun in 1986 to go to business school and they gave me a leave of absence. Carol Bartz was my boss at that moment. The company was like at 2000 people just two and a half years later. So it was growing like a weed. I measured my success by how thick the catalyst—that was our catalog name and our program—how thick and how quickly I could add bonafide software developers to our catalog. We published on one sheet of paper front to back. When I first got there, our catalyst catalogue was a sheet of paper, and when I left, it was a book. It was about three-quarters of an inch thick. My group grew from me to 30 people in about a year and a half. It was extraordinary growth. We went public during that time, had a lot of capital and a lot of buzz. That openness, that our competition was all proprietary just like you were citing there, John. IBM and Apollo were all proprietary networks. You could buy a NIC card and stick it into your PC and talk to a Sun. And vise versa. And you couldn’t do that with IBM or Apollo. Do you remember those?

[00:16:48]

Graham-Cumming: I do because I was talking to John Gage. In my first job out of college, I wrote a TCP/IP stack from scratch, for a manufacturer of network cards. The test of this stack was I had an HP Apollo box and I had a Sun workstation and there was a sort of magical, can I talk to these devices? And can I ping them? And then that was already magical the first ping as it went across the network. And then, can I Telnet to one of these? So you know, getting the networking actually running was sort of the key thing. How important was networking for Sun in the early days? Was it always there?

[00:17:35]

Rothrock: Yeah, it was there from the beginning, the idea of having a network capability. When I got there it was network; the machine wasn’t standalone at all. We sort of mimicked the mainframe world where we had green screens hooked into a Sun in a department for example. And there was time sharing. But as soon as you got a Sun on your desk, which was rare because we were shipping as many as we could build, it was fantastic. I was sharing information with engineering and we were working back and forth on stuff. But I think it was fundamental: you have a microprocessor, you’ve got a big screen, you’ve got a graphic UI, and you have a network that hooks into the greater universe. In those days, to send an all-Sun email around the world, modems spun up everywhere. The network wasn’t what it is now.

[00:18:35]

Graham-Cumming: I remember in about 89’, I was at a conference and Whit Diffie was there. I asked him what he was doing. He was in a little computer room. I was trying to typeset something. And he said, “I’m telnetting into a machine which is in San Diego.” It was the first time I’d seen this and I stepped over and he was like, “look at this.” And he’s hitting the keyboard and the keys are getting echoed back. And I thought, oh my goodness, this is incredible. It’s right across the Atlantic and across the country as well.

[00:19:10]

Rothrock: I think, and this is just me talking having lived the last years and with all the investing and stuff I did, but you know it enabled the Internet to come about, the TCP/IP standard. You may recall that Microsoft tried to modify the TCP/IP stack slightly, and the world rejected it, because it was just too powerful, too pervasive. And then along comes HTTP and all the other protocols that followed. Telnetting, FTPing, all that file transfer stuff, we were doing that left, right, and center back in the 80s. I mean you know Cloudflare just took all this stuff and made it better, easier, and literally lower friction. That was the core investment thesis at the time and it just exploded. Much like when Sun adopted TCP/IP, it just exploded. You were there when it happened. My little company that I’m the CEO of now, we use Cloudflare services. First thing I did when I got there was switched to Cloudflare.

[00:20:18]

Graham-Cumming: And that was one of the things when I joined, we really wanted people get to a point where if you’re putting something on the web, you just say, well I’m going to put Cloudflare or a thing like Cloudflare just on it. Because it protects it, it makes it faster, etc. And of course now what we’ve done is we’ve given people compute facility. Right now you can write code and run it in our in our machines worldwide which is another whole thing.

[00:20:43]

Rothrock: And that is “The Network is the Computer”. The other thing that Sun was pitching then was a paperless office. I remember we had posters of paper flying out of a computer window on a Sun workstation and I don’t think we’ve gotten there yet. But certainly, the network is the computer.

[00:21:04]

Graham-Cumming: It was probably the case that the paperless office was one of those things that was about to happen for quite a long time.

[00:21:14]

Rothrock: It’s still about to happen if you ask me. I think e-commerce and the sort of the digital transformation has driven it harder than just networking. You know, the fact that we can now sign legal documents over the Internet without paper and things like that. People had to adopt. People have to trust. People have to adopt these standards and accept them. And lo and behold we are because we made it easy, we made it cheap, and we made it trustworthy.

[00:21:42]

Graham-Cumming: If you dial back through Sun, what was the hardest thing? I’m asking because I’m at a 1,000-person company and it feels hard some days, so I’m curious. What do I need to start worrying about?

[00:22:03]

Rothrock: Well yeah, at 1,000 people, I think that’s when John came into the company and sort of organized marketing. I would say, holding engineering to schedules; that was hard. That was hard because we were pushing the envelope our graphics was going from black and white to color. The networking stuff the performance of all the chips into the boards and just the performance was a big deal. And I remember, for me personally, I would go to a trade show. I’d go to Boston to the Association of Mechanical Engineers with the team there and would show up at these workstations and of course the engineers want to show off the latest. So I would be bringing with me tapes that we had of the latest operating system. But getting the engineers to be ready for a tradeshow was very hard because they were always experimenting. I don’t believe the word “code freeze” meant much to them, frankly, but we would we would be downloading the software and building a trade show thing that had to run for three days on the latest and greatest and we knew our competitor would be there right across the aisle from us sort of showing their hot stuff. And working with Eric Schmidt in those days, you know, Eric you just got to be done on this date. But trade shows were wonderful. They focused the company’s endpoints if you will. And marketing and sales drove Sun; Scott McNealy’s culture there was big on that. But we had to show. It’s different today than it was then, I don’t know about the Cloudflare competition, but back then, there were a dozen workstation companies and we were fighting for mindshare and market share every day. So you didn’t dare sort of leave your best jewels at home. You brought them with you. I will give John Gage high, high marks. He showed me how to dance through a reboot in case the code crashed and he’s marvelous and I learned how to work that stuff and to survive.

[00:24:25]

Rothrock: Can I tell you one sort of sales story?  

[00:24:28]

Graham-Cumming: Yes, I’m very interested in hearing the non-technical stories. As an engineer, I can hear engineering stories all the time, but I’m curious what it was like being in sales and marketing in such an engineering heavy company as Sun.

[00:24:48]

Rothrock: Yeah. Well it was challenging of course. One of the strategies that Sun had in those days was to get anyone who was building their own computer. This was Computer Vision and Data General and all those guys to adopt the Sun as their hardware platform and then they could put on whatever they wanted. So because I was one of the demo gods, my job was to go along with the sales guys when they wanted to try to convince somebody. So one of the companies we went after was Data General (DG) in Massachusetts. And so I worked for weeks on getting this whole demo suite running MDA, EDA, word processing, I had everything. And this was a big, big, big deal. And I mean like hundreds of millions of dollars of revenue. And so I went out a couple of days early and we were going to put up a bunch of Suns and I had a demo room at DG. So all the gear showed up and I got there at like 5:30 in the morning and started downloading everything, downloading software, making it dance. And at about 8:00 a.m. in the morning the CEO of Data General walks in. I didn’t know who he was but it turned out to be Ed de Castro. And he introduces himself and I didn’t know who he was and he said, “What are you doing?” And I explained, “I’m from Sun, I’m getting ready for a big demo. We’ve got a big executive presentation. Mr. McNealy will be here shortly, etc.” And he said, “Well, show me what you’ve got.” So I’m sort of still in the middle of downloading this software and I start making this thing dance. I’ve got these machines talking to each other and showing all kinds of cool stuff. And he left. And the meeting was about 10 or 11 in the morning. And so when the executive team from Sun showed up they said, “Well, how’s it going?” I said, “Well I gave a demo to a guy,” and they asked, “Who’s the guy,” and I said, “It was Ed de Castro.” And they went, “Oh my God, that was the CEO.” Well, we got the deal. I thought Ed had a little tactic there to come in early, see what he could see, maybe get the true skinny on this thing and see what’s real. I carried the day. But anyway, I got a nice little bonus for that. But Vinod and I would drop into Lockheed down in Southern California. They wanted to put Suns on P-3 airplanes and we’d go down there with an engineer and we’d figure out how to make it. Those were just incredible times. You may remember back in the 80s everyone dressed up except on Fridays. It was dress-down Fridays. And one day I dressed down and Carol Bartz, my boss, saw me wearing blue jeans and just an open collared shirt and she said, “Rothrock, you go home and put on a suit! You never know when a customer is going to walk in the front door.” She was quite right. Kodak shows up. Kodak made a big investment in Sun when it was still private. And I gave that demo and then AT&T, and then interestingly Vice President Bush back in the Reagan administration came to Sun to see the manufacturing and I gave the demo to the Vice President with Scott and Andy and Bill and Vinod standing there.

[00:28:15]

Graham-Cumming: Do you remember what he saw?

[00:28:18]

Rothrock: It was my standard two minute Sun demo that I can give in my sleep. We were on the manufacturing floor. We picked up a machine and I created a demo for it and my executive team was there. We have a picture of it somewhere, but it was fun. As John Gage would say, he’d say, “Ray, your job is to make the computer dance.” So I did.  

[00:28:44]

Graham-Cumming: And one of the other things I wanted to ask you about is at some point Sun was almost Amazon Web Services, wasn’t it. There was a rent-a-computer service, right?  

[00:28:53]

Rothrock: I don’t know. I don’t remember the rent-a-computer service. I remember we went after the PC business aggressively and went after the data centers which were brand new in those days pretty aggressively, but I don’t remember the rent-a-computer business that much. It wasn’t in my domain.

[00:29:14]

Graham-Cumming: So what are you up to these days?

[00:29:18]

Rothrock: I’m still investing. I do a lot of security investing. I did 15 deals while I was at Venrock. Cloudflare was the last one I did, which turned out really well of course. More to come, I hope. And I’m CEO of one of Venrock’s portfolio companies that had a little trouble a few years back but I fixed that and it’s moving up nicely now. But I’ve started thinking about more of a science base. I’m on the board of the Carnegie Institute of Science. I’m on the board of MIT and I just joined the board of the Nuclear Threat Initiative in Washington which is run by Secretary Ernie Moniz, former secretary of energy. So I’m doing stuff like that. John would be pleased with how well that played through. But I’ll tell you it is this these fundamental principles, just tying it all back to Sun and Cloudflare, and this sort of open, cheap, easy, enabling humans to do things without too much friction, that is exciting. I mean, look at your phone. Steve Jobs was the master of design to make this thing as sweet as it is.

[00:30:37]

Graham-Cumming: Yes, and as addictive.

[00:30:39]

Rothrock: Absolutely, right. I haven’t been to a presentation from Cloudflare in two years, but every time I see an announcement like the DNS service, I immediately switched all my DNS here at the house to 1.1.1.1. Stuff like that. Because I know it’s good and I know it’s trustworthy, and it’s got that philosophy built in the DNA.

[00:31:09]

Graham-Cumming: Yes definitely. Taking it back to what we talked about at the beginning, it’s definitely the trustworthiness is something that Cloudflare has cared about from the beginning and continues to care about. We’re sort of the guardians of the traffic that passes through it.

[00:31:25]

Rothrock: Back when the Internet started happening and when Sun was doing Java, I mean, all those things in the 90s, I was of course at Venrock, but I was still pretty connected to [Edward] Zander and [Scott] McNealy. We were hoping that it would be liberating, that it would create a world which was much more free and open to conversation and we’ve seen the dark side of some of that. But I continue to believe that transparency and openness is a good thing and we should never shut it down. I don’t mean to get it all waxing philosophical here but way more good comes from being open and transparent than bad.

[00:32:07]

Graham-Cumming: Listen it’s July 4th. It’s evening here in London. We can be waxing philosophical as much as we like. Well listen, thank you for taking the time to chat with me. Are there any other reminiscences of Sun that you think the public needs to know in this oral history of “The Network is the Computer.”

[00:32:28]

Rothrock: Well you know the only thing I’d say is having landed in the Silicon Valley in 1981 and getting on with Sun, I can say this given my age and longevity here, everything is built on somebody else’s great ideas. And starting with TCP/IP and then we went to this HTML protocol and browsers, it’s just layer on layer on layer on layer and so Cloudflare is just one of the latest to climb on the shoulders of those giants who put it all together. I mean, we don’t even think about the physical network anymore. But it is there and thank goodness companies like Cloudflare keep providing that fundamental service on which we can build interesting, cool, exciting, and mind-changing things. And without a Cloudflare, without Sun, without Apollo, without all those guys back in the day, it would be different. The world would just be so, so different. I did the New York Times crossword puzzle. I could not do it without Google because I have access to information I would not have unless I went to the library. It’s exponential and it just gets better. Thanks to Michelle and Matthew and Lee for starting Cloudflare and allowing Venrock to invest in it.

[00:34:01]

Graham-Cumming: Well thank you for being an investor. I mean, it helped us get off the ground and get things moving. I very much agree with you about the standing on the shoulders of giants because people don’t appreciate the extent to which so much of this fundamental work that we did was done in the 70s and 80s.

[00:34:19]

Rothrock: Yea, it’s just like the automobile and the airplane. We reminisce about the history but boy, there were a lot of giants in those industries as well. And computing is just the latest.

[00:34:32]

Graham-Cumming: Yep, absolutely. Well, Ray, thank you. Have a good afternoon.


Interested in hearing more? Listen to my conversations with John Gage and Greg Papadopoulos of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store. Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer: A Conversation with Greg Papadopoulos

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/greg-papadopoulos/

The Network is the Computer: A Conversation with Greg Papadopoulos

The Network is the Computer: A Conversation with Greg Papadopoulos

I spoke with Greg Papadopoulos, former CTO of Sun Microsystems, to discuss the origins and meaning of The Network is the Computer®, as well as Cloudflare’s role in the evolution of the phrase. During our conversation, we considered the inevitability of latency, the slowness of the speed of light, and the future of Cloudflare’s newly acquired trademark. Listen to our conversation here and read the full transcript below.


[00:00:08]

John Graham-Cumming: Thank you so much for taking the time to chat with me. I’ve got Greg Papadopoulos who was CTO of Sun and is currently a venture capitalist. Tell us about “The Network is the Computer.”

[00:00:22]

Greg Papadopoulos: Well, from certainly a Sun perspective, the very first Sun-1 was connected via Internet protocols and at that time there was a big war about what should win from a networking point of view. And there was a dedication there that everything that we made was going to interoperate on the network over open standards, and from day one in the company, it was always that thought. It’s really about the collection of these machines and how they interact with one another, and of course that puts the network in the middle of it. And then it becomes hard to, you know, where’s the line? But it is one of those things that I think even if you ask most people at Sun, you go, “Okay explain to me ‘The Network is the Computer.’” It would get rather meta. People would see that phrase and sort of react to it in their own way. But it would always come back to something similar to what I had said I think in the earlier days.

[00:01:37]

Graham-Cumming: I remember it very well because it was obviously plastered everywhere in Silicon Valley for a while. And it sounded incredibly cool but I was never quite sure what it meant. It sounded like it was one of those things that was super deep but I couldn’t dig deep enough. But it sort of seems like this whole vision has come true because if you dial back to I think it’s 2006, you wrote a blog post about how the world was only going to need five or seven or some small number of computers. And that was also linked to this as well, wasn’t it?

[00:02:05]

Papadopoulos: Yeah, I think as things began to evolve into what we would call cloud computing today, but that you could put substantial resources on the other side of the network and from the end user’s perspective and those could be as effective or more effective than something you’d have in front of you. And so this idea that you really could provide these larger scale computing services in early days — you know, grid was the term used before cloud — but if you follow that logic, and you watch what was happening to the improvements of the network. Dave Patterson at Cal was very fond of saying in that era and in the 90s, networks are getting to the place where the desk connected to another machine is transparent to you. I mean it could be your own, in fact, somebody else’s memory may in fact be closer to you than your own disk. And that’s a pretty interesting thought. And so where we ended up going was really a complete realization that these things we would call servers were actually just components of this network computer. And so it was very mysterious, “The Network is the Computer,” and it actually grew into itself in this way. And I’ll say looking at Cloudflare, you see this next level of scale happening. It’s not just, what are those things that you build inside a data center, how do you connect to it, but in fact, it’s the network that is the computer that is the network.

[00:04:26]

Graham-Cumming: It’s interesting though that there have been these waves of centralization and then push the computing power to the edge and the PCs at some point and then Larry Ellison came along and he was going to have this networked computer thing, and it sort of seems to swing back and forth, so where do you think we are in this swinging?

[00:04:44]

Papadopoulos: You know, I don’t think so much swinging. I think it’s a spiral upwards and we come to a place and we look down and it looks familiar. You know, where you’ll say, oh I see, here’s a 3270 connected to a mainframe. Well, that looks like a browser connected to a web server. And you know, here’s the device, it’s connected to the web service. And they look similar but there are some very important differences as we’re traversing this helix of sorts. And if you look back, for example the 3270, it was inextricably bound to a single server that was hosted. And now our devices have really the ability to connect to any other computer on the network. And so then I think we’re seeing something that looks like a pendulum there, it’s really a refactoring question on what software belongs where and how hard is it to maintain where it is, and naturally I think that the Internet protocol clearly is a peer to peer protocol, so it doesn’t take sides on this. And so that we end up in one state, with more on the client or less on the client. I think it really has to do with how well we’ve figured out distributed computing and how well we can deliver code in a management-free way. And that’s a longer conversation.

[00:06:35]

Graham-Cumming: Well, it’s an interesting conversation. One thing is what you talked about with Sun Grid which then we end up with Amazon Web Services and things like that, is that there was sort of the device, be it your handheld or your laptop talking to some cloud computing, and then what Cloudflare has done with this Workers product to say, well, actually I think there’s three places where code could exist. There’s something you can put inside the network.

[00:07:02]

Papadopoulos: Yes. And by extension that could grow to another layer too. And it goes back to, I think it’s Dave Clark who I first remember saying you can get all the bandwidth you want, that’s money, but you can’t reduce latency. That’s God, right. And so I think there are certainly things and as I see the Workers architecture, there are two things going on. There’s clearly something to be said about latency there, and having distributed points of presence and getting closer to the clients. And there’s IBM with interaction there too, but it is also something that is around management of software and how we should be thinking in delivery of applications, which ultimately I believe, in the limit, become more distributed-looking than they are now. It’s just that it’s really hard to write distributed applications in kind of the general way we think about it.

[00:08:18]

Graham-Cumming: Yes, that’s one of these things isn’t it, it is exceedingly hard to actually write these things which is why I think we’re going through a bit of a transition right now where people are trying to figure out where that code should actually execute and what should execute where.

[00:08:31]

Papadopoulos: Yeah. You had graciously pointed out this blog from a dozen years ago on, hey this is inevitable that we’re going to have this concentration of computing, for a lot of economic reasons as anything else. But it’s both a hammer and a nail. You know, cloud stuff in some ways is unnatural in that why should we expect computing to get concentrated like it is. If you really look into it more deeply, I think it has to do with management and control and capital cycles and really things that are kind of on the economic and the administrative side of things, are not about what’s truth and beauty and the destination for where applications should be.

[00:09:27]

Graham-Cumming: And I think you also see some companies are now starting to wrestle with the economics of the cloud where they realize that they are kind of locked into their cloud provider and are paying rent kind of thing; it becomes entirely economic at that point.

[00:09:41]

Papadopoulos: Well it does, and you know, this was also something I was pretty vocal about, although I got misinterpreted for a while there as being, you know, anti-cloud or something which I’m not, I think I’m pragmatic about it. One of the dangers is certainly as people yield particularly to SaaS products, that in fact, your data in many ways, unless you have explicit contracts and abilities to disgorge that data from that service, that data becomes more and more captive. And that’s the part that I think is actually the real question here, which is like, what’s the switching cost from one service to another, from one cloud to another.

[00:10:35]

Graham-Cumming: Yes, absolutely. That’s one of the things that we faced, one of the reasons why we worked on this thing called the Bandwidth Alliance, which is one of the ways in which stuff gets locked into clouds is the egress fee is so large that you don’t want to get your data out.

[00:10:50]

Papadopoulos: Exactly. And then there is always the, you know, well we have these particular features in our particular cloud that are very seductive to developers and you write to them and it’s kind of hard to undo, you know, just the physics of moving things around. So what you all have been doing there is I think necessary and quite progressive. But we can do more.

[00:11:17]

Graham-Cumming: Yes definitely. Just to go back to the thought about latency and bandwidth, I have a jokey pair of slides where I show the average broadband network you can buy over time and it going up, and then the change in the speed of light over the same period, which of course is entirely flat, zero progress in the speed of light. Looking back through your biography, you wrote thinking machines and I assume that fighting latency at a much shorter distance of cabling must have been interesting in those machines because of the speeds at which they were operating.

[00:11:54]

Papadopoulos: Yes, it surprises most people when you say it, but you know, computer architects complain that the speed of light is really slow. And you know, Grace Hopper who is really one of the founders, the pioneers of modern programming languages and COBOL. I think she was a vice admiral. And she would walk around with a wire that was a foot long and say, “this is a nanosecond”. And that seemed pretty short for a while but, you know a nanosecond is an eternity these days.

[00:12:40]

Graham-Cumming: Yes, it’s an eternity. People don’t quite appreciate it if they’re not thinking about it, how long it is. I had someone who was new to the computing world learning about it, come to me with a book which was talking about fiber optics, and in the book it said there is a laser that flashes on and off a billion times a second to send data down the fiber optic. And he came to me and said, “This can’t possibly be true; it’s just too fast.”

[00:13:09]

Papadopoulos: No, it’s too slow!

[00:013:12]

Graham-Cumming: Right? And I thought, well that’s slow. And then I stepped back and thought, you know, to the average person, that is a ridiculous statement, that somehow we humans have managed to control time at this ridiculously small level. And then we keep pushing and pushing and pushing it and people don’t appreciate how fast and actually how slow the light is, really.

[00:13:33]

Papadopoulos: Yeah. And I think if it actually comes down to it, if you want to get into a very pure reckoning of this is latency is the only thing that matters. And one can look at bandwidth as a component of latency, so you can see bandwidth as a serialization delay and that kind of goes back to Clark thing, you know that, yeah I can buy that, I can’t bribe God on the other side so you know I’m fundamentally left with this problem that we have. Thank you, Albert Einstein, right? It’s kind of hopeless to think about sending information faster than that.

[00:14:09]

Graham-Cumming: Yeah exactly. There’s information limits, which is driving why we have such powerful phones, because in fact the latency to the human is very low if you have it in your hand.

[00:14:23]

Papadopoulos: Yes, absolutely. This is where the edge architecture and the Worker structure that you guys are working on, and I think that’s where it becomes really interesting too because it gives me — you talked about earlier, well we’re now introducing this new tier — but it gives me a really closer place from a latency point of view to have some intimate relationship with a device, and at the same time be well-connected to the network.

[00:14:55]

Graham-Cumming: Right. And I think the other thing that is interesting about that is that your device fundamentally is an insecure thing, so you know if you put code on that thing, you can’t put secrets in it, like a cryptographic secrets, because the end user has access to them. Normally you would keep that in the server somewhere, but then the other funny thing is if you have this intermediary tier which is both secure and low latency to the end user, you suddenly have a different world in which you can put secrets, you can put code that is privileged, but it can interact with the user very very rapidly because the low latency.

[00:15:30]

Papadopoulos: Yeah. And that essence of where’s my trust domain. Now I’ve seen all kinds of like, oh my gosh, I cannot believe somebody is doing it, like putting their S3 credentials, putting it down on a device and having it talk, you know, the log in for a database or something. You must be kidding. I mean that trust proxy point at low latency is a really key thing.

[00:16:02]

Graham-Cumming: Yes, I think it’s just people need to start thinking about that architecture. Is there a sort of parallel with things that were going on with very high-performance computing with sort of the massively parallel stuff and what’s happening today? What lessons can we take from work done in the 70s and 80s and apply it to the Internet of today?

[00:16:24]

Papadopoulos: Well, we talked about this sort of, there are a couple of fundamental issues here. And one we’ve been speaking about is latency. The other one is synchronization, and this comes up in a bunch of different ways. You know, whether it’s when one looks at the cap theorem kinds of things that Eric Brewer has been famous for, can I get consistency and availability and survive partitionability, all that, at the same time. And so you end up in this kind of place of—goes back to maybe Einstein a bit—but you know, knowing when things have happened and when state has been actually changed or committed is a pretty profound problem.

[00:17:15]

Graham-Cumming: It is, and what order things have happened.

[00:17:18]

Papadopoulos: Yes. And that order is going to be relative to an observer here as well. And so if you’re insisting on some total ordering then you’re insisting on slowing things down as well. And that really is fundamental. We were pushing into that in the massively parallel stuff and you’ll see that at Internet scale. You know there’s another thing, if I could. This is one of my greatest “aha”s about networks and it’s due to a fellow at Sun, Rob Gingell, who actually ended up being chief engineer at Sun and was one of the real pioneers of the software development framework that brought Solaris forward. But Rob would talk about this thing that I label as network entropy. It’s basically what happens when you connect systems to networks, what do networks kind of do to those systems? And this is a little bit of a philosophical question; it’s not a physical one. And Rob observed that over time networks have this property of wanting to decompose things into constituent parts, have those parts get specialized and then reintegrated. And so let me make that less abstract. So in the early days of connecting systems to networks, one of the natural observations were, well why don’t we take the storage out of those desktop systems or server systems and put them on the other side of at least a local network and into a file server or storage server. And so you could see that computer sort of get pulled apart between its computing and its storage piece. And then that storage piece, you know in Rob’s step, that would go on and get specialized. So we had whole companies start like Network Appliances, Pure Storage, EMC. And so, you know like big pieces of industry or look the original routers were RADb you know running on workstations and you know Cisco went and took that and made that into something and so you now see this effect happen at the next scale. One of the things that really got me excited when I first saw Cloudflare a decade ago was, wow okay in those early days, well we can take a component like a network firewall and that can get pulled away and created as its own network entity and specialized. And I think one of the things, at least from my history of Cloudflare, one of the most profound things was, particularly as you guys went in and separated off these functions early on, the fear of people was this is going to introduce latency, and in fact things got faster. Figure that.

[00:20:51]

Graham-Cumming: Part of that of course is caching and then there’s dealing with the speed of light by being close to people. But also if you say your company makes things faster and you do all these different things including security, you are forced to optimize the whole thing to live up to the claim. Whereas if you try and chain things together, nobody’s really responsible for that overall latency budget. It becomes natural that you have to do it.

[00:21:18]

Papadopoulos: Yes. And you all have done it brilliantly, you know, to sort of Gingell’s view. Okay so this piece got decomposed and now specialized, meaning optimized like heck, because that’s what you do. And so you can see that over and over again and you see it in terms of even Twilio or something. You know, here’s a messaging service. I’m just pulling my applications apart, letting people specialize. But the final piece, and this is really the punchline. The final piece is, Rob will talk about it, the value is in the reintegration of it. And so you know what are those unifying forces that are creating, if you will, the operating system for “The Network is the Computer.” You were asking about the massively parallel scale. Well, we had an operating system we wrote for this. As you get up to the higher scale, you get into these more distributed circumstances where the complexity goes up by some important number of orders of magnitude, and now what’s that reintegration? And so I come back and I look at what Cloudflare is doing here. You’re entering into that phase now of actually being that re-integrator, almost that operating system for the computer that is the network.

[00:23:06]

Graham-Cumming: I think that’s right. We often talk about actually being an operating system on the Internet, so very similar kind of thoughts.

[00:23:14]

Papadopoulos: Yes. And you know as we were talking earlier about how developers make sense of this pendulum or cycle or whatever it is. Having this idea of an operating system or of a place where I can have ground truths and trust and sort of fixed points in this are terribly important.

[00:23:44]

Graham-Cumming: Absolutely. So do you have any final thoughts on, what, it must be 30 years on from when “The Network is the Computer” was a Sun trademark. Now it’s a Cloudflare trademark. What’s the future going to look of that slogan and who’s going to trademark it in 30 years time now?

[00:24:03]

Papadopoulos: Well, it could be interplanetary at that point.

[00:24:13]

Graham-Cumming: Well, if you talk about the latency problems of going interplanetary, we definitely have to solve the latency.

[00:24:18]

Papadopoulos: Yeah. People do understand that. They go, wow it’s like seven minutes within here and Mars, hitting close approach.

[00:24:28]

Graham-Cumming: The earthly equivalent of that is New Zealand. If you speak to people from New Zealand and they come on holiday to Europe or they move to the US, they suddenly say that the Internet works so much better here. And it’s just that it’s closer. Now the Australians have figured this out because Australia is actually drifting northwards so they’re actually going to get within. That’s going to fix it for them but New Zealand is stuck.

[00:24:56]

Papadopoulos: I do ask my physicist friends for one of two things. You know, either give me a faster speed of light — so far they have not delivered — or another dimension I can cut through. Maybe we’ll keep working on the latter.

[00:25:16]

Graham-Cumming: All right. Well listen Greg, thank you for the conversation. Thank you for thinking about this stuff many many years ago. I think we’re getting there slowly on some of this work. And yeah, good talking to you.

[00:25:27]

Papadopoulos: Well, you too. And thank you for carrying the torch forward. I think everyone from Sun who listens to this, and John, and everybody should feel really proud about what part they played in the evolution of this great invention.

[00:25:48]

Graham-Cumming: It’s certainly the case that a tremendous amount of work was done at Sun that was really fundamental and, you know, perhaps some of that was ahead of its time but here we are.

[00:25:57]

Papadopoulos: Thank you.

[00:25:58]

Graham-Cumming: Thank you very much.

[00:25:59]

Papadopoulos: Cheers.


Interested in hearing more? Listen to my conversations with John Gage and Ray Rothrock of Sun Microsystems:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

The Network is the Computer

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/the-network-is-the-computer/

The Network is the Computer

The Network is the Computer

We recently registered the trademark for The Network is the Computer®, to encompass how Cloudflare is utilizing its network to pave the way for the future of the Internet.

The phrase was first coined in 1984 by John Gage, the 21st employee of Sun Microsystems, where he was credited with building Sun’s vision around “The Network is the Computer.” When Sun was acquired in 2010, the trademark was not renewed, but the vision remained.

Take it from him:

“When we built Sun Microsystems, every computer we made had the network at its core. But we could only imagine, over thirty years ago, today’s billions of networked devices, from the smallest camera or light bulb to the largest supercomputer, sharing their packets across Cloudflare’s distributed global network.

We based our vision of an interconnected world on open and shared standards. Cloudflare extends this dedication to new levels by openly sharing designs for security and resilience in the post-quantum computer world.

Most importantly, Cloudflare is committed to immediate, open, transparent accountability for network performance. I’m a dedicated reader of their technical blog, as the network becomes central to our security infrastructure and the global economy, demanding even more powerful technical innovation.”

Cloudflare’s massive network, which spans more than 180 cities in 80 countries, enables the company to deliver its suite of security, performance, and reliability products, including its serverless edge computing offerings.

In March of 2018, we launched our serverless solution Cloudflare Workers, to allow anyone to deploy code at the edge of our network. We also recently announced advancements to Cloudflare Workers in June of 2019 to give application developers the ability to do away with cloud regions, VMs, servers, containers, load balancers—all they need to do is write the code, and we do the rest. With each of Cloudflare’s data centers acting as a highly scalable application origin to which users are automatically routed via our Anycast network, code is run within milliseconds of users worldwide.

In honor of registering Sun’s former trademark, I spoke with John Gage, Greg Papadopoulos, former CTO of Sun Microsystems, and Ray Rothrock, former Director of CAD/CAM Marketing at Sun Microsystems, to learn more about the history of the phrase and what it means for the future:

To learn more about Cloudflare Workers, check out the use cases below:

  • Optimizely – Optimizely chose Workers when updating their experimentation platform to provide faster responses from the edge and support more experiments for their customers.
  • Cordial – Cordial used a “stable of Workers” to do custom Black Friday load shedding as well as using it as a serverless platform for building scalable customer-facing services.
  • AO.com – AO.com used Workers to avoid significant code changes to their underlying platform when migrating from a legacy provider to a modern cloud backend.
  • Pwned Passwords – Troy Hunt’s popular “Have I Been Pwned” project benefits from cache hit ratios of 94% on its Pwned Passwords API due to Workers.
  • Timely – Using Workers and Workers KV, Timely was able to safely migrate application endpoints using simple value updates to a distributed key-value store.
  • Quintype – Quintype was an eager adopter of Workers to cache content they previously considered un-cacheable and improve the user experience of their publishing platform.

Amazon Aurora PostgreSQL Serverless – Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-aurora-postgresql-serverless-now-generally-available/

The database is usually the most critical part of a software architecture and managing databases, especially relational ones, has never been easy. For this reason, we created Amazon Aurora Serverless, an auto-scaling version of Amazon Aurora that automatically starts up, shuts down and scales up or down based on your application workload.

The MySQL-compatible edition of Aurora Serverless has been available for some time now. I am pleased to announce that the PostgreSQL-compatible edition of Aurora Serverless is generally available today.

Before moving on with details, I take the opportunity to congratulate the Amazon Aurora development team that has just won the 2019 Association for Computing Machinery’s (ACM) Special Interest Group on Management of Data (SIGMOD) Systems Award!

When you create a database with Aurora Serverless, you set the minimum and maximum capacity. Your client applications transparently connect to a proxy fleet that routes the workload to a pool of resources that are automatically scaled. Scaling is very fast because resources are “warm” and ready to be added to serve your requests.

 

There is no change with Aurora Serverless on how storage is managed by Aurora. The storage layer is independent from the compute resources used by the database. There is no need to provision storage in advance. The minimum storage is 10GB and, based on the database usage, the Amazon Aurora storage will automatically grow, up to 64 TB, in 10GB increments with no impact to database performance.

Creating an Aurora Serverless PostgreSQL Database
Let’s start an Aurora Serverless PostgreSQL database and see the automatic scalability at work. From the Amazon RDS console, I select to create a database using Amazon Aurora as engine. Currently, Aurora serverless is compatible with PostgreSQL version 10.5. Selecting that version, the serverless option becomes available.

I give the new DB cluster an identifier, choose my master username, and let Amazon RDS generate a password for me. I will be able to retrieve my credentials during database creation.

I can now select the minimum and maximum capacity for my database, in terms of Aurora Capacity Units (ACUs), and in the additional scaling configuration I choose to pause compute capacity after 5 minutes of inactivity. Based on my settings, Aurora Serverless automatically creates scaling rules for thresholds for CPU utilization, connections, and available memory.

Testing Some Load on the Database
To generate some load on the database I am using sysbench on an EC2 instance. There are a couple of Lua scripts bundled with sysbench that can help generate an online transaction processing (OLTP) workload:

  • The first script, parallel_prepare.lua, generates 100,000 rows per table for 24 tables.
  • The second script, oltp.lua, generates workload against those data using 64 worker threads.

By using those scripts, I start generating load on my database cluster. As you can see from this graph, taken from the RDS console monitoring tab, the serverless database capacity grows and shrinks to follow my requirements. The metric shown on this graph is the number of ACUs used by the database cluster. First it scales up to accommodate the sysbench workload. When I stop the load generator, it scales down and then pauses.

Available Now
Aurora Serverless PostgreSQL is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo). With Aurora Serverless, you pay on a per-second basis for the database capacity you use when the database is active, plus the usual Aurora storage costs.

For more information on Amazon Aurora, I recommend this great post explaining why and how it was created:

Amazon Aurora ascendant: How we designed a cloud-native relational database

It’s never been so easy to use a relational database in production. I am so excited to see what you are going to use it for!

Best Practices for Developing on AWS Lambda

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/best-practices-for-developing-on-aws-lambda/

In our previous post we discussed the various ways you can invoke AWS Lambda functions. In this post, we’ll provide some tips and best practices you can use when building your AWS Lambda functions.

One of the benefits of using Lambda, is that you don’t have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your Lambda functions. You should take advantage of this architecture with the tips below.

Tip #1: When to VPC-Enable a Lambda Function

Lambda functions always operate from an AWS-owned VPC. By default, your function has full ability to make network requests to any public internet address — this includes access to any of the public AWS APIs. For example, your function can interact with AWS DynamoDB APIs to PutItem or Query for records. You should only enable your functions for VPC access when you need to interact with a private resource located in a private subnet. An RDS instance is a good example.

RDS instance: When to VPC enable a Lambda function

Once your function is VPC-enabled, all network traffic from your function is subject to the routing rules of your VPC/Subnet. If your function needs to interact with a public resource, you will need a route through a NAT gateway in a public subnet.

Tip #2: Deploy Common Code to a Lambda Layer (i.e. the AWS SDK)

If you intend to reuse code in more than one function, consider creating a Layer and deploying it there. A great candidate would be a logging package that your team is required to standardize on. Another great example is the AWS SDK. AWS will include the AWS SDK for NodeJS and Python functions (and update the SDK periodically). However, you should bundle your own SDK and pin your functions to a version of the SDK you have tested.

Tip #3: Watch Your Package Size and Dependencies

Lambda functions require you to package all needed dependencies (or attach a Layer) — the bigger your deployment package, the slower your function will cold-start. Remove all unnecessary items, such as documentation and unused libraries. If you are using Java functions with the AWS SDK, only bundle the module(s) that you actually need to use — not the entire SDK.

Good:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>dynamodb</artifactId>
    <version>2.6.0</version>
</dependency>

Bad:

<!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-sdk-java -->
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>aws-sdk-java</artifactId>
    <version>2.6.0</version>
</dependency>

Tip #4: Monitor Your Concurrency (and Set Alarms)

Our first post in this series talked about how concurrency can effect your down stream systems. Since Lambda functions can scale extremely quickly, this means you should have controls in place to notify you when you have a spike in concurrency. A good idea is to deploy a CloudWatch Alarm that notifies your team when function metrics such as ConcurrentExecutions or Invocations exceeds your threshold. You should create an AWS Budget so you can monitor costs on a daily basis. Here is a great example of how to set up automated cost controls.

Tip #5: Over-Provision Memory (in some use cases) but Not Function Timeout

Lambda allocates compute power in proportion to the memory you allocate to your function. This means you can over provision memory to run your functions faster and potentially reduce your costs. You should benchmark your use case to determine where the breakeven point is for running faster and using more memory vs running slower and using less memory.

However, we recommend you do not over provision your function time out settings. Always understand your code performance and set a function time out accordingly. Overprovisioning function timeout often results in Lambda functions running longer than expected and unexpected costs.

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.

Understanding the Different Ways to Invoke Lambda Functions

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/understanding-the-different-ways-to-invoke-lambda-functions/

In our first post, we talked about general design patterns to enable massive scale with serverless applications. In this post, we’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Synchronous Invokes

Synchronous invocations are the most straight forward way to invoke your Lambda functions. In this model, your functions execute immediately when you perform the Lambda Invoke API call. This can be accomplished through a variety of options, including using the CLI or any of the supported SDKs.

Here is an example of a synchronous invoke using the CLI:

aws lambda invoke —function-name MyLambdaFunction —invocation-type RequestResponse —payload  “[JSON string here]”

The Invocation-type flag specifies a value of “RequestResponse”. This instructs AWS to execute your Lambda function and wait for the function to complete. When you perform a synchronous invoke, you are responsible for checking the response and determining if there was an error and if you should retry the invoke.

Many AWS services can emit events that trigger Lambda functions. Here is a list of services that invoke Lambda functions synchronously:

Asynchronous Invokes

Here is an example of an asynchronous invoke using the CLI:

aws lambda invoke —function-name MyLambdaFunction —invocation-type Event —payload  “[JSON string here]”

Notice, the Invocation-type flag specifies “Event.” If your function returns an error, AWS will automatically retry the invoke twice, for a total of three invocations.

Here is a list of services that invoke Lambda functions asynchronously:

Asynchronous invokes place your invoke request in Lambda service queue and we process the requests as they arrive. You should use AWS X-Ray to review how long your request spent in the service queue by checking the “dwell time” segment.

Poll based Invokes

This invocation model is designed to allow you to integrate with AWS Stream and Queue based services with no code or server management. Lambda will poll the following services on your behalf, retrieve records, and invoke your functions. The following are supported services:

AWS will manage the poller on your behalf and perform Synchronous invokes of your function with this type of integration. The retry behavior for this model is based on data expiration in the data source. For example, Kinesis Data streams store records for 24 hours by default (up to 168 hours). The specific details of each integration are linked above.

Conclusion

In our next post, we’ll provide some tips and best practices for developing Lambda functions. Happy coding!

 

About the Author

George MaoGeorge Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.