A Name Resolver for the Distributed Web

The Domain Name System (DNS) matches names to resources. Instead of typing 104.18.26.46 to access the Cloudflare Blog, you type blog.cloudflare.com and, using DNS, the domain name resolves to 104.18.26.46, the Cloudflare Blog IP address.

Similarly, distributed systems such as Ethereum and IPFS rely on a naming system to be usable. DNS could be used, but its resolvers’ attributes run contrary to properties valued in distributed Web (dWeb) systems. Namely, dWeb resolvers should ideally (i) provide locally verifiable data, (ii) provide built-in history, and (iii) have no single trust anchor.

At Cloudflare Research, we have been exploring alternative ways to resolve queries to responses that align with these attributes. We are proud to announce a new resolver for the Distributed Web, where IPFS content indexed by the Ethereum Name Service (ENS) can be accessed.

To discover how it has been built, and how you can use it today, read on.

Welcome to the Distributed Web

IPFS and its addressing system

The InterPlanetary File System (IPFS) is a peer-to-peer network for storing content on a distributed file system. It is composed of a set of computers called nodes that store and relay content using a common addressing system.

This addressing system relies on Content IDentifiers (CIDs). A CID is derived from the content itself, and it is self-describing. For example, QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco is the CID version 0 (CIDv0) of the Wikipedia-on-IPFS homepage.

To understand why a CID is defined as self-describing, we can look at its binary representation. Decoded from base58, QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco is a multihash made of three parts.

The first byte identifies the hash algorithm used to generate the CID (sha2-256 in this case); the next byte gives the length of the digest (32 bytes for a sha2-256 hash), and the remaining bytes are the digest itself. By referring to the multicodec table, it is possible to understand how the content is encoded.

Name	Code (in hexadecimal)
identity	0x00
sha1	0x11
sha2-256	0x12 = 00010010
keccak-256	0x1b
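The layout above can be checked with a few lines of Python. The sketch below is a minimal decoder that assumes a CIDv0 string, which is the base58btc encoding of the raw multihash with no version or codec prefix, and only recognises the single-byte hash codes from the table; real multihash codes and lengths are unsigned varints, which it does not parse.

```python
# Minimal CIDv0 decoder sketch: base58btc text -> multihash (code, length, digest).
# Assumes single-byte hash codes and lengths (true for the table entries);
# real multihash fields are unsigned varints.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

HASH_NAMES = {0x00: "identity", 0x11: "sha1", 0x12: "sha2-256", 0x1B: "keccak-256"}


def base58_decode(text: str) -> bytes:
    """Decode a base58btc string (Bitcoin alphabet) into raw bytes."""
    value = 0
    for char in text:
        value = value * 58 + BASE58_ALPHABET.index(char)  # ValueError on invalid chars
    # Each leading '1' encodes a leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + value.to_bytes((value.bit_length() + 7) // 8, "big")


def decode_cidv0(cid: str):
    """Split a CIDv0 into (hash function name, digest length, digest)."""
    raw = base58_decode(cid)
    code, length, digest = raw[0], raw[1], raw[2:]
    if len(digest) != length:
        raise ValueError("digest length does not match the length byte")
    return HASH_NAMES.get(code, hex(code)), length, digest


name, length, digest = decode_cidv0("QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco")
print(name, length, digest.hex())
```

For the Wikipedia CID this reports sha2-256 and a 32-byte digest. Note that the digest is the hash of the IPFS block representing the page, not necessarily of the raw HTML bytes.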

This encoding mechanism is useful, because it creates a unique and upgradable content-addressing system across multiple protocols.

If you want to learn more, have a look at ProtoSchool’s tutorial.

Ethereum and decentralised applications

Ethereum is an account-based blockchain with smart contract capabilities. Being account-based, its state is made of accounts, each identified by an address, and this state is modified by transactions grouped in blocks and sealed by Ethereum’s consensus algorithm, Proof-of-Work.

There are two categories of accounts: user accounts and contract accounts. User accounts are controlled by a private key, which is used to sign transactions from the account. Contract accounts hold bytecode, which is executed by the network when a transaction is sent to their account. A transaction can include both funds and data, allowing for rich interaction between accounts.

When a transaction is created, it gets verified by each node on the network. For a transaction between two user accounts, the verification consists of checking the origin account signature. When the transaction is between a user and a smart contract, every node runs the smart contract bytecode on the Ethereum Virtual Machine (EVM). Therefore, all nodes perform the same suite of operations and end up in the same state. If one actor is malicious, nodes will not add its contribution. Since nodes have diverse ownership, they have an incentive to not cheat.

How to access IPFS content

As you may have noticed, while a CID describes a piece of content, it doesn’t describe where to find it on the network. The location of the content is retrieved by querying an IPFS node.

An IPFS URL (Uniform Resource Locator) looks like this: ipfs://QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco. Accessing this URL means retrieving QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco using the IPFS protocol, denoted by ipfs://. However, typing such a URL is error-prone, and these URLs are not human-friendly, because there is no good way to remember such long strings. To get around this issue, you can use DNSLink, a way of specifying IPFS CIDs within a DNS TXT record. For instance, Wikipedia on IPFS has the following TXT record:

$ dig +short TXT _dnslink.en.wikipedia-on-ipfs.org

"dnslink=/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco"

In addition, its A record points to an IPFS gateway. This means that, when you access en.wikipedia-on-ipfs.org, your request is directed to an IPFS HTTP gateway, which looks up the CID in the domain’s TXT record and returns the content associated with this CID, fetched from the IPFS network.
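The gateway's lookup step can be sketched in Python. This is not the gateway's actual code: to stay runnable offline, the TXT lookup is passed in as a function (in practice a DNS client would provide it), the helper names are hypothetical, and the gateway host is assumed to be Cloudflare's public IPFS gateway. Only /ipfs/ paths are handled.

```python
from typing import Callable, Iterable


def dnslink_path(domain: str, lookup_txt: Callable[[str], Iterable[str]]) -> str:
    """Return the /ipfs/... path published in the domain's _dnslink TXT record."""
    for record in lookup_txt(f"_dnslink.{domain}"):
        value = record.strip('"')  # dig +short prints TXT strings in quotes
        if value.startswith("dnslink=/ipfs/"):
            return value[len("dnslink="):]
    raise LookupError(f"no IPFS DNSLink record for {domain}")


def gateway_url(domain: str, lookup_txt: Callable[[str], Iterable[str]],
                gateway: str = "https://cloudflare-ipfs.com") -> str:
    """Build the gateway URL serving the content named by the DNSLink record.

    The default gateway host is an assumption; any IPFS HTTP gateway would do.
    """
    return gateway + dnslink_path(domain, lookup_txt)


# Stand-in resolver returning a DNSLink record for the Wikipedia mirror.
records = {
    "_dnslink.en.wikipedia-on-ipfs.org": [
        '"dnslink=/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco"'
    ]
}
print(gateway_url("en.wikipedia-on-ipfs.org", lambda name: records.get(name, [])))
```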

This trades security for ease of access: the user’s web browser doesn’t verify the integrity of the content served. This could be because the browser does not implement IPFS, or because it has no way of validating the domain’s DNSSEC signatures. We wrote about this issue in our previous blog post on End-to-End Integrity.

Human readable identifiers

DNS simplifies referring to IP addresses, in the same way that postal addresses are a way of referring to geolocation data, and contacts in your mobile phone abstract phone numbers. All these systems provide a human-readable format and reduce the error rate of an operation.

To verify this data, the trust anchors, or “sources of truth”, are:

Root DNS Keys for DNS.
The government registry for postal addresses. In the UK, addresses are handled by cities, boroughs and local councils.
When it comes to your contacts, you are the trust anchor.

Ethereum Name Service, an index for the Distributed Web

An account is identified by its address. An address starts with “0x” and is followed by 20 bytes written as 40 hexadecimal characters (see section 4.1 of the Ethereum Yellow Paper), for example: 0xf10326c1c6884b094e03d616cc8c7b920e3f73e0. This is not very readable, and can be pretty scary when transactions are not reversible and one can easily mistype a single character.

A first mitigation strategy was to introduce a new notation that capitalises some letters based on the hash of the address: 0xF10326C1c6884b094E03d616Cc8c7b920E3F73E0. This can help detect typos, but it is still not readable. If I have to send a transaction to a friend, I have no way of confirming she hasn’t mistyped the address.

The Ethereum Name Service (ENS) was created to tackle this issue. It is a system capable of turning human-readable names, referred to as domains, into blockchain addresses. For instance, the domain privacy-pass.eth points to the Ethereum address 0xF10326C1c6884b094E03d616Cc8c7b920E3F73E0.

To achieve this, the system is organised into two components: registries and resolvers.

A registry is a smart contract that maintains a list of domains and some information about each domain: the domain owner and the domain resolver. The owner is the account allowed to manage the domain. They can create subdomains and change ownership of their domain, as well as modify the resolver associated with their domain.

Resolvers are responsible for keeping records. For instance, the Public Resolver is a smart contract capable of associating a name not only with blockchain addresses, but also with an IPFS content identifier. The resolver address is stored in a registry, so users first query the registry to retrieve the resolver associated with the name.

Consider a user, Alice, who has direct access to the Ethereum state. Alice would like to get Privacy Pass’s Ethereum address, for which the domain is privacy-pass.eth. She looks up privacy-pass.eth in the ENS Registry and finds that the resolver for privacy-pass.eth is at 0x1234…. She then looks up the address of privacy-pass.eth at that resolver, which turns out to be 0xf10326c….

Accessing the IPFS content identifier for privacy-pass.eth works in a similar way. The resolver is the same, only the accessed data is different — Alice calls a different method from the smart contract.
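The two-step lookup can be modelled with plain Python objects. This is a simplified mock, not an Ethereum client: real ENS contracts key their records by the name's namehash rather than by the plain name, and the methods here are only named after the registry's `resolver` and the Public Resolver's `addr` and `contenthash` functions. The owner, resolver address and content hash are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Resolver:
    """Stand-in for a resolver contract, keyed by plain names for readability."""
    addresses: Dict[str, str] = field(default_factory=dict)
    contenthashes: Dict[str, str] = field(default_factory=dict)

    def addr(self, name: str) -> str:
        return self.addresses[name]

    def contenthash(self, name: str) -> str:
        return self.contenthashes[name]


@dataclass
class Registry:
    """Stand-in for the ENS registry: name -> (owner, resolver address)."""
    records: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def resolver(self, name: str) -> str:
        return self.records[name][1]


def resolve(registry: Registry, contracts: Dict[str, Resolver], name: str, method: str) -> str:
    """Step 1: ask the registry for the resolver. Step 2: call `method` on that resolver."""
    resolver_address = registry.resolver(name)
    return getattr(contracts[resolver_address], method)(name)


# Hypothetical state: the owner, resolver address and content hash are placeholders.
resolver = Resolver(
    addresses={"privacy-pass.eth": "0xF10326C1c6884b094E03d616Cc8c7b920E3F73E0"},
    contenthashes={"privacy-pass.eth": "ipfs://<example-cid>"},
)
registry = Registry(records={"privacy-pass.eth": ("<owner-address>", "<resolver-address>")})
contracts = {"<resolver-address>": resolver}

print(resolve(registry, contracts, "privacy-pass.eth", "addr"))
print(resolve(registry, contracts, "privacy-pass.eth", "contenthash"))
```

Only the method name changes between the two queries, mirroring how Alice calls a different function on the same resolver.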

Cloudflare Distributed Web Resolver

The goal was to be able to use this new way of indexing IPFS content directly from your web browser. However, accessing the ENS registry requires access to the Ethereum state, and retrieving the content itself requires access to the IPFS network.

To tackle this, we are going to use Cloudflare’s Distributed Web Gateway. Cloudflare operates both an Ethereum Gateway and an IPFS Gateway, respectively available at cloudflare-eth.com and cloudflare-ipfs.com.

EthLink

The first version of EthLink was built by Jim McDonald and is operated by True Name LTD at eth.link. Starting next week, eth.link will transition to the Cloudflare Distributed Web Resolver. To that end, we have built EthLink on top of Cloudflare Workers. EthLink is a proxy to IPFS: it proxies every ENS-registered domain when .link is appended to it. For instance, privacy-pass.eth should render the Privacy Pass homepage; from your web browser, https://privacy-pass.eth.link does exactly that.

The resolution is done at the Cloudflare edge using a Cloudflare Worker. Cloudflare Workers allows JavaScript code to be run on Cloudflare infrastructure, eliminating the need to maintain a server and increasing the reliability of the service. In addition, it follows the Service Workers API, so end users can check the results returned by the resolver if needed.

To do this, we set up a wildcard DNS record for *.eth.link to be proxied through Cloudflare and handled by a Cloudflare Worker. When a user, Alice, accesses privacy-pass.eth.link, the worker first retrieves the CID associated with privacy-pass.eth from Ethereum, through ENS. Then, it requests the content matching this CID from IPFS and returns it to Alice.
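The worker's request mapping can be sketched as follows. The production worker is JavaScript running on Cloudflare Workers; this Python version only illustrates the steps, with the ENS lookup injected as a function and the gateway URL layout (`/ipfs/<cid>/<path>`) taken as an assumption. All names are hypothetical.

```python
from typing import Callable

IPFS_GATEWAY = "https://cloudflare-ipfs.com"  # assumption: any IPFS HTTP gateway would do


def ens_name_from_host(host: str) -> str:
    """privacy-pass.eth.link -> privacy-pass.eth (strip the .link suffix)."""
    host = host.lower().rstrip(".")
    if not host.endswith(".eth.link"):
        raise ValueError(f"not an eth.link hostname: {host}")
    return host[: -len(".link")]


def upstream_url(host: str, path: str, cid_for: Callable[[str], str]) -> str:
    """Map an eth.link request to the gateway URL serving the ENS name's IPFS content.

    cid_for stands in for the ENS lookup against Ethereum (registry, then resolver).
    """
    cid = cid_for(ens_name_from_host(host))
    return f"{IPFS_GATEWAY}/ipfs/{cid}{path}"


print(upstream_url("privacy-pass.eth.link", "/index.html", lambda name: "<cid-of-" + name + ">"))
```

The worker would then fetch that URL and stream the response back to the browser.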

All parts can be run locally. The worker can run in a service worker, the Ethereum gateway can be replaced by a local Ethereum node, and the IPFS gateway by the one provided by IPFS Companion. This means that while Cloudflare provides resolution-as-a-service, none of the components has to be trusted.

Final notes

So are we distributed yet? No, but we are getting closer, building bridges between emerging technologies and current web infrastructure. By providing a gateway dedicated to the distributed web, we hope to make these services more accessible to everyone.

We thank the ENS team for their support in expanding the distributed web with a new resolver. The ENS team has been running a similar service at https://eth.link. On January 18th, they will switch https://eth.link to using our new service.

These services benefit from the added speed and security of the Cloudflare Worker platform, while paving the way to run distributed protocols in browsers.

Supporting teachers and students with remote learning through free video lessons

2021-01-13

Post Syndicated from original https://www.raspberrypi.org/blog/supporting-teachers-students-remote-learning-free-video-lessons/

Working with Oak National Academy, we’ve turned the materials from our Teach Computing Curriculum into more than 300 free, curriculum-mapped video lessons for remote learning.


A comprehensive set of free classroom materials

One of our biggest projects for teachers that we’ve worked on over the past two years is the Teach Computing Curriculum: a comprehensive set of free computing classroom materials for key stages 1 to 4 (learners aged 5 to 16). The materials comprise lesson plans, homework, progression mapping, and assessment materials. We’ve created these as part of the National Centre for Computing Education, but they are freely available for educators all over the world to download and use.

More than 300 free, curriculum-mapped video lessons

In the second half of 2020, in response to school closures, our team of experienced teachers produced over 100 hours of video to transform Teach Computing Curriculum materials into video lessons for learning at home. They are freely available for parents, educators, and learners to continue learning computing at home, wherever they are in the world.

Here’s the start of lesson 2 in the Year 8 ‘Computer systems’ unit

You’ll find our videos for more than 300 hour-long lessons on the Oak National Academy website. The progression of the lessons is mapped out clearly, and the videos cover England’s computing national curriculum. There are video lessons for:

Years 5 and 6 at key stage 2 (ages 7 to 11)
Years 7, 8, and 9 at key stage 3 (ages 11 to 14)
Examined (GCSE) as well as non-examined (Digital Literacy) at key stage 4 (ages 14 to 16)

To access the full set of classroom materials for teaching, visit the National Centre for Computing Education website.

The post Supporting teachers and students with remote learning through free video lessons appeared first on Raspberry Pi.

Comic for 2021.01.13

2021-01-13 Explosm.net

Post Syndicated from Explosm.net original http://explosm.net/comics/5767/

New Cyanide and Happiness Comic

AWS Managed Services by Anchor 2021-01-13 03:31:09

2021-01-13 Douglas Chang

Post Syndicated from Douglas Chang original https://www.anchor.com.au/blog/2021/01/25624/

If you’re an IT Manager or Operations Manager who has considered moving your company’s online assets into the AWS cloud, you may have started by wondering, what is it truly going to involve?

One of the first decisions you will need to make is whether you are going to approach the cloud with the assistance of an AWS managed service provider (AWS MSP), or whether you intend to fully self-manage.

Whether or not a fully managed service is the right option for you comes down to two pivotal questions:

Do you have the technical expertise required to properly deploy and maintain AWS cloud services?
Do you, or your team, have the time/capacity to take this on – not just right now, but in an ongoing capacity too?

Below, we’ll briefly cover some of the considerations you’ll need to make when choosing between fully managed AWS Cloud Services and Self-Managed AWS Cloud Services.

Self-Managed AWS Services

Why outsource the management of your AWS when you can train your own in-house staff to do it?

Self-managing your AWS services means you’re responsible for every aspect of the service from start to finish. Managing your own services gives you ultimate control, which may be beneficial if you require very specific deployment conditions or software versions to run your applications. It can also allow you to very gradually test your applications within their new infrastructure, and learn as you go.

This will result in knowing how to manage and control your own services on a closer level, but it comes with the downside of a very heavy learning curve and time investment if you have never entered the cloud environment before. In the context of a business or corporate environment, you’d also need to ensure that multiple staff members go through this process to ensure redundancy for staff availability and turnover. In either case, you’d also need to invest in continuous development to keep up with the latest best practices and security protocols, because the cloud, like any technical landscape, is fast-paced and ever-changing.

This can end up being a significant investment in training and staff development. As employees are never guaranteed to stay, there is the risk of that investment, or at least substantial portions of it, disappearing at some point.

At the time of writing, there are 450 items in the AWS learning library, for those looking to self-learn. In terms of taking exams to obtain official accreditation, AWS offers 3 levels of certification at present, starting with Foundational, through to Associate, and finally, Professional. To reach the Professional level, AWS requires “Two years of comprehensive experience designing, operating, and troubleshooting solutions using the AWS Cloud”.

Fully Managed AWS Services

Hand the reins over to accredited professionals.

Fully-managed AWS services mean you’ll reap all of the extensive benefits of moving your online infrastructure into the cloud, without taking on the responsibility of setting up or maintaining those services.

You will hand over the stress of managing backups, high availability, software versions, patches, fixes, dependencies, cost optimisation, network infrastructure, security, and various other aspects of keeping your cloud services secure and cost-effective. You won’t need to spend anything on staff training or development, and there is no risk of losing control of your services when internal staff come and go. Essentially, you will be handing the reins over to a team of experts who have already obtained their AWS certifications at the highest level, with collective decades of experience in all manner of business operations and requirements.

The main risk here is choosing the right place to outsource your AWS management to. When choosing to outsource AWS cloud management, you’ll want to be sure the AWS partner you choose offers the level of support you are going to require, as well as holds all relevant certifications. When partnered with the right AWS MSP team, you’ll also often find that the management fees pay for themselves due to the greater level of AWS cost optimisation that can be achieved by seasoned professionals.

If you’re interested in finding out an estimation of professional AWS cloud management costs for your business or discussing how your business operations could be improved or revolutionised through the AWS cloud platform, please don’t hesitate to get in touch with our expert team for a free consultation. Our expert team can conduct a thorough assessment of your current infrastructure and business, and provide you with a report on how your business can specifically benefit from a migration to the AWS cloud platform.

The post appeared first on AWS Managed Services by Anchor.

[$] Debian discusses vendoring—again

2021-01-13

Post Syndicated from original https://lwn.net/Articles/842319/rss

The problems with “vendoring” in packages (bundling dependencies rather than getting them from other packages) seem to crop up frequently these days. We looked at Debian’s concerns about packaging Kubernetes and its myriad of Go dependencies back in October. A more recent discussion in that distribution’s community looks at another famously dependency-heavy ecosystem: JavaScript libraries from the npm repository. Even C-based ecosystems are not immune to the problem, as we saw with iproute2 and libbpf back in November; the discussion of vendoring seems likely to recur over the coming years.

Trident – Real-time event processing at scale

2021-01-13 Grab Tech

Post Syndicated from Grab Tech original https://engineering.grab.com/trident-real-time-event-processing-at-scale

Ever wondered what goes on behind the scenes when you receive advisory messages on a confirmed booking? Or how you are awarded rewards or points after completing a GrabPay payment transaction? At Grab, thousands of such campaigns targeting millions of users are operated daily by a backbone service called Trident. In this post, we share how Trident supports Grab’s daily business, the engineering challenges behind it, and how we solved them.

*60-minute GrabMart delivery guarantee campaign operated via Trident*

What is Trident?

Trident is essentially Grab’s in-house real-time if this, then that (IFTTT) engine, which automates various types of business workflows. The nature of these workflows could either be to create awareness or to incentivize users to use other Grab services.

If you are an active Grab user, you might have noticed new rewards or messages that appear in your Grab account. Most likely, these originate from a Trident campaign. Here are a few examples of types of campaigns that Trident could support:

After a user makes a GrabExpress booking, Trident sends the user a message that says something like “Try out GrabMart too”.
After a user makes multiple ride bookings in a week, Trident sends the user a food reward as a GrabFood incentive.
After a user is dropped off at his office in the morning, Trident awards the user a ride reward to use on the way back home on the same evening.
If a GrabMart order delivery takes over an hour of waiting time, Trident awards the user a free-delivery reward as compensation.
If the driver cancels the booking, then Trident awards points to the user as compensation.
With the current COVID pandemic, when a user makes a ride booking, Trident sends a message to both the passenger and driver reminding them about COVID protocols.

Trident processes events based on campaigns, which are basically a logic configuration on what event should trigger what actions under what conditions. To illustrate this better, let’s take a sample campaign as shown in the image below. This mock campaign setup is taken from the Trident Internal Management portal.

This sample setup basically translates to: for each user, count his/her number of completed GrabMart orders. Once he/she reaches 2 orders, send him/her a message saying “Make one more order to earn a reward”. And if the user reaches 3 orders, award him/her the reward and send a congratulatory message. 😁
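In code, the sample campaign boils down to a per-user counter with two thresholds. The sketch below is hypothetical: the class and callback names are ours, the counter lives in memory rather than in a data store, and the congratulatory text is made up.

```python
from collections import defaultdict
from typing import Callable, Dict


class GrabMartOrderCampaign:
    """Hypothetical sketch of the sample campaign: event -> condition -> actions."""

    def __init__(self, send_message: Callable[[str, str], None], award_reward: Callable[[str], None]):
        self.completed_orders: Dict[str, int] = defaultdict(int)
        self.send_message = send_message
        self.award_reward = award_reward

    def on_order_completed(self, user_id: str) -> None:
        """Event: a GrabMart order completed. Conditions: order count hits 2 or 3."""
        self.completed_orders[user_id] += 1
        count = self.completed_orders[user_id]
        if count == 2:
            self.send_message(user_id, "Make one more order to earn a reward")
        elif count == 3:
            self.award_reward(user_id)
            self.send_message(user_id, "Congratulations, you earned a reward!")


messages, rewards = [], []
campaign = GrabMartOrderCampaign(lambda user, text: messages.append((user, text)), rewards.append)
for _ in range(4):
    campaign.on_order_completed("user-1")
print(messages, rewards)
```

Checking for exact equality with the thresholds means a fourth order triggers nothing, so each action fires once per user.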

Other than the basic event, condition, and action, Trident also allows more fine-grained configurations, such as setting an overall budget for a campaign, adding limits to avoid over-awarding, running A/B tests, delaying actions, and so on.

An IFTTT engine is nothing new or fancy, but building a high-throughput real-time IFTTT system poses a challenge due to the scale that Grab operates at. We need to handle billions of events and run thousands of campaigns on an average day. The number of actions triggered by Trident is also massive.

In the month of October 2020, more than 2,000 events were processed every single second during peak hours. Across the entire month, we awarded nearly half a billion rewards, and sent over 2.5 billion communications to our end-users.

Now that we covered the importance of Trident to the business, let’s drill down on how we designed the Trident system to handle events at a massive scale and overcame the performance hurdles with optimization.

Architecture design

We designed the Trident architecture with the following goals in mind:

Independence: It must run independently of other services, and must not bring performance impacts to other services.
Robustness: All events must be processed exactly once (i.e. no event missed, no event gets double processed).
Scalability: It must be able to scale up processing power when the event volume surges, and withstand the load when popular campaigns run.

The following diagram depicts what the overall system architecture looks like.

Trident consumes events from multiple Kafka streams published by various backend services across Grab (e.g. GrabFood orders, Transport rides, GrabPay payment processing, GrabAds events). Given the nature of Kafka streams, Trident is completely decoupled from all other upstream services.

Each processed event is given a unique event key and stored in Redis for 24 hours. For any event that triggers an action, its key is persisted in MySQL as well. Before storing records in both Redis and MySQL, we make sure any duplicate event is filtered out. Together with the at-least-once delivery guaranteed by Kafka, we achieve exactly-once event processing.
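The deduplication step can be sketched as follows. In Redis, the check-and-mark is typically a single `SET <event-key> 1 NX EX 86400`, which only succeeds if the key does not exist yet; the class below is an in-memory stand-in with the same 24-hour semantics and an injectable clock for testing. Marking the key before handling the event means a crash mid-processing would drop that event; that ordering is an assumption of the sketch, not a described Trident behaviour.

```python
import time
from typing import Callable, Dict


class EventDeduplicator:
    """In-memory stand-in for Redis keys with a TTL (expired entries are never evicted here)."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock: Callable[[], float] = time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires_at: Dict[str, float] = {}

    def first_time(self, event_key: str) -> bool:
        """Like SET key NX EX ttl: succeed only if the key is absent or expired."""
        now = self.clock()
        if self.expires_at.get(event_key, float("-inf")) > now:
            return False
        self.expires_at[event_key] = now + self.ttl
        return True


def process(event_key: str, payload: dict, dedup: EventDeduplicator,
            handle: Callable[[dict], None]) -> bool:
    """Handle an event unless it is a redelivery of one seen in the last 24 hours."""
    if not dedup.first_time(event_key):
        return False  # Kafka redelivered it (at-least-once), so drop the duplicate
    handle(payload)
    return True
```

Combined with Kafka's at-least-once delivery, a filter like this is what turns "possibly twice" into "once" for any redelivery within the TTL.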

Scalability is a key challenge for Trident. To achieve high performance under massive event volume, we needed to scale on both the server level and data store level. The following mind map shows an outline of our strategies.

Scale servers

Our sources of events are Kafka streams. There are mostly three factors that could affect the load on our system:

Number of events produced in the streams (more rides, food orders, etc. result in more events for us to process).
Number of campaigns running.
Nature of campaigns running. The campaigns that trigger actions for more users cause higher load on our system.

There are naturally two types of approaches to scale up server capacity:

Distribute workload among server instances.
Reduce load (i.e. reduce the amount of work required to process each event).

Distribute load

Distributing workload seems trivial with the load balancing and auto-horizontal scaling based on CPU usage that cloud providers offer. However, an additional server sits idle until it can consume from a Kafka partition.

Each Kafka partition can only be consumed by one consumer within the same consumer group (our auto-scaling server group in this case). Therefore, any scaling in or out requires matching the Kafka partition configuration with the server auto-scaling configuration.

Here’s an example of a bad case of load distribution:

*Kafka partitions config mismatches server auto-scaling config*

And here’s an example of a good load distribution where the configurations for the Kafka partitions and the server auto-scaling match:

*Kafka partitions config matches server auto-scaling config*
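Why matching matters can be seen with a small model of Kafka's range assignment strategy for a single topic: partitions are split into contiguous ranges, the first few consumers take one extra partition when the division is uneven, and consumers beyond the partition count receive nothing. This is a simplified sketch, not Kafka's client code, and the server names are hypothetical.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Split one topic's partitions into contiguous ranges, one range per consumer."""
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for index, consumer in enumerate(consumers):
        count = per_consumer + (1 if index < extra else 0)  # first `extra` consumers get one more
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


def idle(assignment: Dict[str, List[int]]) -> List[str]:
    """Consumers with no partition: they scale up cost but not throughput."""
    return [consumer for consumer, partitions in assignment.items() if not partitions]


servers = [f"server-{i}" for i in range(8)]
print(idle(range_assign(6, servers)))  # mismatch: more servers than partitions
print(range_assign(8, servers))        # match: one partition per server
```

An uneven split (for example 10 partitions over 4 servers) also skews load, which is another reason to keep the two configurations aligned.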

Within each server instance, we also tried to increase processing throughput while keeping the resource utilization rate in check. Each Kafka partition consumer has multiple goroutines processing events, and the number of active goroutines is dynamically adjusted according to the event volume from the partition and time of the day (peak/off-peak).

Reduce load

You may ask how we reduced the amount of processing work for each event. First, we needed to see where we spent most of the processing time. After performing some profiling, we identified that the rule evaluation logic was the major time consumer.

What is rule evaluation?

Recall that Trident needs to operate thousands of campaigns daily. Each campaign has a set of rules defined. When Trident receives an event, it needs to check through the rules for all the campaigns to see whether there is any match. This checking process is called rule evaluation.

More specifically, a rule consists of one or more conditions combined by AND/OR Boolean operators. A condition consists of an operator with a left-hand side (LHS) and a right-hand side (RHS). The left-hand side is the name of a variable, and the right-hand side a value. A sample rule in JSON:

Country is Singapore and taxi type is either JustGrab or GrabCar.
  {
    "operator": "and",
    "conditions": [
      {
        "operator": "eq",
        "lhs": "var.country",
        "rhs": "sg"
      },
      {
        "operator": "or",
        "conditions": [
          {
            "operator": "eq",
            "lhs": "var.taxi",
            "rhs": <taxi-type-id-for-justgrab>
          },
          {
            "operator": "eq",
            "lhs": "var.taxi",
            "rhs": <taxi-type-id-for-grabcar>
          }
        ]
      }
    ]
  }

When evaluating the rule, our system loads the value of each LHS variable, compares it against the RHS value using the condition’s operator, and returns true or false depending on whether the rule evaluation passed.
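A minimal recursive evaluator for the JSON format above might look like this. It is a sketch: only the `eq` operator is implemented, variable values are passed in a dictionary that is already loaded, and the taxi-type IDs are hypothetical stand-ins for the placeholders in the sample rule.

```python
from typing import Any, Dict


def evaluate(rule: Dict[str, Any], variables: Dict[str, Any]) -> bool:
    """Recursively evaluate an and/or/eq rule tree against loaded variable values."""
    op = rule["operator"]
    if op == "and":
        return all(evaluate(child, variables) for child in rule["conditions"])
    if op == "or":
        return any(evaluate(child, variables) for child in rule["conditions"])
    if op == "eq":
        return variables[rule["lhs"]] == rule["rhs"]  # LHS names a variable, RHS is a literal
    raise ValueError(f"unsupported operator: {op!r}")


JUSTGRAB_ID, GRABCAR_ID = 101, 102  # hypothetical taxi-type IDs

rule = {
    "operator": "and",
    "conditions": [
        {"operator": "eq", "lhs": "var.country", "rhs": "sg"},
        {
            "operator": "or",
            "conditions": [
                {"operator": "eq", "lhs": "var.taxi", "rhs": JUSTGRAB_ID},
                {"operator": "eq", "lhs": "var.taxi", "rhs": GRABCAR_ID},
            ],
        },
    ],
}

print(evaluate(rule, {"var.country": "sg", "var.taxi": GRABCAR_ID}))
```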

To reduce the resources spent on rule evaluation, there are two types of strategies:

Avoid unnecessary rule evaluation
Evaluate “cheap” rules first

We implemented these two strategies with event prefiltering and weighted rule evaluation.

Event prefiltering

Just as a DB index speeds up data lookups, a pre-built map helped us narrow down the range of campaigns to evaluate. We loaded active campaigns from the DB every few minutes and organized them into an in-memory hash map, with the event type as the key and the list of corresponding campaigns as the value. We picked the event type as the key because it is very fast to determine (most of the time just a type assertion) and it distributes events reasonably evenly.

When processing events, we just looked up the map, and only ran rule evaluation on the campaigns in the matching hash bucket. This saved us at least 90% of the processing time.
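The prefilter can be sketched with a dictionary keyed by event class; in Python, `type(event)` plays the role of the Go type assertion. The event and campaign types are hypothetical, and the periodic reload would simply rebuild the index from the active campaigns.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RideCompleted:
    user_id: str


@dataclass
class MartOrderCompleted:
    user_id: str


@dataclass
class Campaign:
    name: str
    event_type: type  # the event class that can trigger this campaign


def build_index(campaigns: List[Campaign]) -> Dict[type, List[Campaign]]:
    """Group active campaigns by triggering event type (rebuilt on every reload)."""
    index: Dict[type, List[Campaign]] = defaultdict(list)
    for campaign in campaigns:
        index[campaign.event_type].append(campaign)
    return dict(index)


def candidate_campaigns(index: Dict[type, List[Campaign]], event: object) -> List[Campaign]:
    """One hash lookup on the event's type replaces scanning every campaign."""
    return index.get(type(event), [])


index = build_index([
    Campaign("mart-reward", MartOrderCompleted),
    Campaign("ride-promo", RideCompleted),
    Campaign("mart-nudge", MartOrderCompleted),
])
print([c.name for c in candidate_campaigns(index, MartOrderCompleted("user-1"))])
```

Rule evaluation then only runs on the returned candidates.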

Weighted rule evaluation

Evaluating different rules comes with different costs. This is because different variables (i.e. LHS) in the rule can have different sources of values:

The value is already available in memory (already consumed from the event stream).
The value is the result of a database query.
The value is the result of a call to an external service.

These three sources are ranked by cost:

In-memory < database < external service

We aimed to avoid evaluating expensive rules as much as possible (i.e. those that require calling an external service or querying a DB) while ensuring the correctness of evaluation results.

First optimization – Lazy loading

Lazy loading is a common performance optimization technique, which literally means “don’t do it until it’s necessary”.

Take the following rule as an example:

A & B

If we load the variable values for both A and B before passing them to evaluation, then we unnecessarily load B whenever A is false. Since most of the time the rule evaluation fails early (for example, the transaction amount is less than the given minimum amount), there is no point in loading all the data beforehand. So we do lazy loading, i.e. we load data only when evaluating that part of the rule.
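Lazy loading pairs naturally with short-circuit evaluation. In this sketch (all names hypothetical), each variable has a loader function that is only called the first time a condition needs it, so a failed cheap condition prevents the expensive load.

```python
from typing import Callable, Dict, List, Tuple

Condition = Tuple[str, Callable[[object], bool]]  # (variable name, predicate on its value)


def evaluate_and_lazily(conditions: List[Condition], loaders: Dict[str, Callable[[], object]]) -> bool:
    """Evaluate an AND of conditions, loading each variable only when it is needed."""
    loaded: Dict[str, object] = {}
    for name, predicate in conditions:
        if name not in loaded:
            loaded[name] = loaders[name]()  # may read memory, a DB or a remote service
        if not predicate(loaded[name]):
            return False  # short-circuit: the remaining variables are never loaded
    return True


calls: List[str] = []


def load_amount() -> object:
    calls.append("amount")
    return 5  # hypothetical: transaction amount already present in the event


def load_segment() -> object:
    calls.append("segment")
    return "frequent-rider"  # hypothetical: answer from a segmentation service


rule = [("amount", lambda v: v >= 10), ("segment", lambda v: v == "frequent-rider")]
result = evaluate_and_lazily(rule, {"amount": load_amount, "segment": load_segment})
print(result, calls)
```

The amount check fails, so the segment loader is never called.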

Second optimization – Add weight

Let’s take the same example as above, but in a different order.

B & A
Source of data for A is memory and B is external service

Even with lazy loading, in this case we always load the external data, even though the evaluation could fail at the next condition, whose data is already in memory.

Since most of our campaigns are targeted, a popular condition is to check if a user is in a certain segment, which is usually the first condition that a campaign creator sets. This data resides in another service, so it is quite expensive to evaluate this condition first, even though the next condition’s data may already be in memory (e.g. whether the taxi type is JustGrab).

So, in the next phase of optimization, we sorted the conditions by the weight of their data source (low weight if the data is in memory, higher if it’s in our database, and highest if it’s in an external system). If AND were the only logical operator we supported, this would have been quite simple, but the presence of OR made it complex. We came up with an algorithm that orders the evaluation by weight while respecting the AND/OR structure. Here’s what the flowchart looks like:

An example:

Conditions: A & ( B | C ) & ( D | E )

Actual result: true & ( false | false ) & ( true | true ) --> false

Weight: B < D < E < C < A

Expected check order: B, D, C

First, we validate B, which is false. We cannot skip its sibling here, since B and C are connected by |. Next, we check D. D is true and its only sibling E is connected by |, so we can mark E as “skip”. E comes next by weight, but since it has been marked “skip”, we just skip it. We still cannot get the final result, so we continue by validating C, which is false. Now we know (B | C) is false, so the whole condition is false too, and we can stop.
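The flowchart is not reproduced here, so the following is our own sketch of a procedure that yields the same check order for this example; it is not Grab's implementation. Leaves are visited cheapest first, each group is evaluated with three-valued logic (true, false, or still unknown), a leaf is skipped once any group containing it is decided, and evaluation stops as soon as the whole rule is decided.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple, Union


@dataclass
class Leaf:
    name: str
    weight: int               # cost of loading: memory < database < external service
    load: Callable[[], bool]  # evaluates the condition, fetching data if needed


@dataclass
class Node:
    op: str                   # "and" or "or"
    children: List["Rule"]


Rule = Union[Leaf, Node]


def partial(rule: Rule, known: Dict[str, bool]) -> Optional[bool]:
    """Three-valued evaluation: True/False once determined by the known leaves, else None."""
    if isinstance(rule, Leaf):
        return known.get(rule.name)
    results = [partial(child, known) for child in rule.children]
    decisive = rule.op == "or"  # a True child decides an OR, a False child decides an AND
    if decisive in results:
        return decisive
    if all(r is not None for r in results):
        return not decisive
    return None


def leaves(rule: Rule, ancestors: Tuple[Node, ...] = ()) -> Iterator[Tuple[Leaf, Tuple[Node, ...]]]:
    """Yield every leaf together with the chain of groups above it."""
    if isinstance(rule, Leaf):
        yield rule, ancestors
    else:
        for child in rule.children:
            yield from leaves(child, ancestors + (rule,))


def weighted_evaluate(rule: Rule) -> Tuple[Optional[bool], List[str]]:
    """Check leaves cheapest first, skipping any leaf whose enclosing group is already decided."""
    known: Dict[str, bool] = {}
    checked: List[str] = []
    for leaf, ancestors in sorted(leaves(rule), key=lambda pair: pair[0].weight):
        if partial(rule, known) is not None:
            break  # the overall result is already known
        if any(partial(node, known) is not None for node in ancestors):
            continue  # marked "skip": this leaf can no longer change the result
        known[leaf.name] = leaf.load()
        checked.append(leaf.name)
    return partial(rule, known), checked


# The worked example: A & (B | C) & (D | E), weights B < D < E < C < A.
values = {"A": True, "B": False, "C": False, "D": True, "E": True}
weights = {"B": 1, "D": 2, "E": 3, "C": 4, "A": 5}


def leaf(name: str) -> Leaf:
    return Leaf(name, weights[name], lambda: values[name])


rule = Node("and", [leaf("A"), Node("or", [leaf("B"), leaf("C")]), Node("or", [leaf("D"), leaf("E")])])
print(weighted_evaluate(rule))
```

Run on the example, it returns False with the check order B, D, C. Skipping E falls out of the three-valued check: once D makes (D | E) true, E's group is decided.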

Sub-streams

After investigation, we learned that we consumed a particular stream that produced terabytes of data per hour, which caused our CPU usage to shoot up by 30%. We found that we process only a handful of event types from that stream, so we introduced a sub-stream in between, containing only the event types we want to support. This sub-stream is populated from the main stream by another server, thereby reducing the load on Trident.

Protect downstream

While we scaled up our servers aggressively, we needed to keep in mind that many downstream services would receive more traffic as a result. For example, we call the GrabRewards service for awarding rewards or the LocaleService for checking the user’s locale. It is crucial for us to have control over our outbound traffic to avoid causing any stability issues in Grab.

Therefore, we implemented rate limiting. There is a total rate limit configured for calling each downstream service, and the limit varies in different time ranges (e.g. tighter limit for calling critical service during peak hour).
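A per-service limit that varies by time of day can be sketched as a token bucket whose refill rate is looked up from the current hour. The schedule, the UTC clock and the function names are hypothetical, and this bucket only limits a single process; a total limit across a fleet of servers needs shared state or a per-instance share of the budget.

```python
import time
from typing import Callable


class ScheduledRateLimiter:
    """Token bucket whose refill rate depends on the hour of day (UTC in this sketch)."""

    def __init__(self, rate_for_hour: Callable[[int], float], clock: Callable[[], float] = time.time):
        self.rate_for_hour = rate_for_hour
        self.clock = clock
        self.last = clock()
        self.tokens = self._rate(self.last)  # start full: burst capacity of one second's worth

    def _rate(self, now: float) -> float:
        return self.rate_for_hour(time.gmtime(now).tm_hour)

    def allow(self) -> bool:
        """Return True if one more downstream call may be made now."""
        now = self.clock()
        rate = self._rate(now)
        self.tokens = min(rate, self.tokens + (now - self.last) * rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def rewards_service_limit(hour: int) -> float:
    # Hypothetical schedule: tighter limit (calls per second) during an assumed evening peak.
    return 2.0 if 18 <= hour < 22 else 5.0


fake_now = [0.0]  # 00:00 UTC, off-peak in the hypothetical schedule
limiter = ScheduledRateLimiter(rewards_service_limit, clock=lambda: fake_now[0])
print([limiter.allow() for _ in range(6)])
```

Off-peak, the sixth call in the same instant is refused; during the assumed peak, only two would pass.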

Scale data store

We have two types of storage in Trident: cache storage (Redis) and persistent storage (MySQL and others).

Scaling cache storage is straightforward, since Redis Cluster already offers everything we need:

High performance: Known to be fast and efficient.
Scaling capability: New shards can be added at any time to spread out the load.
Fault tolerance: Data replication makes sure that data is not lost when any single Redis instance fails, and an automatic election mechanism makes sure the cluster can always restore itself after any single instance failure.

All we needed to ensure was that our cache keys hash evenly across the different shards.
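Redis Cluster assigns each key to one of 16384 hash slots: it takes CRC16 (the XMODEM variant) of the key, modulo 16384, and each shard owns a range of slots. If a key contains a non-empty {...} section (a hash tag), only that part is hashed. The sketch below re-implements this mapping purely for illustration, and the key names are hypothetical. It also shows the pitfall to avoid: keys that share a hash tag always land in the same slot, and therefore on the same shard.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    raw = key.encode()
    # Hash tags: if the key has a non-empty {...} section, hash only that part.
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

Keys such as {campaign:42}:daily and {campaign:42}:total always share a slot. That is convenient for multi-key operations, but it concentrates a hot campaign's traffic on one shard. Keys whose hashed part varies spread out evenly.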

As for scaling persistent data storage, we tackled it in two ways just like we did for servers:

Distribute load
Reduce load (both overall and per query)

Distribute load

There are two levels of load distribution for persistent storage: infra level and DB level. On the infra level, we split data with different access patterns into different types of storage. Then on the DB level, we further distributed read/write load onto different DB instances.

Infra level

Just like any typical online service, Trident has two types of data in terms of access pattern:

Online data: Frequent access. Requires quick access. Medium size.
Offline data: Infrequent access. Tolerates slow access. Large size.

For online data, we need to use a high-performance database, while for offline data, we can just use cheap storage. The following table shows Trident’s online/offline data and the corresponding storage.

*Trident’s online/offline data and storage*

Offline data is written asynchronously to minimize the performance impact, as shown below.
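Here is a minimal sketch of such an asynchronous write path, assuming a single background thread drains an in-memory queue. The write function stands in for a hypothetical call to the cheap offline store. When the queue is full, submit returns False instead of blocking, so a burst of offline writes cannot slow down the online path. Error handling is reduced to a comment.

```python
import queue
import threading
from typing import Any, Callable

_STOP = object()  # sentinel telling the worker to exit


class AsyncOfflineWriter:
    """Queue offline records and write them on a background thread."""

    def __init__(self, write_fn: Callable[[Any], None], maxsize: int = 10_000):
        self._write = write_fn  # hypothetical write into cheap offline storage
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize=maxsize)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, record: Any) -> bool:
        """Enqueue without blocking the online path; False if the queue is full."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            if record is _STOP:
                return
            try:
                self._write(record)
            except Exception:
                pass  # a real system would log and retry here

    def close(self) -> None:
        """Write everything queued so far, then stop the worker."""
        self._queue.put(_STOP)
        self._worker.join()
```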

APIs that retrieve this offline data for users are configured with high timeouts.

DB level

We further distributed load on the MySQL DB level, mainly by introducing replicas, and redirecting all read queries that can tolerate slightly outdated data to the replicas. This relieved more than 30% of the load from the master instance.
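The routing rule can be as small as the following sketch. It assumes every query site declares whether it can tolerate slightly outdated data. The class and connection names are hypothetical, and replicas are picked round-robin.

```python
import itertools
from typing import Generic, Sequence, TypeVar

Conn = TypeVar("Conn")


class ReadWriteRouter(Generic[Conn]):
    """Send writes to the master and stale-tolerant reads to replicas."""

    def __init__(self, master: Conn, replicas: Sequence[Conn]):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self) -> Conn:
        return self.master

    def for_read(self, tolerates_stale: bool) -> Conn:
        # Replicas may lag behind the master, so only reads that can accept
        # slightly outdated data are sent there (round-robin).
        if tolerates_stale and self._replicas is not None:
            return next(self._replicas)
        return self.master
```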

Going forward, we plan to segregate the single MySQL database into multiple databases, based on table usage, to further distribute load if necessary.

Reduce load

To reduce the load on the DB, we reduced the overall number of queries and removed unnecessary ones. We also optimized the schema and queries so that each query completes faster.

Query reduction

We needed to track the usage of a campaign. Tracking simply increments the value stored against a unique key in the MySQL database. For a popular campaign, many increment queries (writes) can hit the database for the same key at the same time, which can cause an IOPS burst. So we came up with the following algorithm to reduce the number of queries:

Have a fixed number of threads per instance that can make such a query to the DB.
Queue the increment queries into these threads.
If a thread is idle (not busy querying the database), it writes the increment to the database immediately.
If the thread is busy, the increment is accumulated in memory instead.
When the thread becomes free, it increments the value in the database by the accumulated sum.
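The steps above can be sketched with a fixed-size thread pool and a per-key "in flight" flag. This is a minimal sketch that assumes a hypothetical write_fn(key, amount) issuing one increment query. If no thread is working on a key, a flush is scheduled right away. While a write for that key is running, new increments only add to an in-memory sum, and that sum is written once the thread is free. Retries and error handling are omitted.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, DefaultDict, Hashable, Set


class CoalescingCounter:
    """Collapse concurrent increments of the same key into fewer DB writes."""

    def __init__(self, write_fn: Callable[[Hashable, int], None],
                 num_threads: int = 4):
        self._write = write_fn  # one increment query: value = value + amount
        self._pool = ThreadPoolExecutor(max_workers=num_threads)
        self._lock = threading.Lock()
        self._pending: DefaultDict[Hashable, int] = defaultdict(int)
        self._in_flight: Set[Hashable] = set()

    def increment(self, key: Hashable, amount: int = 1) -> None:
        with self._lock:
            self._pending[key] += amount
            if key in self._in_flight:
                return  # a thread is busy with this key: just accumulate
            self._in_flight.add(key)
        self._pool.submit(self._flush, key)

    def _flush(self, key: Hashable) -> None:
        while True:
            with self._lock:
                amount = self._pending.pop(key, 0)
                if amount == 0:
                    self._in_flight.discard(key)  # nothing left: thread is free
                    return
            self._write(key, amount)  # one query carries the accumulated sum

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

With a slow database, a burst of 100 increments to the same key typically turns into two writes: the first increment, then one write for the remaining 99.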

To prevent accidental over-awarding of benefits (rewards, points, etc.), we require campaign creators to set limits. However, some campaigns don't need a limit, so their creators just specify a large number. Such popular campaigns can cause very high QPS on our database. We had a brilliant trick to address this issue: we just don't track usage if the limit is high. Do you think people really want to limit usage when they set the per-user limit to 100,000? 😉

Query optimization

One of our requirements was to track the usage of a campaign – overall as well as per user (and at finer granularities, like daily overall, daily per user, etc.). We used the following query for this purpose:

INSERT INTO … ON DUPLICATE KEY UPDATE value = value + inc

The table had a unique key index (combining multiple columns) along with a usual auto-increment integer primary key. We encountered performance issues arising from MySQL gap locks when high write QPS hit this table (i.e. when popular campaigns ran). After testing out a few approaches, we ended up making the following changes to solve the problem:

Removed the auto-increment integer primary key.
Converted the secondary unique key to the primary key.
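Here is a rough, runnable illustration of the resulting schema and upsert. It uses Python's built-in sqlite3 so that it runs without a MySQL server. The table and column names are hypothetical. SQLite spells the upsert INSERT ... ON CONFLICT ... DO UPDATE rather than MySQL's ON DUPLICATE KEY UPDATE, and the sketch does not reproduce InnoDB's gap-lock behaviour. It only shows the shape of the table after the change: the former unique key is now the primary key, and there is no auto-increment id.

```python
import sqlite3

# Composite primary key instead of an auto-increment id plus a secondary
# unique key, mirroring the schema change described above.
SCHEMA = """
CREATE TABLE campaign_usage (
    campaign_id INTEGER NOT NULL,
    user_id     INTEGER NOT NULL,
    day         TEXT    NOT NULL,
    value       INTEGER NOT NULL,
    PRIMARY KEY (campaign_id, user_id, day)
)
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def track(conn: sqlite3.Connection, campaign_id: int, user_id: int,
          day: str, inc: int) -> None:
    # SQLite's upsert; in MySQL this is INSERT ... ON DUPLICATE KEY UPDATE.
    conn.execute(
        "INSERT INTO campaign_usage (campaign_id, user_id, day, value) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (campaign_id, user_id, day) "
        "DO UPDATE SET value = value + excluded.value",
        (campaign_id, user_id, day, inc),
    )


def usage(conn: sqlite3.Connection, campaign_id: int, user_id: int,
          day: str) -> int:
    row = conn.execute(
        "SELECT value FROM campaign_usage "
        "WHERE campaign_id = ? AND user_id = ? AND day = ?",
        (campaign_id, user_id, day),
    ).fetchone()
    return row[0] if row else 0
```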

Conclusion

Trident is Grab's in-house real-time IFTTT engine, which processes events and operates business mechanisms on a massive scale. In this article, we discussed the strategies we implemented to achieve large-scale, high-performance event processing. The overall ideas of distributing and reducing load may be straightforward, but getting the details right took a lot of thought, and we have shared those lessons here in detail. If you have any comments or questions about Trident, feel free to leave a comment below.

All the campaign examples given in this article are for demonstration purposes only; they are not real, live campaigns.

Join us

Grab is more than just the leading ride-hailing and mobile payments platform in Southeast Asia. We use data and technology to improve everything from transportation to payments and financial services across a region of more than 620 million people. We aspire to unlock the true potential of Southeast Asia and look for like-minded individuals to join us on this ride.

If you share our vision of driving Southeast Asia forward, apply to join our team today.

Unlocking LUKS2 volumes with TPM2, FIDO2, PKCS#11 Security Hardware on systemd 248

2021-01-13

Post Syndicated from original http://0pointer.net/blog/unlocking-luks2-volumes-with-tpm2-fido2-pkcs11-security-hardware-on-systemd-248.html

TL;DR: It’s now easy to unlock your LUKS2 volume with a FIDO2
security token (e.g. YubiKey, Nitrokey FIDO2, AuthenTrend
ATKey.Pro). And TPM2 unlocking is easy now too.

Blogging is a lot of work, and a lot less fun than hacking. I mostly
focus on the latter because of that, but from time to time I guess
stuff is just too interesting to not be blogged about. Hence here,
finally, another blog story about exciting new features in systemd.

With the upcoming systemd v248, the
systemd-cryptsetup
component of systemd (which is responsible for assembling encrypted
volumes during boot) gains direct support for unlocking encrypted
storage with three types of security hardware:

Unlocking with FIDO2 security tokens (well, at least with those
which implement the hmac-secret extension; most do), e.g. your
YubiKeys (series 5 and above), Nitrokey FIDO2, AuthenTrend
ATKey.Pro and such.
Unlocking with TPM2 security chips (pretty ubiquitous on non-budget
PCs/laptops/…)
Unlocking with PKCS#11 security tokens, e.g. your smartcards and
older YubiKeys (the ones that implement PIV). (Strictly speaking
this was supported on older systemd already, but was a lot more
“manual”.)

For completeness’ sake, let’s keep in mind that the component also
allows unlocking with these more traditional mechanisms:

Unlocking interactively with a user-entered passphrase (i.e. the
way most people probably already deploy it, supported since
about forever)
Unlocking via key file on disk (optionally on removable media
plugged in at boot), supported since forever.
Unlocking via a key acquired through trivial
AF_UNIX/SOCK_STREAM socket IPC. (Also new in v248)
Unlocking via recovery keys. These are pretty much the same
thing as a regular passphrase (and in fact can be entered wherever
a passphrase is requested) — the main difference being that they
are always generated by the computer, and thus have guaranteed high
entropy, typically higher than user-chosen passphrases. They are
generated in a way that makes them easy to type, in many cases even if the
local key map is misconfigured. (Also new in v248)

In this blog story, let’s focus on the first three items, i.e. those
that talk to specific types of hardware for implementing unlocking.

To make working with security tokens and TPM2 easy, a new, small tool
was added to the systemd tool set:
systemd-cryptenroll. Its
only purpose is to make it easy to enroll your security token/chip of
choice into an encrypted volume. It works with any LUKS2 volume, and
embeds a tiny bit of meta-information into the LUKS2 header with
parameters necessary for the unlock operation.

Unlocking with FIDO2

So, let’s see how this fits together in the FIDO2 case. Most likely
this is what you want to use if you have one of these fancy FIDO2 tokens
(which need to implement the hmac-secret extension, as
mentioned). Let’s say you already have your LUKS2 volume set up, and
previously unlocked it with a simple passphrase. Plug in your token,
and run:

# systemd-cryptenroll --fido2-device=auto /dev/sda5

(Replace /dev/sda5 with the underlying block device of your volume).

This will enroll the key as an additional way to unlock the volume,
and embed all necessary information for it in the LUKS2 volume
header. Before we can unlock the volume with this at boot, we need to
allow FIDO2 unlocking via
/etc/crypttab. For
that, find the right entry for your volume in that file, and edit it
like so:

myvolume /dev/sda5 - fido2-device=auto

Replace myvolume and /dev/sda5 with the right volume name, and
underlying device of course. Key here is the fido2-device=auto
option you need to add to the fourth column in the file. It tells
systemd-cryptsetup to use the FIDO2 metadata now embedded in the
LUKS2 header, wait for the FIDO2 token to be plugged in at boot
(utilizing systemd-udevd, …) and unlock the volume with it.
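As a reminder of how such an entry is structured, here is a hypothetical helper (not part of systemd) that splits one /etc/crypttab line into its four whitespace-separated columns: volume name, underlying device, key file, and a comma-separated option list. It is simplified and ignores quoting and escaping. A key file field of "-" or "none" is reported as not specified.

```python
from typing import Dict, NamedTuple, Optional, Union


class CrypttabEntry(NamedTuple):
    name: str
    device: str
    key_file: Optional[str]  # None when the field is missing, "-" or "none"
    options: Dict[str, Union[str, bool]]


def parse_crypttab_line(line: str) -> Optional[CrypttabEntry]:
    """Split one /etc/crypttab line into its four whitespace-separated columns."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 fields: {line!r}")
    key_file = fields[2] if len(fields) > 2 and fields[2] not in ("-", "none") else None
    options: Dict[str, Union[str, bool]] = {}
    if len(fields) > 3:
        for opt in fields[3].split(","):
            name, sep, value = opt.partition("=")
            options[name] = value if sep else True  # bare flags become True
    return CrypttabEntry(fields[0], fields[1], key_file, options)
```

Parsing the line above yields the option fido2-device with value auto in the fourth column. Further options are appended to that column separated by commas.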

And that’s it already. Easy-peasy, no?

Note that all of this doesn’t modify the FIDO2 token itself in any
way. Moreover you can enroll the same token in as many volumes as you
like. Since all enrollment information is stored in the LUKS2 header
(and not on the token) there are no bounds on any of this. (OK, well,
admittedly, there’s a cap on LUKS2 key slots per volume, i.e. you
can’t enroll more than a bunch of keys per volume.)

Unlocking with PKCS#11

Let’s now have a closer look at how the same works with a PKCS#11
compatible security token or smartcard. For this to work, you need a
device that can store an RSA key pair. I figure most security
tokens/smartcards that implement PIV qualify. How you actually get the
keys onto the device might differ though. Here’s how you do this for
any YubiKey that implements the PIV feature:

# ykman piv reset
# ykman piv generate-key -a RSA2048 9d pubkey.pem
# ykman piv generate-certificate --subject "Knobelei" 9d pubkey.pem
# rm pubkey.pem

(This chain of commands erases whatever was stored in the PIV feature of your
token before, so be careful!)

For tokens/smartcards from other vendors a different series of
commands might work. Once you have a key pair on it, you can enroll it
with a LUKS2 volume like so:

# systemd-cryptenroll --pkcs11-token-uri=auto /dev/sda5

Just like the same command’s invocation in the FIDO2 case, this enrolls
the security token as an additional way to unlock the volume; any
passphrases you already have enrolled remain enrolled.

For the PKCS#11 case you need to edit your /etc/crypttab entry like this:

myvolume /dev/sda5 - pkcs11-uri=auto

If you have a security token that implements both PKCS#11 PIV and
FIDO2, I’d probably enroll it as a FIDO2 device, given it’s the more
contemporary, future-proof standard. Moreover, it requires no special
preparation in order to get an RSA key onto the device: FIDO2 keys
typically just work.

Unlocking with TPM2

Most modern (non-budget) PC hardware (and other kinds of hardware too)
nowadays comes with a TPM2 security chip. In many ways a TPM2 chip is
a smartcard that is soldered onto the mainboard of your system. Unlike
your usual USB-connected security tokens, you thus cannot remove it
from your PC, which means it addresses quite a different security
scenario: it isn’t immediately comparable to a physical key you can
take with you that unlocks some door; rather, it is a key you leave at
the door, but one that refuses to be turned by anyone but you.

Even though this sounds a lot weaker than the FIDO2/PKCS#11 model, TPM2
still brings benefits for securing your systems: because the
cryptographic key material stored in TPM2 devices cannot be extracted
(at least that’s the theory), binding your hard disk encryption to
it means attackers cannot just copy your disk and analyze it
offline; they always need access to the TPM2 chip too to have a
chance to acquire the necessary cryptographic keys. Thus, they can
still steal your whole PC and analyze it, but they cannot just copy
the disk without you noticing and analyze the copy.

Moreover, you can bind the ability to unlock the hard disk to specific
software versions: for example you could say that only your trusted
Fedora Linux can unlock the device, but not any arbitrary OS some
hacker might boot from a USB stick they plugged in. Thus, if you trust
your OS vendor, you can entrust storage unlocking to the vendor’s OS
together with your TPM2 device, and thus can be reasonably sure
intruders cannot decrypt your data unless they both hack your OS
vendor and steal/break your TPM2 chip.

Here’s how you enroll your LUKS2 volume with your TPM2 chip:

# systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/sda5

This looks almost as straightforward as the two earlier
systemd-cryptenroll command lines, if it weren’t for the
--tpm2-pcrs= part. With that option you can specify to which TPM2
PCRs you want to bind the enrollment. TPM2 PCRs are a set of
(typically 24) hash values that every TPM2-equipped system calculates
at boot from all the software that is invoked during the boot
sequence, in a secure, unfakable way (this is called
“measurement”). If you bind unlocking to a specific value of a
specific PCR, you thus require the system to follow the same
sequence of software at boot to re-acquire the disk encryption
key. Sounds complex? Well, that’s because it is.
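To illustrate what "measurement" means, here is a toy model of the PCR extend operation in a SHA-256 bank. A PCR can never be written directly. It can only be extended: new value = SHA-256(old value || digest of the next component). The sketch assumes the PCR starts at all zeros, and the component blobs are hypothetical stand-ins for firmware, boot loader and kernel images. It does not talk to a real TPM.

```python
import hashlib
from typing import Iterable

PCR_SIZE = 32  # size of a PCR in the SHA-256 bank


def extend(pcr: bytes, digest: bytes) -> bytes:
    """PCR extend in a SHA-256 bank: new value = SHA256(old value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def measure_boot(components: Iterable[bytes]) -> bytes:
    """Measure each boot component, in order, into one PCR that starts at zero."""
    pcr = bytes(PCR_SIZE)
    for blob in components:
        pcr = extend(pcr, hashlib.sha256(blob).digest())
    return pcr
```

Because each step hashes the previous value, the final PCR depends on every component and on their order. Changing a single component (say, a newer kernel) yields a completely different value. That is exactly why an unlock policy bound to the old value stops working after an update, as discussed in the Future section.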

For now, let’s see how we have to modify your /etc/crypttab to
unlock via TPM2:

myvolume /dev/sda5 - tpm2-device=auto

This part is easy again: the tpm2-device= option is what tells
systemd-cryptsetup to use the TPM2 metadata from the LUKS2 header
and to wait for the TPM2 device to show up.

Bonus: Recovery Key Enrollment

FIDO2, PKCS#11 and TPM2 security tokens and chips pair well with
recovery keys: since you don’t need to type in your password every day
anymore, it makes sense to get rid of it, and instead enroll a
high-entropy recovery key you then print out or scan off screen and
store in a safe, physical location. In other words: forget about good ol’
passphrase-based unlocking, go for FIDO2 plus recovery key instead!
Here’s how you do it:

# systemd-cryptenroll --recovery-key /dev/sda5

This will generate a key, enroll it in the LUKS2 volume, show it to
you on screen and generate a QR code you may scan off screen if you
like. The key has high entropy, and can be entered wherever you can
enter a passphrase. Because of that you don’t have to modify
/etc/crypttab to make the recovery key work.
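The exact format systemd uses for its recovery keys isn't described here, so the following is only an illustration of the idea. It is a sketch that assumes 32 random bytes (256 bits of entropy) encoded in the YubiKey "modhex" alphabet and split into dash-separated groups of 8 characters. Modhex uses only 16 letters that sit on the same keys across many common keyboard layouts, which helps when the key map is misconfigured. All function names are hypothetical.

```python
import secrets

# YubiKey "modhex" alphabet: nibble value i is encoded as MODHEX[i].
MODHEX = "cbdefghijklnrtuv"


def encode_recovery_key(raw: bytes, group: int = 8) -> str:
    """Encode bytes as modhex, in dash-separated groups of `group` characters."""
    chars = "".join(MODHEX[b >> 4] + MODHEX[b & 0x0F] for b in raw)
    return "-".join(chars[i:i + group] for i in range(0, len(chars), group))


def decode_recovery_key(key: str) -> bytes:
    """Inverse of encode_recovery_key; raises ValueError on non-modhex input."""
    nibbles = [MODHEX.index(c) for c in key.replace("-", "")]
    if len(nibbles) % 2:
        raise ValueError("odd number of modhex characters")
    return bytes(hi << 4 | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def generate_recovery_key(num_bytes: int = 32) -> str:
    """A fresh random key; 32 bytes give 256 bits of entropy."""
    return encode_recovery_key(secrets.token_bytes(num_bytes))
```

A generated key is 64 letters in 8 groups, i.e. 71 characters including the dashes. That is long, but a user never has to invent or remember it.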

Future

There’s still plenty of room for further improvement in all of this. In
particular for the TPM2 case: what the text above doesn’t really
mention is that binding your encrypted volume unlocking to specific
software versions (i.e. kernel + initrd + OS versions) actually sucks
hard: if you naively update your system to newer versions you might
lose access to your TPM2 enrolled keys (which isn’t terrible, after
all you did enroll a recovery key — right? — which you then can use
to regain access). To solve this some more integration with
distributions would be necessary: whenever they upgrade the system
they’d have to make sure to enroll the TPM2 again — with the PCR
hashes matching the new version. And whenever they remove an old
version of the system they need to remove the old TPM2
enrollment. Alternatively TPM2 also knows a concept of signed PCR
hash values. In this mode the distro could just ship a set of PCR
signatures which would unlock the TPM2 keys. (But quite frankly I
don’t really see the point: whether you drop in a signature file on
each system update, or enroll a new set of PCR hashes in the LUKS2
header doesn’t make much of a difference). Either way, to make TPM2
enrollment smooth, some more integration work with your distribution’s
system update mechanisms needs to happen. And yes, because of this OS
updating complexity the example above — where I referenced your trusty
Fedora Linux — doesn’t actually work IRL (yet? hopefully…). Nothing
updates the enrollment automatically after you initially enrolled it,
hence after the first kernel/initrd update you have to manually
re-enroll things again, and again, and again … after every update.

The TPM2 could also be used for other kinds of key policies, which we might
look into adding later too. For example, Windows uses TPM2 stuff to
allow short (4 digits or so) “PINs” for unlocking the harddisk,
i.e. kind of a low-entropy password you type in. The reason this is
reasonably safe is that in this case the PIN is passed to the TPM2
which enforces that not more than some limited amount of unlock
attempts may be made within some time frame, and that after too many
attempts the PIN is invalidated altogether. This makes dictionary
attacks harder (which would otherwise be easy given the short length
of the PINs).
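A quick bit of arithmetic shows why the limit matters. A 4-digit PIN has only 10,000 possible values. If the chip invalidates the PIN after, say, 10 failed attempts, a guesser's chance of success is at most 10/10,000 = 0.1%. The toy model below illustrates the idea in software: a per-window attempt limit plus a total-failure limit that invalidates the PIN. The limits and names are hypothetical, and a real TPM enforces this in hardware with its own dictionary-attack lockout parameters.

```python
import hmac
import time
from typing import Callable, List


class PinLockout:
    """Toy model of TPM-style anti-hammering for a short PIN."""

    def __init__(self, pin: str, max_failures_per_window: int = 3,
                 window_seconds: float = 60.0, max_total_failures: int = 10,
                 clock: Callable[[], float] = time.monotonic):
        self._pin = pin.encode()
        self.max_per_window = max_failures_per_window
        self.window = window_seconds
        self.max_total = max_total_failures
        self.clock = clock
        self._recent: List[float] = []  # timestamps of recent failures
        self.total_failures = 0
        self.invalidated = False

    def try_unlock(self, guess: str) -> bool:
        if self.invalidated:
            return False  # too many failures overall: PIN no longer usable
        now = self.clock()
        self._recent = [t for t in self._recent if now - t < self.window]
        if len(self._recent) >= self.max_per_window:
            return False  # rate limited: the guess is not even checked
        if hmac.compare_digest(guess.encode(), self._pin):
            return True
        self._recent.append(now)
        self.total_failures += 1
        if self.total_failures >= self.max_total:
            self.invalidated = True
        return False
```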

Postscript

(BTW: Yubico sent me two YubiKeys for testing, Nitrokey a Nitrokey
FIDO2, and AuthenTrend three ATKey.Pro tokens, thank you! — That’s why
you see all those references to YubiKey/Nitrokey/AuthenTrend devices
in the text above: it’s the hardware I had to test this with. That
said, I also tested the FIDO2 stuff with a SoloKey I bought, where it
also worked fine. And yes, you!, other vendors!, who might be reading
this, please send me your security tokens for free, too, and I
might test things with them as well. No promises though. And I am not
going to give them back, if you do, sorry. ;-))

1/10,000th Scale World

2021-01-13

Post Syndicated from original https://xkcd.com/2411/

OCEAN PLAY AREA RULES: No running, no horseplay, no megatsunamis, and no trying to pry the wreck of the Titanic off the bottom.

Patch Tuesday – January 2021

2021-01-13 Richard Tsang

Post Syndicated from Richard Tsang original https://blog.rapid7.com/2021/01/12/patch-tuesday-january-2021/

We arrive at the first Patch Tuesday of 2021 (2021-Jan) with 83 vulnerabilities across our standard spread of products. Windows Operating System vulnerabilities dominated this month’s advisories, followed by Microsoft Office (which includes the SharePoint family of products), and lastly some from less frequent products such as Microsoft System Center and Microsoft SQL Server.

Vulnerability Breakdown by Software Family

Family	Vulnerability Count
Windows	65
ESU	35
Microsoft Office	11
Developer Tools	5
SQL Server	1
Apps	1
System Center	1
Azure	1
Browser	1

Microsoft Defender Remote Code Execution Vulnerability (CVE-2021-1647)

CVE-2021-1647 is marked as a CVSS 7.8, actively exploited, remote code execution vulnerability in the Microsoft Malware Protection Engine (mpengine.dll), between versions 1.1.17600.5 and 1.1.17700.4.

By default, Microsoft’s affected antimalware software automatically keeps the Microsoft Malware Protection Engine up to date. This means that no further action is needed to resolve this vulnerability unless non-standard configurations are used.

This vulnerability affects Windows Defender and the supported Endpoint Protection components of the System Center family of products (2012, 2012 R2, and the namesake version: Microsoft System Center Endpoint Protection).

Patching Windows Operating Systems Next

This month again confirms the standard advice to prioritize Operating System patches whenever possible: 11 of the 13 top CVSS-scoring (CVSSv3 8.8) vulnerabilities addressed in this Patch Tuesday would be covered immediately by patching the OS. Interestingly, the Windows Remote Procedure Call (RPC) Runtime component appears to have received extra scrutiny this month. It accounts for 9 of the 13 top CVSS-scoring vulnerabilities, along with half of the 10 Critical Remote Code Execution vulnerabilities being addressed.

More Work to be Done

Lastly, two minor notes. This Patch Tuesday includes SQL Server, which is an atypical family for Patch Tuesdays. Arguably more notable, it is a reminder that Adobe Flash has officially reached end-of-life and should already have been actively removed from all browsers via Windows Update.

Summary Tables

Here are this month’s patched vulnerabilities split by the product family.

Azure Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1677	Azure Active Directory Pod Identity Spoofing Vulnerability	No	No	5.5	Yes

Browser Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1705	Microsoft Edge (HTML-based) Memory Corruption Vulnerability	No	No	4.2	No

Developer Tools Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2020-26870	Visual Studio Remote Code Execution Vulnerability	No	No	7	Yes
CVE-2021-1725	Bot Framework SDK Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1723	ASP.NET Core and Visual Studio Denial of Service Vulnerability	No	No	7.5	No

Developer Tools Windows Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1651	Diagnostics Hub Standard Collector Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1680	Diagnostics Hub Standard Collector Elevation of Privilege Vulnerability	No	No	7.8	No

Microsoft Office Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1715	Microsoft Word Remote Code Execution Vulnerability	No	No	7.8	Yes
CVE-2021-1716	Microsoft Word Remote Code Execution Vulnerability	No	No	7.8	Yes
CVE-2021-1641	Microsoft SharePoint Spoofing Vulnerability	No	No	4.6	No
CVE-2021-1717	Microsoft SharePoint Spoofing Vulnerability	No	No	4.6	No
CVE-2021-1718	Microsoft SharePoint Server Tampering Vulnerability	No	No	8	No
CVE-2021-1707	Microsoft SharePoint Server Remote Code Execution Vulnerability	No	No	8.8	Yes
CVE-2021-1712	Microsoft SharePoint Elevation of Privilege Vulnerability	No	No	8	No
CVE-2021-1719	Microsoft SharePoint Elevation of Privilege Vulnerability	No	No	8	No
CVE-2021-1711	Microsoft Office Remote Code Execution Vulnerability	No	No	7.8	Yes
CVE-2021-1713	Microsoft Excel Remote Code Execution Vulnerability	No	No	7.8	Yes
CVE-2021-1714	Microsoft Excel Remote Code Execution Vulnerability	No	No	7.8	Yes

SQL Server Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1636	Microsoft SQL Elevation of Privilege Vulnerability	No	No	8.8	Yes

System Center Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1647	Microsoft Defender Remote Code Execution Vulnerability	Yes	No	7.8	Yes

Windows Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1681	Windows WalletService Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1686	Windows WalletService Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1687	Windows WalletService Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1690	Windows WalletService Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1646	Windows WLAN Service Elevation of Privilege Vulnerability	No	No	6.6	No
CVE-2021-1650	Windows Runtime C++ Template Library Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1663	Windows Projected File System FS Filter Driver Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1670	Windows Projected File System FS Filter Driver Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1672	Windows Projected File System FS Filter Driver Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1689	Windows Multipoint Management Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1682	Windows Kernel Elevation of Privilege Vulnerability	No	No	7	No
CVE-2021-1697	Windows InstallService Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1662	Windows Event Tracing Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1703	Windows Event Logging Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1645	Windows Docker Information Disclosure Vulnerability	No	No	5	Yes
CVE-2021-1637	Windows DNS Query Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1638	Windows Bluetooth Security Feature Bypass Vulnerability	No	No	7.7	No
CVE-2021-1683	Windows Bluetooth Security Feature Bypass Vulnerability	No	No	5	No
CVE-2021-1684	Windows Bluetooth Security Feature Bypass Vulnerability	No	No	5	No
CVE-2021-1642	Windows AppX Deployment Extensions Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1685	Windows AppX Deployment Extensions Elevation of Privilege Vulnerability	No	No	7.3	No
CVE-2021-1648	Microsoft splwow64 Elevation of Privilege Vulnerability	No	Yes	7.8	Yes
CVE-2021-1710	Microsoft Windows Media Foundation Remote Code Execution Vulnerability	No	No	7.8	No
CVE-2021-1691	Hyper-V Denial of Service Vulnerability	No	No	7.7	No
CVE-2021-1692	Hyper-V Denial of Service Vulnerability	No	No	7.7	No
CVE-2021-1643	HEVC Video Extensions Remote Code Execution Vulnerability	No	No	7.8	Yes
CVE-2021-1644	HEVC Video Extensions Remote Code Execution Vulnerability	No	No	7.8	Yes

Windows Apps Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1669	Windows Remote Desktop Security Feature Bypass Vulnerability	No	No	8.8	Yes

Windows ESU Vulnerabilities

CVE	Vulnerability Title	Exploited	Disclosed	CVSS3	FAQ?
CVE-2021-1709	Windows Win32k Elevation of Privilege Vulnerability	No	No	7	No
CVE-2021-1694	Windows Update Stack Elevation of Privilege Vulnerability	No	No	7.5	Yes
CVE-2021-1702	Windows Remote Procedure Call Runtime Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1674	Windows Remote Desktop Protocol Core Security Feature Bypass Vulnerability	No	No	8.8	No
CVE-2021-1695	Windows Print Spooler Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1676	Windows NT Lan Manager Datagram Receiver Driver Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1706	Windows LUAFV Elevation of Privilege Vulnerability	No	No	7.3	No
CVE-2021-1661	Windows Installer Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1704	Windows Hyper-V Elevation of Privilege Vulnerability	No	No	7.3	No
CVE-2021-1696	Windows Graphics Component Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1708	Windows GDI+ Information Disclosure Vulnerability	No	No	5.7	Yes
CVE-2021-1657	Windows Fax Compose Form Remote Code Execution Vulnerability	No	No	7.8	No
CVE-2021-1679	Windows CryptoAPI Denial of Service Vulnerability	No	No	6.5	No
CVE-2021-1652	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1653	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1654	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1655	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1659	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1688	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1693	Windows CSC Service Elevation of Privilege Vulnerability	No	No	7.8	No
CVE-2021-1699	Windows (modem.sys) Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1656	TPM Device Driver Information Disclosure Vulnerability	No	No	5.5	Yes
CVE-2021-1658	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1660	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1666	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1667	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1673	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1664	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1671	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1700	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1701	Remote Procedure Call Runtime Remote Code Execution Vulnerability	No	No	8.8	No
CVE-2021-1678	NTLM Security Feature Bypass Vulnerability	No	No	4.3	No
CVE-2021-1668	Microsoft DTV-DVD Video Decoder Remote Code Execution Vulnerability	No	No	7.8	No
CVE-2021-1665	GDI+ Remote Code Execution Vulnerability	No	No	7.8	No
CVE-2021-1649	Active Template Library Elevation of Privilege Vulnerability	No	No	7.8	No

Summary Graphs

Note: Graph data is reflective of data presented by Microsoft’s CVRF at the time of writing.


Introducing message archiving and analytics for Amazon SNS

2021-01-12 James Beswick

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-message-archiving-and-analytics-for-amazon-sns/

This blog post is courtesy of Sebastian Caceres (AWS Consultant, DevOps), Otavio Ferreira (Sr. Manager, Amazon SNS), Prachi Sharma and Mary Gao (Software Engineers, Amazon SNS).

Today, we are announcing the release of a message delivery protocol for Amazon SNS based on Amazon Kinesis Data Firehose. This is a new way to integrate SNS with storage and analytics services, without writing custom code.

SNS provides topics for push-based, many-to-many pub/sub messaging to help you decouple distributed systems, microservices, and event-driven serverless applications. As applications grow, so does the need to archive messages to meet compliance goals. These archives can also provide important operational and business insights.

Previously, custom code was required to create data pipelines, using general-purpose SNS subscription endpoints, such as Amazon SQS queues or AWS Lambda functions. You had to manage data transformation, data buffering, data compression, and the upload to data stores.

Overview

With the new native integration between SNS and Kinesis Data Firehose, you can send messages to storage and analytics services, using a purpose-built SNS subscription type.

Once you configure a subscription, messages published to the SNS topic are sent to the subscribed Kinesis Data Firehose delivery stream. The messages are then delivered to the destination endpoint configured in the delivery stream, which can be an Amazon S3 bucket, an Amazon Redshift table, or an Amazon Elasticsearch Service index.

You can also use a third-party service provider as the destination of a delivery stream, including Datadog, New Relic, MongoDB, and Splunk. No custom code is required to bridge the services. For more information, see Fanout to Kinesis Data Firehose streams, in the SNS Developer Guide.
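For reference, here is a minimal Python sketch of creating such a subscription. It assumes the SNS Subscribe API with the firehose protocol, where the endpoint is the delivery stream ARN and the SubscriptionRoleArn attribute names an IAM role that SNS can assume to write to the stream. The ARNs are placeholders. The client is passed in (for example boto3.client("sns")), so the request-building logic can be exercised without AWS credentials.

```python
import json
from typing import Any, Dict, Optional


def firehose_subscription_request(topic_arn: str, delivery_stream_arn: str,
                                  role_arn: str,
                                  filter_policy: Optional[Dict[str, Any]] = None
                                  ) -> Dict[str, Any]:
    """Keyword arguments for an SNS Subscribe call using the firehose protocol."""
    # IAM role that SNS assumes in order to write into the delivery stream.
    attributes = {"SubscriptionRoleArn": role_arn}
    if filter_policy is not None:
        attributes["FilterPolicy"] = json.dumps(filter_policy)
    return {
        "TopicArn": topic_arn,
        "Protocol": "firehose",
        "Endpoint": delivery_stream_arn,
        "Attributes": attributes,
        "ReturnSubscriptionArn": True,
    }


def subscribe_firehose(sns_client: Any, topic_arn: str, delivery_stream_arn: str,
                       role_arn: str,
                       filter_policy: Optional[Dict[str, Any]] = None) -> str:
    """Subscribe a delivery stream to a topic; sns_client is e.g. boto3.client("sns")."""
    request = firehose_subscription_request(topic_arn, delivery_stream_arn,
                                            role_arn, filter_policy)
    return sns_client.subscribe(**request)["SubscriptionArn"]
```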

The new Kinesis Data Firehose subscription type and its destinations are part of the application-to-application (A2A) messaging offering of SNS. The addition of this subscription type expands the SNS A2A offering to include the following use cases:

Run analytics on SNS messages, using Amazon Kinesis Data Analytics, Amazon Elasticsearch Service, or Amazon Redshift as a delivery stream destination. You can use this option to gain insights and detect anomalies in workloads.
Index and search SNS messages, using Amazon Elasticsearch Service as a delivery stream destination. From there, you can create dashboards using Kibana, a data visualization and exploration tool.
Store SNS messages for backup and auditing purposes, using S3 as a destination of choice. You can then use Amazon Athena to query the S3 bucket for analytics purposes.
Apply transformation to SNS messages. For example, you may obfuscate personally identifiable information (PII) or protected health information (PHI) using a Lambda function invoked by the delivery stream.
Feed SNS messages into cloud-based application monitoring and observability tools, using Datadog, New Relic, or Splunk as a destination. You can choose this option to enrich DevOps or marketing workflows.

As with all supported message delivery protocols, you can filter, monitor, and encrypt messages.

To simplify architecture and further avoid custom code, you can use an SNS subscription filter policy. This enables you to route only the relevant subset of SNS messages to the Kinesis Data Firehose delivery stream. For more information, see SNS message filtering.
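An SNS filter policy is a JSON object that maps message attribute names to lists of allowed values. The Python sketch below is a simplified model of how such a policy selects messages, not the SNS implementation: it covers only exact matching, where every policy key must be present in the message and its value must equal one of the listed values. The attribute names `event_type` and `route` and the function `matches_filter_policy` are hypothetical.

```python
from typing import Any, Dict, List


def matches_filter_policy(policy: Dict[str, List[Any]], attributes: Dict[str, Any]) -> bool:
    """Simplified model of SNS exact-match filtering on message attributes.

    Keys are ANDed: every key in the policy must be present in the message.
    Values are ORed: the attribute value must equal one of the listed values.
    Prefix, anything-but, numeric-range, and exists operators are not modeled.
    """
    for key, allowed_values in policy.items():
        if key not in attributes:
            return False  # a missing attribute never satisfies an exact match
        if attributes[key] not in allowed_values:
            return False
    return True


# Hypothetical policy: forward only ticket sales on two routes to the delivery stream.
policy = {"event_type": ["ticket_sold"], "route": ["GRU-LIS", "JFK-LHR"]}

print(matches_filter_policy(policy, {"event_type": "ticket_sold", "route": "GRU-LIS"}))      # True
print(matches_filter_policy(policy, {"event_type": "ticket_refunded", "route": "GRU-LIS"}))  # False
```

In SNS itself, the same JSON document is attached to the Firehose subscription as its filter policy (in CloudFormation, the FilterPolicy property of AWS::SNS::Subscription), and SNS does the matching before delivery.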

To monitor the throughput, you can check the NumberOfMessagesPublished and the NumberOfNotificationsDelivered metrics for SNS, and the IncomingBytes, IncomingRecords, DeliveryToS3.Records and DeliveryToS3.Success metrics for Kinesis Data Firehose. For additional information, see Monitoring SNS topics using CloudWatch and Monitoring Kinesis Data Firehose using CloudWatch.

For security purposes, you can choose to have data encrypted at rest using server-side encryption (SSE), in addition to encryption in transit using HTTPS. For more information, see SNS SSE, Kinesis Data Firehose SSE, and S3 SSE.

Applying SNS message archiving and analytics in a use case

For example, consider an airline ticketing platform that operates in a regulated environment. The compliance framework requires that the company archive all ticket sales for at least 5 years.

The platform is based on an event-driven serverless architecture. It has a ticket seller Lambda function that publishes an event to an SNS topic for every ticket sold. The SNS topic fans out the event to subscribed systems that are interested in processing this type of event. In the preceding diagram, two systems are interested: one focused on payment processing, and another on fraud control. Each subscribed system consists of an SQS queue that invokes an event-processing Lambda function.

To meet the compliance goal on data retention, the airline company subscribes a Kinesis Data Firehose delivery stream to their existing SNS topic. They use an S3 bucket as the stream destination. After this, all events published to the SNS topic are archived in the S3 bucket.

The company can then use Athena to query the S3 bucket with standard SQL to run analytics and gain insights on ticket sales. For example, they can query for the most popular flight destinations or the most frequent flyers.
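In Athena, this is a standard SQL aggregation (a GROUP BY on the destination, ordered by count). As a local stand-in for that query, the Python sketch below performs the same aggregation over archived records. It assumes each line of an archived S3 object holds one SNS notification envelope as JSON (raw message delivery disabled) whose `Message` field is the ticket event; the `destination` and `ticket_id` fields and the function `top_destinations` are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_destinations(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count ticket sales per destination from archived SNS envelopes."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        envelope = json.loads(line)               # SNS notification envelope
        ticket = json.loads(envelope["Message"])  # event published by the ticket seller function
        counts[ticket["destination"]] += 1
    return counts.most_common(n)


sample = [
    json.dumps({"Type": "Notification", "Message": json.dumps({"ticket_id": "t1", "destination": "LIS"})}),
    json.dumps({"Type": "Notification", "Message": json.dumps({"ticket_id": "t2", "destination": "LHR"})}),
    json.dumps({"Type": "Notification", "Message": json.dumps({"ticket_id": "t3", "destination": "LIS"})}),
]
print(top_destinations(sample))  # [('LIS', 2), ('LHR', 1)]
```

For the archive itself, Athena is the better fit: it scans the objects in place and scales to years of data without downloading anything.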

Subscribing a Kinesis Data Firehose stream to an SNS topic

You can set up a Kinesis Data Firehose subscription to an SNS topic using the AWS Management Console, the AWS CLI, or the AWS SDKs. You can also use AWS CloudFormation to automate the provisioning of these resources.

We use CloudFormation for this example. The provided CloudFormation template creates the following resources:

An SNS topic
An S3 bucket
A Kinesis Data Firehose delivery stream
A Kinesis Data Firehose subscription in SNS
Two SQS subscriptions in SNS
Two IAM roles with access to deliver messages:
- From SNS to Kinesis Data Firehose
- From Kinesis Data Firehose to S3

To provision the infrastructure, use the following template:

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Template for creating an SNS archiving use case
Resources:
  ticketUploadStream:
    DependsOn:
    - ticketUploadStreamRolePolicy
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      S3DestinationConfiguration:
        BucketARN: !Sub 'arn:${AWS::Partition}:s3:::${ticketArchiveBucket}'
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 1
        CompressionFormat: UNCOMPRESSED
        RoleARN: !GetAtt ticketUploadStreamRole.Arn
  ticketArchiveBucket:
    Type: AWS::S3::Bucket
  ticketTopic:
    Type: AWS::SNS::Topic
  ticketPaymentQueue:
    Type: AWS::SQS::Queue
  ticketFraudQueue:
    Type: AWS::SQS::Queue
  ticketQueuePolicy:
    Type: AWS::SQS::QueuePolicy
    Properties:
      PolicyDocument:
        Statement:
          Effect: Allow
          Principal:
            Service: sns.amazonaws.com
          Action:
            - sqs:SendMessage
          Resource: '*'
          Condition:
            ArnEquals:
              aws:SourceArn: !Ref ticketTopic
      Queues:
        - !Ref ticketPaymentQueue
        - !Ref ticketFraudQueue
  ticketUploadStreamSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketUploadStream.Arn
      Protocol: firehose
      SubscriptionRoleArn: !GetAtt ticketUploadStreamSubscriptionRole.Arn
  ticketPaymentQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketPaymentQueue.Arn
      Protocol: sqs
  ticketFraudQueueSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !Ref ticketTopic
      Endpoint: !GetAtt ticketFraudQueue.Arn
      Protocol: sqs
  ticketUploadStreamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Sid: ''
          Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
          Action: sts:AssumeRole
  ticketUploadStreamRolePolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: FirehoseticketUploadStreamRolePolicy
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Action:
          - s3:AbortMultipartUpload
          - s3:GetBucketLocation
          - s3:GetObject
          - s3:ListBucket
          - s3:ListBucketMultipartUploads
          - s3:PutObject
          Resource:
          - !Sub 'arn:${AWS::Partition}:s3:::${ticketArchiveBucket}'
          - !Sub 'arn:${AWS::Partition}:s3:::${ticketArchiveBucket}/*'
      Roles:
      - !Ref ticketUploadStreamRole
  ticketUploadStreamSubscriptionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - sns.amazonaws.com
          Action:
          - sts:AssumeRole
      Policies:
      - PolicyName: SNSKinesisFirehoseAccessPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Action:
            - firehose:DescribeDeliveryStream
            - firehose:ListDeliveryStreams
            - firehose:ListTagsForDeliveryStream
            - firehose:PutRecord
            - firehose:PutRecordBatch
            Effect: Allow
            Resource:
            - !GetAtt ticketUploadStream.Arn

To test, publish a message to the SNS topic. The delivery stream buffers incoming data until it reaches 1 MB or 60 seconds, whichever comes first, so a single test message appears in the destination S3 bucket after about 60 seconds. For information on message formats, see Amazon SNS message formats in Amazon Kinesis Data Firehose destinations.
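To publish the test message from code, you can use the AWS SDK for Python (boto3). The sketch below expects an SNS client such as `boto3.client("sns")` to be passed in; `publish_ticket_sold`, the ticket fields, and the `event_type` message attribute are hypothetical names for this example. The topic ARN is the value that `!Ref ticketTopic` resolves to, which you can find on the SNS console.

```python
import json
from typing import Any, Dict


def publish_ticket_sold(sns_client: Any, topic_arn: str, ticket: Dict[str, Any]) -> str:
    """Publish one ticket-sold event and return the SNS MessageId.

    sns_client is expected to behave like boto3.client("sns"), whose publish()
    accepts TopicArn, Message, and MessageAttributes keyword arguments.
    """
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=json.dumps(ticket),
        # A String attribute that a subscription filter policy could match on.
        MessageAttributes={
            "event_type": {"DataType": "String", "StringValue": "ticket_sold"},
        },
    )
    return response["MessageId"]
```

For example, `publish_ticket_sold(boto3.client("sns"), topic_arn, {"ticket_id": "t1", "destination": "LIS"})`. Passing the client in, rather than creating it inside the function, keeps the helper easy to test with a stub.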

Cleaning up

After testing, avoid incurring usage charges by deleting the resources you created during the walkthrough. If you used the CloudFormation template, delete all the objects from the S3 bucket before deleting the stack.

Conclusion

In this post, we show how SNS delivery to Kinesis Data Firehose enables you to integrate SNS with storage and analytics services. The example shows how to create an SNS subscription to use a Kinesis Data Firehose delivery stream to store SNS messages in an S3 bucket.

You can adapt this configuration for your needs for storage, encryption, data transformation, and data pipeline architecture. For more information, see Fanout to Kinesis Data Firehose streams in the SNS Developer Guide.

For details on pricing, see SNS pricing and Kinesis Data Firehose pricing. For more serverless learning resources, visit Serverless Land.

A set of stable kernels

2021-01-12

Post Syndicated from original https://lwn.net/Articles/842410/rss

Stable kernels 5.10.7, 5.4.89, 4.19.167, 4.14.215, 4.9.251, and 4.4.251 have been released. They all contain important fixes and users should upgrade.

Securing access to EMR clusters using AWS Systems Manager

2021-01-12 Sai Sriparasa

Post Syndicated from Sai Sriparasa original https://aws.amazon.com/blogs/big-data/securing-access-to-emr-clusters-using-aws-systems-manager/

Organizations need to secure their infrastructure while giving engineers the access they need to build applications. Opening inbound SSH ports on instances to enable engineer access introduces the risk of a malicious entity running unauthorized commands. A common approach is to use a bastion host or jump server, which still requires opening inbound SSH ports on Amazon EMR cluster instances. In this post, we present a more secure way to access an EMR cluster launched in a private subnet that eliminates the need to open inbound ports or use a bastion host.

This post answers the following three questions:

Why use AWS Systems Manager Session Manager with Amazon EMR?
Who can use Session Manager?
How can Session Manager be configured on Amazon EMR?

After answering these questions, we walk you through configuring Amazon EMR with Session Manager and creating an AWS Identity and Access Management (IAM) policy to enable Session Manager capabilities on Amazon EMR. We also walk you through the steps required to configure secure tunneling to access Hadoop application web interfaces such as YARN ResourceManager and Spark Job Server.

Creating an IAM role

AWS Systems Manager provides a unified user interface so you can view and manage your Amazon Elastic Compute Cloud (Amazon EC2) instances. Session Manager provides secure and auditable instance management. Systems Manager integration with IAM provides centralized access control to your EMR cluster. By default, Systems Manager doesn’t have permissions to perform actions on cluster instances. You must grant access by attaching an IAM role to the instances through their instance profile. Before you get started, create an IAM service role for cluster EC2 instances with least-privilege permissions.

Create an IAM service role (Amazon EMR role for Amazon EC2) for cluster EC2 instances and attach the AWS managed Systems Manager core instance (AmazonSSMManagedInstanceCore) policy.

Create an IAM policy with least privilege to allow the principal to initiate a Session Manager session on Amazon EMR cluster instances:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeInstanceProperties",
                "ssm:DescribeSessions",
                "ec2:DescribeInstances",
                "ssm:GetConnectionStatus"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession"
            ],
            "Resource": ["arn:aws:ec2:${Region}:${Account-Id}:instance/*"],
            "Condition": {
                "StringEquals": { "ssm:resourceTag/ClusterType": [ "QACluster" ] }
            }
        }
    ]
}

Attach the least privilege policy to the IAM principal (role or user).

How Amazon EMR works with AWS Systems Manager Agent

You can install and configure AWS Systems Manager Agent (SSM Agent) on Amazon EMR cluster nodes using bootstrap actions. SSM Agent makes it possible for Session Manager to update, manage, and configure these resources. The agent processes requests from the Session Manager service in the AWS Cloud and then runs them as specified in the user request. Session Manager is available at no additional cost for managing Amazon EC2 instances; for the cost of additional features, see the Systems Manager pricing page. You can forward ports from a local computer to cluster instances by installing the Session Manager plugin for the AWS CLI on that computer. IAM policies provide centralized access control on the EMR cluster.

The following diagram illustrates a high-level integration of AWS Systems Manager interaction with an EMR cluster.

Configuring SSM Agent on an EMR cluster

To configure SSM Agent on your cluster, complete the following steps:

While launching the EMR cluster, in the Bootstrap Actions section, choose Add bootstrap action.
Choose “Custom action”.
Add a bootstrap action to run the following script from Amazon Simple Storage Service (Amazon S3) to install and configure SSM Agent on Amazon EMR cluster instances.

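SSM Agent expects a localhost entry in the hosts file that maps the instance’s IP address, so that traffic forwarded from a local computer can be redirected to the EMR cluster instance during port forwarding. The following script installs SSM Agent and adds that mapping if it is missing.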

#!/bin/bash
## Name: SSM Agent Installer Script
## Description: Installs SSM Agent on EMR cluster EC2 instances and updates the hosts file
##
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
# Record agent status: systemd on Amazon Linux 2, Upstart on Amazon Linux 1
sudo systemctl status amazon-ssm-agent >> /tmp/ssm-status.log 2>&1 || sudo status amazon-ssm-agent >> /tmp/ssm-status.log 2>&1
## Update hosts file
echo -e "\n ########### localhost mapping check ########### \n" > /tmp/localhost.log
# Use the first address if the host reports more than one
v_ipaddr=$(hostname --ip-address | awk '{print $1}')
# Look for an uncommented hosts entry that already contains this IP address
lhostmapping=$(grep -v '^#' /etc/hosts | grep -F "${v_ipaddr}")
if [ -z "${lhostmapping}" ]; then
  echo -e "\n ########### IP address to localhost mapping NOT defined in hosts file. Adding now ########### \n" >> /tmp/localhost.log
  # Bootstrap actions run as a non-root user, and "sudo echo ... >> file" would
  # perform the redirection without root, so append through sudo tee instead
  echo "${v_ipaddr} localhost" | sudo tee -a /etc/hosts > /dev/null
else
  echo -e "\n IP address to localhost mapping already defined in hosts file \n" >> /tmp/localhost.log
fi
echo -e "\n ########### IP address to localhost mapping check complete; current hosts file: ########### " >> /tmp/localhost.log
cat /etc/hosts >> /tmp/localhost.log

echo -e "\n ########### Exit script ########### " >> /tmp/localhost.log

In the Security Options section, under Permissions, select Custom.
For EC2 instance profile, choose the IAM role you created.

After the cluster successfully launches, on the Session Manager console, choose Managed Instances.
Select your cluster instance.
On the Actions menu, choose Start Session.

Port forwarding to access Hadoop application web UIs

To access Hadoop application web UIs, such as YARN ResourceManager and Spark Job Server, on the Amazon EMR primary node, you create a secure tunnel between your computer and the primary node using Session Manager. This way, you avoid creating and managing a SOCKS proxy and add-ons such as FoxyProxy.

Before configuring port forwarding on your laptop, you must install the Session Manager plugin for the AWS CLI (version 1.1.26.0 or later).

When the prerequisites are met, use the AWS-StartPortForwardingSession document to create a secure tunnel to an EMR cluster instance:

aws ssm start-session --target <instance-id> --document-name AWS-StartPortForwardingSession --parameters '{"portNumber":["<remote-port>"], "localPortNumber":["<local-port>"]}'

For example, the following command forwards local port 8158 on your laptop to port 8088 on the EMR primary node, where the YARN ResourceManager web UI listens by default. While the session is open, browse to http://localhost:8158.

aws ssm start-session --target i-05a3f37cfc08ed176 --document-name AWS-StartPortForwardingSession --parameters '{"portNumber":["8088"], "localPortNumber":["8158"]}'
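If you open tunnels often, a small wrapper can assemble the same command. The Python sketch below is an illustration, not part of the AWS tooling: `port_forwarding_command` is a hypothetical helper that builds the argument list for `aws ssm start-session` with the AWS-StartPortForwardingSession document. It assumes the AWS CLI and the Session Manager plugin are installed and configured on your laptop.

```python
import json
import shlex
from typing import List


def port_forwarding_command(instance_id: str, remote_port: int, local_port: int) -> List[str]:
    """Build the argv for an SSM port-forwarding session to one remote port."""
    parameters = {
        "portNumber": [str(remote_port)],      # port on the EMR instance
        "localPortNumber": [str(local_port)],  # port on your laptop
    }
    return [
        "aws", "ssm", "start-session",
        "--target", instance_id,
        "--document-name", "AWS-StartPortForwardingSession",
        # The document parameters are passed as a JSON map of string lists.
        "--parameters", json.dumps(parameters),
    ]


cmd = port_forwarding_command("i-05a3f37cfc08ed176", 8088, 8158)
print(shlex.join(cmd))  # shell-quoted form, handy for logging or copy-paste
```

Passing `cmd` to `subprocess.run(cmd, check=True)` starts the tunnel; the call blocks for as long as the session stays open, so run it in its own terminal or background process.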

Restricting IAM principal access based on Instance Tags

In a multi-tenant Amazon EMR cluster environment, you can restrict access to Amazon EMR cluster instances based on specific Amazon EC2 tags. In the following example policy, the IAM principal (IAM user or role) is allowed to start a session on any instance in the specified Region and account (Resource: arn:aws:ec2:${Region}:${Account-Id}:instance/*), with the condition that the instance is tagged ClusterType: QACluster (ssm:resourceTag/ClusterType).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:DescribeInstanceProperties",
                "ssm:DescribeSessions",
                "ec2:DescribeInstances",
                "ssm:GetConnectionStatus"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "ssm:StartSession" ],
            "Resource": [ "arn:aws:ec2:${Region}:${Account-Id}:instance/*" ],
            "Condition": {
                "StringLike": {
                    "ssm:resourceTag/ClusterType": [ "QACluster" ]
                }
            }
        }
    ]
}

If the IAM principal tries to start a session on an instance that isn’t tagged, or whose ClusterType tag has a value other than QACluster, the request fails with an error stating that the principal is not authorized to perform ssm:StartSession.
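The effect of the tag condition on ssm:StartSession can be sketched as a toy Python model. This is not how IAM evaluates policies: real evaluation also considers explicit denies, permission boundaries, service control policies, and session policies, all of which the sketch ignores. The function name and its defaults are hypothetical; the tag key and value come from the policy above.

```python
from typing import Dict, Optional, Tuple


def may_start_session(
    instance_tags: Optional[Dict[str, str]],
    tag_key: str = "ClusterType",
    allowed_values: Tuple[str, ...] = ("QACluster",),
) -> bool:
    """Toy model of the ssm:resourceTag/ClusterType condition on ssm:StartSession."""
    if not instance_tags or tag_key not in instance_tags:
        # Condition key absent: the Allow statement does not apply, so the
        # request falls through to IAM's implicit deny.
        return False
    # Without wildcards, StringLike behaves like an exact string comparison.
    return instance_tags[tag_key] in allowed_values


print(may_start_session({"ClusterType": "QACluster", "Team": "analytics"}))  # True
print(may_start_session({"ClusterType": "ProdCluster"}))                     # False
```

Note that extra tags on the instance do not matter; only the value of the ClusterType tag decides the outcome.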

Restricting access to root-level commands on instance

You can change the default user login behavior to restrict elevated permissions (root login) in a given user’s session. By default, sessions are launched using the credentials of a system-generated ssm-user account. Instead, you can launch sessions using the credentials of an operating system account, either by tagging an IAM user or role with the tag key SSMSessionRunAs and the operating system user name as its value, or by specifying a default operating system user name. Both options require turning on Run As support in the Session Manager preferences.
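The order in which Session Manager picks the operating system user can be summarized as a small Python model. This is a sketch of the behavior as we understand it from the Session Manager documentation, not an AWS API: with Run As support off, sessions use ssm-user; with it on, the SSMSessionRunAs tag on the IAM user or role takes precedence over the default user name set in the preferences. The function `resolve_session_user` and the choice to raise when no user is configured are assumptions of this sketch.

```python
from typing import Dict, Optional

DEFAULT_SSM_USER = "ssm-user"


def resolve_session_user(
    run_as_enabled: bool,
    principal_tags: Dict[str, str],
    preference_user: Optional[str],
) -> str:
    """Return the OS user a session would run as under this simplified model."""
    if not run_as_enabled:
        return DEFAULT_SSM_USER  # tags and preferences are ignored
    tagged_user = principal_tags.get("SSMSessionRunAs")
    if tagged_user:
        return tagged_user  # the IAM principal's tag wins
    if preference_user:
        return preference_user  # fall back to the Session Manager preference
    raise ValueError("Run As is enabled but no operating system user is configured")


# appdev2 is tagged SSMSessionRunAs=ec2-user, so its sessions run as ec2-user.
print(resolve_session_user(True, {"SSMSessionRunAs": "ec2-user"}, None))  # ec2-user
```

The operating system account must already exist on the instance; this model does not check that.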

The following screenshots show a configuration for the IAM user appdev2, who is always allowed to start a session with ec2-user instead of the default ssm-user.

Conclusion

Amazon EMR with Session Manager can greatly improve your security and audit posture by centralizing access control and mitigating the risk of managing access keys and inbound ports. It also reduces overall cost, because you no longer need intermediate bastion hosts.

About the Authors

Sai Sriparasa is a Sr. Big Data & Security Consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations, governance & security on AWS. In his spare time, he follows sports and current affairs.

Ravi Kadiri is a security data architect at AWS, focused on helping customers build secure data lake solutions using native AWS security services. He enjoys using his experience as a Big Data architect to provide guidance and technical expertise in the Big Data & Analytics space. His interests include staying fit, traveling, and spending time with friends & family.

Update on SolarWinds Supply-Chain Attack: SUNSPOT and New Malware Family Associations

2021-01-12 boB Rudis

Post Syndicated from boB Rudis original https://blog.rapid7.com/2021/01/12/update-on-solarwinds-supply-chain-attack-sunspot-and-new-malware-family-associations/

This update is a continuation of our previous coverage of the SolarWinds supply-chain attack that was discovered by FireEye in December 2020. As of Jan. 11, 2021, new research has been published that expands the security community’s understanding of the breadth and depth of the SolarWinds attack.

Two recent developments warrant your attention:

New in-depth research from CrowdStrike provides technical analysis of the malware—dubbed "SUNSPOT" (the industry is going to run out of stellar-themed names at this rate)—that was used to insert the SUNBURST backdoor into SolarWinds Orion software builds.
New technical analysis from researchers at Kaspersky discusses their discovery of feature overlap between the SUNBURST malware code and the Kazuar backdoor.

The SUNSPOT build implant

On Monday, Jan. 11, 2021, CrowdStrike’s intelligence team published technical analysis on SUNSPOT, a newly identified type of malware that appears to have been used as part of the SolarWinds supply chain attack. CrowdStrike describes SUNSPOT as “a malicious tool that was deployed into the build environment to inject [the SUNBURST] backdoor into the SolarWinds Orion platform.”

While SUNSPOT infection is part of the attack chain that allows for SUNBURST backdoor compromise, SUNSPOT has distinct host indicators of attack (including executables and related files), artifacts, and TTPs (tactics, techniques, and procedures).

CrowdStrike provides a thorough breakdown of how SUNSPOT operates, including numerous indicators of compromise. Here are the critical highlights:

SUNSPOT’s on-disk executable is named taskhostsvc.exe and has a likely initial build date of Feb. 20, 2020. It maintains persistence through a scheduled task that executes on boot and holds the SeDebugPrivilege grant, which enables it to read the memory of other processes.

It uses this privilege to watch for MsBuild.exe (a Visual Studio development component) execution and modifies the target source code before the compiler has a chance to read it. SUNSPOT then looks for a specific Orion software source code component and replaces it with one that will inject SUNBURST during the build process. SUNSPOT also has validation checks to ensure no build errors are triggered during the build process, which helps it escape developer and other detection.

The last half of the CrowdStrike analysis has details on tactics, techniques, and procedures, along with host indicators of attack, ATT&CK framework mappings, and YARA rules specific to SUNSPOT. Relevant indicators have been incorporated into Rapid7’s SIEM, InsightIDR, and Managed Detection and Response instances and workflows.

SolarWinds has updated their blog with a reference to this new information on SUNSPOT. Because SUNSPOT, SUNBURST, and related tooling have not been definitively mapped to a known adversary, CrowdStrike has christened the actors responsible for these intrusions “StellarParticle.”

SUNBURST’s Kazuar lineage

Separately, Kaspersky Labs also published technical analysis on Monday, Jan. 11, 2021 that builds a case for a connection between the SUNBURST backdoor and another backdoor called Kazuar. Kazuar, which Palo Alto Networks’ Unit42 team first described in May of 2017 as a “multiplatform espionage backdoor with API access,” is a .NET backdoor that Kaspersky says appears to share several “unusual features” with SUNBURST. (Palo Alto linked Kazuar to the Turla APT group back in 2017, which Kaspersky says their own observations support, too.)

Shared features Kaspersky has identified so far include the use of FNV-1a hashing throughout the Kazuar and SUNBURST code, a similar algorithm used to generate unique victim identifiers, and customized (though not identical) implementations of a sleeping algorithm that delays connections to a C2 server and makes network activity less obvious. Kaspersky’s post contains a full, extremely detailed list of similar and different features across both backdoors.
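For readers unfamiliar with FNV-1a, the standard 64-bit variant is small enough to show in full. The Python sketch below implements the textbook algorithm with the published offset basis and prime; it is a reference for the hashing primitive only, and any malware-specific additions to it (such as extra constants applied to the result) are not reproduced here.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte                        # the "1a" order: XOR first...
        h = (h * FNV64_PRIME) & MASK64   # ...then multiply, truncated to 64 bits
    return h


print(hex(fnv1a_64(b"a")))  # 0xaf63dc4c8601ec8c
```

Because FNV-1a is fast and has no key, it is commonly used in implants to compare process or service names against hard-coded hashes instead of storing the plaintext strings, which is part of why a shared, unusual use of it stood out to analysts.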

Kaspersky does not definitively state that the two backdoors are the work of the same actor. Instead, they offer five possible explanations for the similarities they’ve identified between Kazuar and SUNBURST. The potential explanations below have been taken directly from their post:

Sunburst was developed by the same group as Kazuar.
The Sunburst developers adopted some ideas or code from Kazuar, without having a direct connection (they used Kazuar as an inspiration point).
Both groups, DarkHalo/UNC2452 and the group using Kazuar, obtained their malware from the same source.
Some of the Kazuar developers moved to another team, taking knowledge and tools with them.
The Sunburst developers introduced these subtle links as a form of false flag, in order to shift blame to another group.

As Kaspersky notes, knowledge of a potential lineage connection to Kazuar changes little for defenders, but it is worth keeping an eye on: a confirmed connection may help those in highly targeted sectors use previous Kazuar detection and prevention methods to enhance their response to the SolarWinds compromise.

Best practices and advanced patterns for Lambda code signing

2021-01-12 Cassia Martin

Post Syndicated from Cassia Martin original https://aws.amazon.com/blogs/security/best-practices-and-advanced-patterns-for-lambda-code-signing/

Amazon Web Services (AWS) recently released Code Signing for AWS Lambda. By using this feature, you can help enforce the integrity of your code artifacts and make sure that only trusted developers can deploy code to your AWS Lambda functions. Today, let’s review a basic use case along with best practices for Lambda code signing. Then, let’s dive deep into two advanced patterns: one for centralized signing and one for cross-account layer validation. You can use these advanced patterns to apply code signing in a distributed ownership model, where separate groups write code, enforce specific signing profiles, or publish layers.

Secure software development lifecycle

For context on what this capability gives you, let’s look at the secure software development lifecycle (SDLC). You need different kinds of security controls for each development phase. Figure 1 gives an overview of the secure SDLC development stages (code, build, test, deploy, and monitor) along with applicable security controls. You can use code signing for Lambda to protect the deployment stage with cryptographically strong hash verification.

Figure 1: Code signing provides hash verification in the deployment phase of a secure SDLC

The posts Adding Security into DevOps and Implementing DevSecOps Using AWS CodePipeline provide additional information on building a secure SDLC, with a particular focus on code analysis controls.

Basic pattern

Figure 2 shows the basic pattern described in Code signing for AWS Lambda and in the documentation. The basic code signing pattern uses AWS Signer to sign a ZIP file and then calls a Lambda create API to install the signed artifact.

Figure 2: The basic code signing pattern

The basic pattern illustrated in Figure 2 is as follows:

An administrator creates a signing profile in AWS Signer. A signing profile is analogous to a code signing certificate and represents a publisher identity. Administrators can provide access via AWS Identity and Access Management (IAM) for developers to use the signing profile to sign their artifacts.
Administrators create a code signing configuration (CSC), a new resource in Lambda that specifies the signing profiles that are allowed to sign code and the signature validation policy that defines whether to warn about or reject deployments that fail the signature checks. A CSC can be attached to new or existing Lambda functions to enable signature validation on deployment.
Developers use one of the allowed signing profiles to sign the deployment artifact—a ZIP file—in AWS Signer.
Developers deploy the signed deployment artifact to a function using either the CreateFunction API or the UpdateFunctionCode API.

Lambda performs signature checks before accepting the deployment. The deployment fails if the signature checks fail and you have set the signature validation policy in the CSC to reject deployments using ENFORCE mode.

Code signing checks

Code signing for Lambda provides four signature checks. First, the integrity check confirms that the deployment artifact hasn’t been modified after it was signed using AWS Signer. Lambda performs this check by matching the hash of the artifact with the hash from the signature. Second, the source mismatch check detects whether a signature is missing or the artifact is signed by a signing profile that isn’t specified in the CSC. Third, the expiry check fails if a signature is past its expiration. Fourth, the revocation check determines whether anyone has explicitly invalidated the signing profile or the signing job used for signing by revoking it.

The integrity check must succeed or Lambda rejects the deployment. The other three checks can be configured to either block the deployment or generate a warning. These checks are performed in order until one check fails or all checks succeed. As a security leader concerned about the security of code deployments, you can use the Lambda code signing checks to satisfy different security assurances:

Integrity – Provides assurance that code has not been tampered with, by ensuring that the signature on the build artifact is cryptographically valid.
Source mismatch – Provides assurance that only trusted entities or developers can deploy code.
Expiry – Provides assurance that code running in your environment is not stale, by making sure that signatures were created within a certain date and time.
Revocation – Allows security administrators to remove trust by invalidating signatures after the fact so that they cannot be used for code deployment if they have been exposed or are otherwise no longer trusted.

The last three checks are enforced only if you have set the signature validation policy (the UntrustedArtifactOnDeployment parameter) in the CSC to ENFORCE. If the policy is set to WARN, then failures in any of the source mismatch, expiry, and revocation checks log a metric called a signature validation error in Amazon CloudWatch, and the deployment proceeds. As a best practice, initially set the policy to WARN. Then, monitor the warnings, if any, and update the policy to ENFORCE when you’re confident in the findings in CloudWatch.
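The decision logic described above can be summarized as a toy Python model. It is not Lambda’s implementation; it encodes only what the text states: the checks run in a fixed order, a failed integrity check always blocks the deployment, and the other failures block under ENFORCE but only warn under WARN. Stopping at the first failure under WARN follows the “performed in order until one check fails” description. `ValidationPolicy`, `CHECK_ORDER`, and `evaluate_deployment` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Tuple


class ValidationPolicy(Enum):
    """Values of the CSC's UntrustedArtifactOnDeployment setting."""
    WARN = "Warn"
    ENFORCE = "Enforce"


# Order in which the signature checks are described as running.
CHECK_ORDER = ("integrity", "source_mismatch", "expiry", "revocation")


def evaluate_deployment(passed: Dict[str, bool], policy: ValidationPolicy) -> Tuple[str, str]:
    """Return (decision, failed_check); decision is "accept", "warn", or "reject".

    passed maps every name in CHECK_ORDER to True if that check succeeded.
    """
    for check in CHECK_ORDER:
        if passed[check]:
            continue
        # A failed integrity check blocks the deployment regardless of policy.
        if check == "integrity" or policy is ValidationPolicy.ENFORCE:
            return "reject", check
        # Under WARN the deployment goes ahead and a signature validation
        # error metric is recorded in CloudWatch.
        return "warn", check
    return "accept", ""


all_ok = {check: True for check in CHECK_ORDER}
print(evaluate_deployment(dict(all_ok, expiry=False), ValidationPolicy.WARN))  # ('warn', 'expiry')
```

Running the WARN column of this model against your own deployments is essentially what the CloudWatch metric lets you do before switching to ENFORCE.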

Centralized signing enforcement

In this scenario, you have a security administrators team that centrally manages and approves signing profiles. The team centralizes signing profiles in order to enforce that all code running on Lambda is authored by a trusted developer and isn’t tampered with after it’s signed. To do this, the security administrators team wants to enforce that developers—in the same account—can only create Lambda functions with signing profiles that the team has approved. By owning the signing profiles used by developer teams, the security team controls the lifecycle of the signatures and the ability to revoke the signatures. Here are instructions for creating a signing profile and CSC, and then enforcing their use.

Create a signing profile

To create a signing profile, you’ll use the AWS Command Line Interface (AWS CLI). Start by logging in to your account as the central security role. This is an administrative role that is scoped with permissions needed for setting up code signing. You’ll create a signing profile to use for an application named ABC. These example commands are written with prepopulated values for things like profile names, IDs, and descriptions. Change those as appropriate for your application.

To create a signing profile

Run this command:
```
aws signer put-signing-profile --platform-id "AWSLambda-SHA384-ECDSA" --profile-name profile_for_application_ABC
```
Running this command will give you a signing profile version ARN. It will look something like arn:aws:signer:sa-east-1:XXXXXXXXXXXX:/signing-profiles/profile_for_application_ABC/XXXXXXXXXX. Make a note of this value to use in later commands.

As the security administrator, you must grant the developers access to use the profile for signing. You do that by using the add-profile-permission command. Note that in this example, you are explicitly only granting permission for the signer:StartSigningJob action. You might want to grant permissions to other actions, such as signer:GetSigningProfile or signer:RevokeSignature, by making additional calls to add-profile-permission.

Run this command, replacing <role-name> with the developer principal (an IAM role or an AWS account ID) that should be allowed to sign with this profile:

aws signer add-profile-permission \
--profile-name profile_for_application_ABC \
--action signer:StartSigningJob \
--principal <role-name> \
--statement-id testStatementId

Create a CSC

You also want to make a CSC with the signing profile that you, as the security administrator, want all your developers to use.

To create a CSC

Run this command, replacing <signing-profile-version-arn> with the signing profile version ARN returned by the put-signing-profile command in the preceding procedure, Create a signing profile:

```
aws lambda create-code-signing-config \
--description "Application ABC CSC" \
--allowed-publishers SigningProfileVersionArns=<signing-profile-version-arn> \
--code-signing-policies "UntrustedArtifactOnDeployment"="Enforce"
```

Running this command will give you a CSC ARN that will look something like arn:aws:lambda:sa-east-1:XXXXXXXXXXXX:code-signing-config:approved-csc-XXXXXXXXXXXXXXXXX. Make a note of this value to use later.

Write an IAM policy using the new CSC

Now that the security administrators team has created this CSC, how do they ensure that all the developers use it? Administrators can use IAM to grant access to the CreateFunction API, while using the new lambda:CodeSigningConfig condition key with the CSC ARN you created. This will ensure that developers can create functions only if code signing is enabled.

This IAM policy allows the developer roles to create Lambda functions, but only when they use the approved CSC. The additional Deny statement prevents developers from creating their own signing profiles or CSCs, and from changing, deleting, or detaching CSCs, so that they must use the ones provided by the central team.

To write an IAM policy

Create the following policy and attach it to the developer roles. Replace <code-signing-config-arn> with the CSC ARN you created previously.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "lambda:CreateFunction",
        "lambda:PutFunctionCodeSigningConfig"
      ],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "lambda:CodeSigningConfig": ["<code-signing-config-arn>"]
        }
      }
    },
    {
      "Effect": "Deny",
      "Action": [
        "signer:PutSigningProfile",
        "lambda:DeleteFunctionCodeSigningConfig",
        "lambda:UpdateCodeSigningConfig",
        "lambda:DeleteCodeSigningConfig",
        "lambda:CreateCodeSigningConfig"
      ],
      "Resource": "*"
    }
  ]
}
```
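It can help to walk through how the condition evaluates, so you can see why a developer who omits the CSC is refused. The following is a deliberately simplified model of IAM policy evaluation. It matches exact action names only, with no wildcards and no resource matching. The `is_allowed` and `for_any_value_string_equals` helpers and the example CSC ARN are hypothetical illustrations, not an AWS API. The model does follow three documented IAM rules: an explicit Deny overrides any Allow, a request that no Allow matches is implicitly denied, and `ForAnyValue` evaluates to false when the context key is missing or empty.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


def for_any_value_string_equals(request_values: Iterable[str],
                                policy_values: FrozenSet[str]) -> bool:
    """True if at least one request value equals a policy value.

    A missing or empty context key evaluates to False for ForAnyValue.
    """
    values = list(request_values)
    return bool(values) and any(v in policy_values for v in values)


@dataclass(frozen=True)
class Statement:
    effect: str                                       # "Allow" or "Deny"
    actions: FrozenSet[str]                           # exact names, no wildcards
    csc_condition: Optional[FrozenSet[str]] = None    # None means no Condition block


def is_allowed(statements, action: str, request_cscs: Iterable[str]) -> bool:
    """Simplified evaluation: explicit Deny wins, then Allow, else implicit deny."""
    request_cscs = list(request_cscs)
    allowed = False
    for st in statements:
        if action not in st.actions:
            continue
        if st.csc_condition is not None and not for_any_value_string_equals(
                request_cscs, st.csc_condition):
            continue  # condition not met, so this statement does not apply
        if st.effect == "Deny":
            return False
        if st.effect == "Allow":
            allowed = True
    return allowed


# Hypothetical CSC ARN standing in for <code-signing-config-arn>.
APPROVED_CSC = "arn:aws:lambda:sa-east-1:123456789012:code-signing-config:csc-0example"

POLICY = [
    Statement("Allow",
              frozenset({"lambda:CreateFunction", "lambda:PutFunctionCodeSigningConfig"}),
              frozenset({APPROVED_CSC})),
    Statement("Deny",
              frozenset({"signer:PutSigningProfile",
                         "lambda:DeleteFunctionCodeSigningConfig",
                         "lambda:UpdateCodeSigningConfig",
                         "lambda:DeleteCodeSigningConfig",
                         "lambda:CreateCodeSigningConfig"})),
]

if __name__ == "__main__":
    print(is_allowed(POLICY, "lambda:CreateFunction", [APPROVED_CSC]))  # True
    print(is_allowed(POLICY, "lambda:CreateFunction", []))  # False: no CSC supplied
    print(is_allowed(POLICY, "lambda:CreateCodeSigningConfig", []))  # False: explicit Deny
```

The second call shows the point of the policy. A `CreateFunction` request without the approved CSC matches no Allow statement, so it is implicitly denied.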

Create a signed Lambda function

Now the developers have permission to create new Lambda functions, but only if the functions are configured with the approved CSC. The approved CSC specifies the settings for Lambda signing policies and lists exactly which profiles are approved for signing the function code. This means that developers in that account can create functions only if the functions are signed with a profile approved by the central team, and only if the developers have been granted permission on that signing profile.

To create a signed Lambda function

Package your Lambda code as main-function.zip and upload it to an Amazon Simple Storage Service (Amazon S3) bucket. Note that the S3 bucket must have versioning enabled.

Sign the zipped Lambda function using AWS Signer and the following command, replacing <lambda-bucket> and <version-string> with the correct details from your uploaded main-function.zip.

```
aws signer start-signing-job \
--source 's3={bucketName=<lambda-bucket>, version=<version-string>, key=main-function.zip}' \
--destination 's3={bucketName=<lambda-bucket>, prefix=signed-}' \
--profile-name profile_for_application_ABC
```

Download the newly created ZIP file from your Lambda bucket. It will be called something like signed-XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.zip.
For convenience, rename it to signed-main-function.zip.

Run the following command, replacing <lambda-role> with the ARN of your Lambda execution role, and replacing <code-signing-config-arn> with the result of the earlier procedure Create a CSC.

```
aws lambda create-function \
    --function-name "signed-main-function" \
    --runtime "python3.8" \
    --role <lambda-role> \
    --zip-file "fileb://signed-main-function.zip" \
    --handler lambda_function.lambda_handler \
    --code-signing-config-arn <code-signing-config-arn>
```

Cross-account centralization

This pattern supports the use case where the security administrators and the developers work in the same account. You might want to implement it across different accounts, which requires creating CSCs in each account where developers need to deploy and update Lambda functions. To do this, you can use AWS CloudFormation StackSets to deploy CSCs. Stack sets allow you to roll out CloudFormation stacks across multiple AWS accounts. The post Use AWS CloudFormation StackSets for Multiple Accounts in an AWS Organization illustrates how to use an AWS CloudFormation template for deployment to multiple accounts.

The security administrators can use drift detection to detect and react to any changes to the CSCs deployed by the stack set. Drift detection is an AWS CloudFormation feature that detects unmanaged changes to resources deployed using StackSets. To complete the solution, the post Implement automatic drift remediation for AWS CloudFormation using Amazon CloudWatch and AWS Lambda describes how to remediate drift automatically when it is detected in a CloudFormation stack.

Cross-account validation for Lambda layers

So far, you have the tools to sign your own Lambda code so that no one can tamper with it, and you’ve reviewed a pattern where one team creates and owns the signing profiles to be used by different developers. Let’s look at one more advanced pattern where you publish code as a signed Lambda layer in one account, and you then use it in a Lambda function in a separate account. A Lambda layer is an archive containing additional code that you can include in a function.

For this, let’s consider how to set up code signing when you’re using layers across two accounts. Layers allow you to use libraries in your function without needing to include them in your deployment package. It’s also possible to publish a layer in one account, and have a different account consume that layer. Let’s act as a publisher of a layer. In this use case, you want to use code signing so that consumers of your layer can have the security assurance that no one has tampered with the layer. Note that if you enable code signing to verify signatures on a layer, Lambda will also verify the signatures on the function code. Therefore, all of your deployment artifacts must be signed, using a profile listed in the CSC attached to the function.

Figure 3 illustrates the cross-account layer pattern, where you sign a layer in a publishing account and a function uses that layer in another consuming account.

Figure 3: This advanced pattern supports cross-account layers

Here are the steps to build this setup. You’ll be logging in to two different accounts, your publishing account and your consuming account.

Make a publisher signing profile

To make a publisher signing profile

In the AWS CLI, log in to your publishing account.

Run this command to make a signing profile for your publisher:
```
aws signer put-signing-profile --platform-id "AWSLambda-SHA384-ECDSA" --profile-name publisher_approved_profile1
```

Running this command will give you a profile version ARN. Make a note of the value returned to use in a later step.

Sign your layer code using signing profile

Next, you want to sign your layer code with this signing profile. For this example, use the blank layer code from this GitHub project. You can also make your own layer by creating a ZIP file with all your code files included in a directory supported by your Lambda runtime. The AWS Lambda layers documentation has instructions for creating your own layer.
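For a Python runtime, Lambda extracts layer contents into /opt, and /opt/python is on the import path. Modules therefore need to sit under a top-level python/ folder inside the archive. The following is a minimal sketch that uses only the standard library to build such a ZIP. `build_python_layer_zip` and the module name `my_helpers.py` are hypothetical, chosen for illustration.

```python
import tempfile
import zipfile
from pathlib import Path


def build_python_layer_zip(source_dir: str, zip_path: str) -> list:
    """Zip every file under source_dir beneath a top-level python/ folder.

    Returns the archive member names in the order they were written.
    """
    src = Path(source_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in sorted(p for p in src.rglob("*") if p.is_file()):
            # e.g. src/my_helpers.py -> python/my_helpers.py
            arcname = (Path("python") / file.relative_to(src)).as_posix()
            zf.write(file, arcname)
            names.append(arcname)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lib = Path(tmp) / "src"
        lib.mkdir()
        # Hypothetical module to ship in the layer.
        (lib / "my_helpers.py").write_text("def greet():\n    return 'hi'\n")
        print(build_python_layer_zip(str(lib), str(Path(tmp) / "blank-python.zip")))
        # prints ['python/my_helpers.py']
```

A function that uses the layer could then run `import my_helpers` without bundling the module in its own deployment package.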

You can then sign your layer code using the signing profile.

To sign your layer code

Name your Lambda layer code file blank-python.zip and upload it to your S3 bucket.

Sign the zipped layer code using AWS Signer with the following command. Replace <lambda-bucket> and <version-string> with the details from your uploaded blank-python.zip.

```
aws signer start-signing-job \
--source 's3={bucketName=<lambda-bucket>, version=<version-string>, key=blank-python.zip}' \
--destination 's3={bucketName=<lambda-bucket>, prefix=signed-}' \
--profile-name publisher_approved_profile1
```

Publish your signed layer

Now publish the resulting signed layer. Note that Lambda doesn’t validate signatures when a layer is published. The signatures are checked when the layer is added to a function.

To publish your signed layer

Download your new signed ZIP file from your S3 bucket, and rename it signed-layer.zip.

Run the following command to publish your layer:

```
aws lambda publish-layer-version \
--layer-name lambda_signing \
--zip-file "fileb://signed-layer.zip" \
--compatible-runtimes python3.8 python3.7
```

This command will return information about your newly published layer. Search for the LayerVersionArn and make a note of it for use later.

Grant read access

For the last step in the publisher account, you must grant read access to the layer using the add-layer-version-permission command. In the following command, you’re granting access to an individual account using the principal parameter.

(Optional) You could instead choose to grant access to all accounts in your organization by using “*” as the principal and adding the organization-id parameter.

To grant read access

Run the following command to grant read access to your layer, replacing <consuming-account-id> with the account ID of your second account:

```
aws lambda add-layer-version-permission \
--layer-name lambda_signing \
--version-number 1 \
--statement-id for-consuming-account \
--action lambda:GetLayerVersion \
--principal <consuming-account-id>
```

Create a CSC

It’s time to switch your AWS CLI to work with the consuming account. The consuming account can create a CSC for its Lambda functions that specifies which signing profiles are allowed.

To create a CSC

In the AWS CLI, log out from your publishing account and into your consuming account.
The consuming account will need a signing profile of its own to sign the main Lambda code. Run the following command to create one:
```
aws signer put-signing-profile --platform-id "AWSLambda-SHA384-ECDSA" --profile-name consumer_approved_profile1
```
Run the following command to create a CSC that allows code to be signed either by the publisher or the consumer. Replace <consumer-signing-profile-version-arn> with the profile version ARN you created in the preceding step. Replace <publisher-signing-profile-version-arn> with the profile version ARN from the Make a publisher signing profile procedure. Make a note of the CSC ARN returned by this command to use in later steps.
```
aws lambda create-code-signing-config \
--description "Allow layers from publisher" \
--allowed-publishers SigningProfileVersionArns="<publisher-signing-profile-version-arn>,<consumer-signing-profile-version-arn>" \
--code-signing-policies "UntrustedArtifactOnDeployment"="Enforce"
```

Create a Lambda function using the CSC

When creating the function that uses the signed layer, you can pass in the CSC that you created. Lambda will check the signature on the function code in this step.

To create a Lambda function

Use your own Lambda function code, or make a copy of blank-python.zip and rename it consumer-main-function.zip. Upload consumer-main-function.zip to a versioned S3 bucket in your consumer account.

Note: If the S3 bucket doesn’t have versioning enabled, the procedure will fail.

Sign the function with the signing profile of the consumer account. In the following command, replace <consumers-lambda-bucket> with the name of the S3 bucket to which you uploaded consumer-main-function.zip, and replace <version-string> with that object’s version ID.

```
aws signer start-signing-job \
--source 's3={bucketName=<consumers-lambda-bucket>, version=<version-string>, key=consumer-main-function.zip}' \
--destination 's3={bucketName=<consumers-lambda-bucket>, prefix=signed-}' \
--profile-name consumer_approved_profile1
```

Download your new file and rename it to signed-consumer-main-function.zip.

Run the following command to create a new Lambda function, replacing <lambda-role> with a valid Lambda execution role and <code-signing-config-arn> with the CSC ARN returned in the previous procedure, Create a CSC.

```
aws lambda create-function \
    --function-name "signed-consumer-main-function" \
    --runtime "python3.8" \
    --role <lambda-role> \
    --zip-file "fileb://signed-consumer-main-function.zip" \
    --handler lambda_function.lambda_handler \
    --code-signing-config-arn <code-signing-config-arn>
```

Finally, add the signed layer from the publishing account to the configuration of that function. Run the following command, replacing <lambda-layer-arn> with the LayerVersionArn you noted in the earlier procedure, Publish your signed layer.
```
aws lambda update-function-configuration \
--function-name "signed-consumer-main-function" \
--layers "<lambda-layer-arn>"   
```

Lambda will check the signature on the layer code in this step. If the signature of any deployed layer artifact is corrupt, Lambda stops you from attaching the layer and deploying your code. This is true regardless of the mode you choose (WARN or ENFORCE). If you add multiple layers to your function, you must sign all of them.
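The checks described above can be summarized as a small decision procedure. The following is a simplified, hypothetical model written for illustration only. The `Artifact` type, `validate_deployment`, and the example ARNs are assumptions, not Lambda’s implementation. As we understand the documented behavior, a failed integrity check (a corrupt signature or altered content) blocks deployment in either mode. An artifact that is unsigned, signed by a profile not listed in the CSC, expired, or revoked is rejected under Enforce and only reported under Warn. Every artifact is checked, meaning the function code and each attached layer.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Artifact:
    name: str
    profile_version_arn: Optional[str]  # None means the artifact is unsigned
    integrity_ok: bool = True           # False: corrupt signature or altered content
    expired: bool = False
    revoked: bool = False


def validate_deployment(artifacts: List[Artifact], allowed: FrozenSet[str],
                        policy: str) -> Tuple[bool, List[str]]:
    """Return (deployment_allowed, messages) under a simplified model.

    policy mirrors UntrustedArtifactOnDeployment: "Warn" or "Enforce".
    """
    if policy not in ("Warn", "Enforce"):
        raise ValueError("policy must be 'Warn' or 'Enforce'")
    warnings = []
    for art in artifacts:
        # Integrity failures block deployment in either mode.
        if not art.integrity_ok:
            return False, [f"{art.name}: integrity check failed"]
        problems = []
        if art.profile_version_arn not in allowed:
            problems.append("unsigned or not signed by an allowed profile")
        if art.expired:
            problems.append("signature expired")
        if art.revoked:
            problems.append("signature revoked")
        for p in problems:
            if policy == "Enforce":
                return False, [f"{art.name}: {p}"]
            warnings.append(f"{art.name}: {p}")  # Warn: deploy, but report
    return True, warnings


# Hypothetical profile version ARNs for the publisher and consumer accounts.
PUBLISHER = "arn:aws:signer:sa-east-1:111111111111:/signing-profiles/publisher_approved_profile1/v1"
CONSUMER = "arn:aws:signer:sa-east-1:222222222222:/signing-profiles/consumer_approved_profile1/v1"
ALLOWED = frozenset({PUBLISHER, CONSUMER})

function_code = Artifact("signed-consumer-main-function.zip", CONSUMER)
layer = Artifact("lambda_signing:1", PUBLISHER)
tampered = Artifact("lambda_signing:1", PUBLISHER, integrity_ok=False)

if __name__ == "__main__":
    print(validate_deployment([function_code, layer], ALLOWED, "Enforce"))
    # (True, [])
    print(validate_deployment([function_code, tampered], ALLOWED, "Warn"))
    # (False, ['lambda_signing:1: integrity check failed'])
```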

This capability allows layer publishers to share signed layers. A publisher can sign all layers using a specific signing profile and ask all the layer consumers to use that signing profile as one of the allowed profiles in their CSCs. When someone uses the layer, they can trust that the layer comes from that publisher and hasn’t been tampered with.

Conclusion

You’ve learned some best practices and patterns for using code signing for AWS Lambda. You know how code signing fits in the secure SDLC, and what value you get from each of the code signing checks. You also learned two patterns for using code signing for distributed ownership: one for centralized signing and one for cross-account layer validation. Whatever your role (developer, member of a central security team, or layer publisher), you can use these tools to help enforce the integrity of code artifacts in your organization.

You can learn more about Lambda code signing in Configure code signing for AWS Lambda.


Google series on in-the-wild exploits

2021-01-12

Post Syndicated from original https://lwn.net/Articles/842395/rss

The Google Project Zero blog is carrying a six-part series exploring, in great detail, a set of sophisticated exploits discovered in the wild. “These exploit chains are designed for efficiency & flexibility through their modularity. They are well-engineered, complex code with a variety of novel exploitation methods, mature logging, sophisticated and calculated post-exploitation techniques, and high volumes of anti-analysis and targeting checks. We believe that teams of experts have designed and developed these exploit chains. We hope this blog post series provides others with an in-depth look at exploitation from a real world, mature, and presumably well-resourced actor.”

Field Notes: Applying Machine Learning to Vegetation Management using Amazon SageMaker

2021-01-12 Sameer Goel

Post Syndicated from Sameer Goel original https://aws.amazon.com/blogs/architecture/field-notes-applying-machine-learning-to-vegetation-management-using-amazon-sagemaker/

This post was co-written by Soheil Moosavi, a data scientist consultant in Accenture Applied Intelligence (AAI) team, and Louis Lim, a manager in Accenture AWS Business Group.

According to a report from the Federal Energy Regulatory Commission (FERC.gov), virtually every electric customer in the US and Canada has, at one time or another, experienced a sustained electric outage as a direct result of contact between a tree and a power line. Electric utility companies actively work to mitigate these threats.

Vegetation Management (VM) programs represent one of the largest recurring maintenance expenses for electric utility companies in North America. Utilities and regulators generally agree that keeping trees and vegetation from conflicting with overhead conductors is a critical and expensive responsibility of all utility companies concerned about electric service reliability.

Vegetation management, such as tree trimming and removal, is essential for electricity providers to reduce unwanted outages and achieve a low System Average Interruption Duration Index (SAIDI) score. Electricity providers are increasingly interested in identifying innovative practices and technologies to mitigate outages, streamline vegetation management activities, and maintain acceptable SAIDI scores. With the recent democratization of machine learning in the cloud, utility companies are identifying unique ways to solve complex business problems on AWS. The Accenture AWS Business Group, a strategic collaboration between Accenture and AWS, helps customers accelerate their pace of innovation to deliver disruptive products and services. Learning how to apply machine learning helps enterprises innovate and unlock business value.

In this blog post, you learn how Accenture and AWS collaborated to develop a machine learning solution for an electricity provider using Amazon SageMaker. The goal was to improve vegetation management and optimize program cost.

Overview of solution

VM is generally performed on a cyclical basis, with circuits prioritized solely by the number of outages in previous years. A more sophisticated approach is to combine Light Detection and Ranging (LIDAR) data and imagery from aircraft and low earth orbit (LEO) satellites with machine learning models to determine where VM is needed. This provides the information for precise VM plans, but it is more expensive because of the cost of acquiring the LIDAR and imagery data.

In this blog, we show how a machine learning (ML) solution can prioritize circuits based on the impacts of tree-related outages on the coming year’s SAIDI without using imagery data.

We demonstrate how to implement a solution that cross-references, cleans, and transforms time series data from multiple sources. The solution then creates features and models that predict the number of outages in the coming year, and sorts and prioritizes circuits based on their impact on the coming year’s SAIDI. We also show an interactive dashboard for browsing circuits and seeing how much SAIDI can be reduced by performing VM within your budget.

Walkthrough

Source data is first transferred into an Amazon Simple Storage Service (Amazon S3) bucket from the client’s data center.
Next, AWS Glue crawlers were used to crawl the data in the source bucket, and AWS Glue jobs were used to cross-reference the data files to create features for modeling and data for the dashboards.
We used Jupyter notebooks on Amazon SageMaker to train and evaluate models. The best performing model was saved as a pickle file on Amazon S3, and AWS Glue was used to add the predicted number of outages for each circuit to the data prepared for the dashboards.
Lastly, Operations users were granted access to Amazon QuickSight dashboards, which source their data from Amazon Athena, to browse the data and graphs. VM users were additionally granted access to directly edit the data prepared for the dashboards, such as the latest VM cost for each circuit.
We used Amazon QuickSight to create interactive dashboards for the VM team members to visualize analytics and predictions. These predictions are a list of circuits prioritized based on their impact on SAIDI in the coming year. The solution allows our team to analyze the data and experiment with different models in a rapid cycle.

Modeling

We were provided with 6 years’ worth of data across 127 circuits. The data covered VM (VM work start and end date, number of trees trimmed and removed, costs), assets (pole count, height, and materials; wire count, length, and materials; and meter count and voltage), terrain (elevation, landcover, flooding frequency, wind erodibility, soil erodibility, slope, soil water absorption, and soil loss tolerance from GIS ESRI layers), and outages (outage coordinates, dates, duration, total customer minutes, total customers affected). In addition, we collected weather data from NOAA and DarkSky datasets, including wind, gusts, precipitation, and temperature.

Starting with 762 records (6 years * 127 circuits) and 226 features, we performed a series of data cleaning and feature engineering tasks including:

Dropped sparse, non-variant, and non-relevant features
Capped selected outliers based on features’ distributions and percentiles
Normalized imbalanced features
Imputed missing values
- Used “0” where missing value meant zero (for example, number of trees removed)
- Used 3650 (equivalent to 10 years) where missing values are days for VM work (for example, days since previous tree trimming job)
- Used average of values for each circuit when applicable, and average of values across all circuits for circuits with no existing values (for example, pole mean height)
Merged conceptually relevant features
Created new features such as ratios (for example, tree trim cost per trim) and combinations (for example, % of land cover for low and medium intensity areas combined)
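The capping and imputation rules in the list above can be sketched in plain Python. This is an illustrative reconstruction, not the team’s code. Several details are assumptions: the field names, the 99th-percentile default (the post does not say which percentiles were used), linear interpolation for percentiles, and reading “average of values across all circuits” as the mean of all observed values. `cap_outliers` and `impute` are hypothetical helpers.

```python
from statistics import mean
from typing import Dict, List

DAYS_IF_MISSING = 3650.0  # "equivalent to 10 years", per the list above


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    if not s:
        raise ValueError("percentile of an empty list")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def cap_outliers(records: List[dict], feature: str, upper_q: float = 99.0) -> None:
    """Clip values above the upper_q percentile of observed values (in place)."""
    observed = [r[feature] for r in records if r[feature] is not None]
    if not observed:
        return
    cap = percentile(observed, upper_q)
    for r in records:
        if r[feature] is not None and r[feature] > cap:
            r[feature] = cap


def impute(records: List[dict], zero_features, days_features, mean_features) -> None:
    """Fill missing values (None) in place using the three rules listed above."""
    for r in records:
        for f in zero_features:      # missing means zero, e.g. trees removed
            if r[f] is None:
                r[f] = 0.0
        for f in days_features:      # missing "days since" previous VM work
            if r[f] is None:
                r[f] = DAYS_IF_MISSING
    for f in mean_features:          # e.g. pole mean height
        # Collect observed values per circuit before filling anything in.
        per_circuit: Dict[str, List[float]] = {}
        for r in records:
            if r[f] is not None:
                per_circuit.setdefault(r["circuit"], []).append(r[f])
        all_values = [v for vals in per_circuit.values() for v in vals]
        overall = mean(all_values) if all_values else None
        for r in records:
            if r[f] is None:
                own = per_circuit.get(r["circuit"])
                # Circuit's own average if it has values, else the overall average.
                r[f] = mean(own) if own else overall
```

For example, a circuit with no recorded pole height would receive the mean height observed across all other circuits, while a circuit with some recorded heights would receive its own mean.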

After further dropping highly correlated features to remove multicollinearity, we were left with 72 features for model development. The following diagram shows a high-level overview of data partitioning and outage-count prediction.

We compared Gradient Boosting Trees, Random Forest, Feed Forward Neural Networks, and Elastic Net. The best performing model was Elastic Net, with a Mean Absolute Error (MAE) of 6.02 when using a combination of only 10 features. Elastic Net suits the small sample size of this dataset, is good at feature selection, is likely to generalize to new data, and consistently showed a lower error rate. Exponential expansion of features showed small improvements in predictions, but we kept the non-expanded version for its better interpretability.
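For reference, the standard Elastic Net objective is shown below in the parameterization used by glmnet. The post does not say which library or hyperparameter values were used. Here $y_i$ would be the outage count for circuit-year $i$, $x_i$ its feature vector, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mix between the two penalties:

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
+ \lambda\left[\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right]
```

The $\ell_1$ term can drive coefficients exactly to zero, which is the feature-selection property mentioned above. The $\ell_2$ term spreads weight across correlated features and stabilizes the fit on a small sample. In scikit-learn’s `ElasticNet`, $\lambda$ and $\alpha$ correspond to the `alpha` and `l1_ratio` parameters respectively.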

When analyzing model performance, we found that predictions were more accurate for circuits with lower outage counts, and that the models tended to under-predict when the number of outages was high. This is because there were few circuits with a high number of outages for the models to learn from.

The following chart shows the importance of each feature used in the model. An MAE of 6.02 means that, on average, our prediction for each circuit is off by about six outages, in either direction.
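MAE averages the absolute differences between actual and predicted counts, so over- and under-predictions both add to the error rather than cancelling out. The following is a minimal sketch. The outage counts are made-up illustrative values, not data from the project.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all circuits."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical outage counts for four circuits (not from the post).
actual = [3, 12, 25, 40]
predicted = [5, 10, 20, 31]
print(mean_absolute_error(actual, predicted))  # 4.5
```

In this toy example, the errors are 2, 2, 5, and 9. The largest misses are under-predictions on the circuits with the most outages, which mirrors the pattern described above.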

Dashboard

We designed two types of interactive dashboards for the VM team to browse results and predictions. The first set of dashboards shows historical or predicted outage counts for each circuit on a geospatial map. Users can further filter circuits based on criteria such as the number of days since VM, as shown in the following screenshot.

The second type of dashboard shows predicted post-VM SAIDI on the y-axis and VM cost on the x-axis. The client uses this dashboard to determine how much SAIDI can be reduced with the year’s available VM budget and to dispatch the VM crew accordingly. Clients can also upload a list of updated VM costs for each circuit, and the graph automatically readjusts.
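SAIDI is conventionally defined (IEEE 1366) as total customer-minutes of interruption divided by the total number of customers served. Because the denominator is system-wide, each circuit’s predicted tree-related customer-minutes translate directly into a share of SAIDI. The post does not describe how circuits were ranked against the budget. The following sketch shows one plausible approach: a greedy pick by SAIDI reduction per dollar. It assumes that VM removes all predicted tree-related minutes on a circuit. `Circuit`, `prioritize`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Circuit:
    name: str
    predicted_customer_minutes: float  # predicted tree-related customer-minutes next year
    vm_cost: float                     # cost of VM on this circuit (assumed > 0)


def saidi(customer_minutes: float, customers_served: int) -> float:
    """SAIDI = total customer-minutes of interruption / total customers served."""
    return customer_minutes / customers_served


def prioritize(circuits: List[Circuit], customers_served: int,
               budget: float) -> Tuple[List[str], float]:
    """Greedily pick circuits by SAIDI reduction per dollar within the budget.

    Returns the chosen circuit names and the total SAIDI reduction in minutes.
    """
    ranked = sorted(circuits,
                    key=lambda c: c.predicted_customer_minutes / c.vm_cost,
                    reverse=True)
    chosen, spent, reduction = [], 0.0, 0.0
    for c in ranked:
        if spent + c.vm_cost <= budget:
            chosen.append(c.name)
            spent += c.vm_cost
            reduction += saidi(c.predicted_customer_minutes, customers_served)
    return chosen, reduction


# Hypothetical system with 10,000 customers and three circuits.
CIRCUITS = [
    Circuit("A", 50_000, 20_000),
    Circuit("B", 30_000, 10_000),
    Circuit("C", 40_000, 25_000),
]

if __name__ == "__main__":
    print(prioritize(CIRCUITS, 10_000, 30_000))  # (['B', 'A'], 8.0)
```

Greedy selection by ratio is a simple heuristic for what is really a knapsack problem, so it is not guaranteed to find the best possible set of circuits for a given budget.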

Conclusion

This vegetation management solution demonstrates how we used Amazon SageMaker to train and evaluate machine learning models. Using this solution, an electric utility can save time and cost, and scale easily to include more circuits within a set VM budget. We demonstrated how a utility can use machine learning to predict unwanted outages and maintain vegetation without incurring the cost of high-resolution imagery.

Further, to improve these predictions we recommend:

A yearly collection of asset and terrain data (if data is only available for the most recent year, it is impossible for models to learn from each year’s changes),
Collection of VM data per month per location (if data is collected only at the end of each VM cycle and only per circuit, monthly and subcircuit modeling is impossible), and
Purchasing LiDAR imagery or tree inventory data to include features such as tree density, height, distance to wires, and more.

Accelerating Innovation with the Accenture AWS Business Group (AABG)

By working with the Accenture AWS Business Group (AABG), you can learn from the resources, technical expertise, and industry knowledge of two leading innovators, helping you accelerate the pace of innovation to deliver disruptive products and services. The AABG helps customers ideate and innovate cloud solutions through rapid prototype development.

Connect with our team at [email protected] to learn how to accelerate your use of machine learning in your products and services.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Soheil Moosavi

Louis Lim is a manager in Accenture AWS Business Group. His team focuses on helping enterprises explore the art of the possible through rapid prototyping and cloud-native solutions.

Cloudflare Radar’s 2020 Year In Review

2021-01-12 John Graham-Cumming

Post Syndicated from John Graham-Cumming original https://blog.cloudflare.com/cloudflare-radar-2020-year-in-review/


Throughout 2020, we tracked changing Internet trends as the SARS-CoV-2 pandemic forced us all to change the way we were living, working, exercising and learning. In early April, we created a dedicated website https://builtforthis.net/ that showed some of the ways in which Internet use had changed, suddenly, because of the crisis.

On that website, we showed how traffic patterns had changed; for example, where people accessed the Internet from, how usage had jumped up dramatically, and how Internet attacks continued unabated and ultimately increased.

Today we are launching a dedicated Year In Review page with interactive maps and charts you can use to explore what changed on the Internet in 2020. Year In Review is part of Cloudflare Radar. We launched Radar in September 2020 to give anyone access to Internet use and abuse trends that Cloudflare normally had reserved only for employees.

Where people accessed the Internet

To get a sense for the Year In Review, let’s zoom in on London (you can do the same with any city from a long list of locations that we’ve analyzed). Here’s a map showing the change in Internet use comparing April (post-lockdown) and February (pre-lockdown). This map compares working hours Internet use on a weekday between those two months.

As you can clearly see, with offices closed in central London (and elsewhere), Internet use dropped (the blue colour) while usage increased in largely residential areas. Looking out to the west of London, a blue area near Windsor shows how Internet usage dropped at London’s Heathrow airport and surrounding areas.

A similar story plays out slightly later in the San Francisco Bay Area.

But that trend reverses in July, with an increase in Internet use in many places that saw a rapid decrease in April.

When you select a city from the map, a second chart shows the overall trend in Internet use for the country in which that city is located. For example, here’s the chart for the United States. The Y-axis shows the percentage change in Internet traffic compared to the start of the year.

Internet use really took off in March (when the lockdowns began) and rapidly increased to 40% higher than the start of the year. And usage has pretty much stayed there for all of 2020: that’s the new normal.

Here’s what happened in France (when selecting Paris) on the map view.

Internet use was flat until the lockdowns began. At that point, it took off and grew close to 40% over the beginning of the year. But there’s a visible slowdown during the summer months, with Internet use up “only” 20% over the start of the year. Usage picked up again at “la rentrée” in September, with a new normal of about 30% growth in 2020.

What people did on the Internet

Returning to London, we can zoom into what people did on the Internet as the lockdowns began. The UK government announced a lockdown on March 23. On that day, the mixture of Internet use looked like this:

A few days later, the E-commerce category had jumped from 12.9% to 15.1% as people shopped online for groceries, clothing, webcams, school supplies, and more. Travel dropped from 1.5% of traffic to 1.1% (a decline of 30%).
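These figures are shares of all traffic, so a change can be read either in percentage points or relative to the starting share. The following is a small sketch using the rounded figures quoted above. The helper names are illustrative.

```python
def point_change(before: float, after: float) -> float:
    """Change in share, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting share, in percent."""
    return (after - before) / before * 100


for category, before, after in [("E-commerce", 12.9, 15.1), ("Travel", 1.5, 1.1)]:
    print(f"{category}: {point_change(before, after):+.1f} points, "
          f"{relative_change(before, after):+.1f}% relative")
```

E-commerce gains 2.2 points, about 17% in relative terms. Travel loses 0.4 points, about 27% relative when computed from the one-decimal shares. Small shares are sensitive to rounding, which can explain the gap from the roughly 30% decline quoted above.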

And then by mid-April, E-commerce had increased to 16.2% of traffic, with Travel remaining low.

But not all the trends are pandemic-related. One question is: to what extent is Black Friday (November 27, 2020) an event outside the US? We can answer that by moving the London slider to late November and looking at the change in E-commerce. Watch carefully as E-commerce traffic grows towards Black Friday and actually peaks at 21.8% of traffic on Saturday, November 28.

As Christmas approached, E-commerce dropped off, but another category became very important: Entertainment. Notice how it peaked on Christmas Eve, as Britons, no doubt, turned to entertainment online during a locked-down Christmas.

And Hacking 2020

Of course, a pandemic didn’t mean that hacking activity decreased. Throughout 2020 and across the world, hackers continued to run their tools to attack websites, overwhelm APIs, and try to exfiltrate data.

Explore More

To explore data for 2020, you can check out Cloudflare Radar’s Year In Review page. To go deep into any specific country with up-to-date data about current trends, start at Cloudflare Radar’s homepage.

Security updates for Tuesday

2021-01-12

Post Syndicated from original https://lwn.net/Articles/842382/rss

Security updates have been issued by openSUSE (chromium), Oracle (firefox), Red Hat (kernel), Scientific Linux (firefox), Slackware (sudo), SUSE (firefox, nodejs10, nodejs12, and nodejs14), and Ubuntu (apt, linux, linux-aws, linux-aws-5.4, linux-azure, linux-azure-4.15, linux-azure-5.4, linux-gcp, linux-gcp-5.4, linux-hwe-5.4, linux-hwe-5.8, linux-oem-5.6, linux-oracle, linux-oracle-5.4, nvidia-graphics-drivers-390, nvidia-graphics-drivers-450, nvidia-graphics-drivers-460, python-apt, and xdg-utils).