Celebrate Black history this month with code!

Post Syndicated from Kevin Johnson original https://www.raspberrypi.org/blog/black-history-month-2022-free-coding-resources/

For those of us living in the USA, February is Black History Month, our month-long celebration of Black history. This is an occasion to highlight the amazing accomplishments of Black Americans through time. Simply put, the possibilities are endless! Black history touches every area of our lives, and it is so important that we seize the opportunity to honor Black freedom fighters who fought for the equality and freedom of ALL people.

That’s why we encourage you to join us in celebrating Black History Month with the help of free, specially chosen coding and computing education resources. We’ve got something for everyone: whether you’re a learner, an educator, a volunteer, or any lover of tech, everyone can participate.

For learners: Celebrate Black History Month with free coding resources

This month, we want to empower young people to think about how they can use code as a tool to celebrate Black history with innovation and creativity. We’ve designed a project card listing the perfect projects to jumpstart young learners’ imagination: 

There are projects for beginner coders, as well as intermediate and advanced coders, in Scratch, Python, HTML/CSS, and Ruby plus Raspberry Pi.

Three young tech creators show off their tech project at Coolest Projects.

For educators: Support Black learners and their communities

We’re working on research to better understand how to support the Black community and other underrepresented communities to engage with computer science.

At Coolest Projects, a group of people explore a coding project.

Take some time this month to explore the following resources to make sure we’re growing into a more diverse and inclusive community: 

  • Culturally relevant pedagogy guide: We’ve worked with a group of teachers and researchers to co-create a guide sharing the key elements of a culturally relevant and responsive teaching approach to curriculum design and teaching in the classroom. Download the guide to see how to teach computing and computer science in a way that values all your learners’ knowledge, ways of learning, and heritage.
A female computing educator with three female students at laptops in a classroom.

For everyone: Listen to Black voices

Uplifting Black voices is one of the best things we can all do this February in observance of Black History Month. We’ve had the privilege of hearing from members in our community about their experiences in tech, and their stories are incredibly insightful and inspiring. 

  • Community stories: Yolanda Payne
    • Meet Yolanda Payne, a highly regarded community member from Atlanta, Georgia who is passionate about connecting young people in her community to opportunities to create with technology.
  • Community stories: Avye 
    • Meet Avye, an accomplished 13-year old girl who is taking the world of robotics by storm and works to help other girls get involved too.

Happy Black History Month! Share with us on Twitter, LinkedIn, Facebook, or Instagram how you’re celebrating in your community.

The post Celebrate Black history this month with code! appeared first on Raspberry Pi.

Monitoring Juniper Mist wireless network

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/monitoring-juniper-mist-wireless-network/19093/

As Premium Zabbix partner, Opensource ICT Solutions is building Zabbix solutions all over the world. That means we have customers with a broad variety of requirements, thoughts on how to monitor things, which metrics are important and how to alert upon it. If one of those customers approaches us with a question concerning a task the likes of which we have never done before, it’s a challenge. And we love challenges! This blog post will cover one such challenge that we solved some time ago.

Quanza is a leading infrastructure operator offering a broad portfolio of services to completely take over the management of networks, data centers and cloud services. With more than 70 colleagues and at least as many specializations, everyone at Quanza works towards the same goal: designing, building, and operating an optimal IT infrastructure. Exactly like you would expect it… and then some. Quanza understands that you prefer to focus on your own innovation. By continuously mapping out your wishes, Quanza provides customized solutions that keep your network up and running 24×7. Today and in the future.

With a relentless focus on mission-critical environments, often of relevance to society, Quanza has an impressive line-up of customers. Some enterprises that chose to partner up with Quanza are SURF, Payvision, the Volksbank, and the Amsterdam Internet Exchange (AMS-IX), one of the world’s largest internet hubs.

Recently, customers started asking Quanza to embed Juniper MIST products for wired and wireless networks in their service portfolio. In order to fully support the network’s lifecycle (build, operate and innovate), the Juniper MIST products will need to be monitored by their 24×7 NOC. This is where we came into play, with our Zabbix knowledge.

We quickly decided to combine the knowledge Quanza has of the Juniper MIST equipment and API and our Zabbix knowledge to build the best possible monitoring solution.

SNMP or cloud?

The Juniper MIST solution is a cloud-based solution that provides a single pane of management for Juniper Networks products. As it’s cloud-based, it’s not a “traditional” network solution. As such, SNMP is not an option for device monitoring as they are communicating only with “the cloud” and we cannot access them directly like we used to do with traditional network equipment.

So, we started to investigate other options. One of the most common options right now is talking to some sort of API and pulling the metrics from that API. With Zabbix “HTTP agent” item key, this is no problem at all. Unfortunately, that’s not how the MIST API works. It’s pushing data instead of letting you pull it (actually, it does – but this doesn’t scale at all). Now, the Zabbix HTTP Agent item type allows trapping, but only in a specific Zabbix sender format. Of course, the MIST API does not allow that.

This means we have a problem. SNMP is not available. Pulling data is not a viable, scalable option. Pushing the data is an option, but Zabbix does not understand that.

Since we are not talking about some sort of proprietary monitoring tool which is completely closed and way too static, there is always a solution with Zabbix as long as you’re creative enough.

Getting data into Zabbix

We needed some middleware. Something that was able to receive that data from MIST and convert it into something that we can push into Zabbix.

That’s exactly what we did. We, together with Quanza, built a middleware that uses an API token to authenticate against the MIST API endpoint. Once the authentication is successful, the middleware is allowed to subscribe to certain “channels”. These channels provide event and performance data. You can compare it with MQTT, where a subscription to channels/topics is needed to get the information you are interested in.

Mist Middleware explained

  • Step 1:  Authenticate using an API token.
  • Step 2: Subscribe to channels
  • Step 3: Receive performance and event data
  • Step 4: Filter out only the relevant (performance) data for Zabbix
  • Step 5: Push into Zabbix

Once we had this in place, the MIST part was finished. We had our data and were able to push it into the monitoring solution.

Parsing in Zabbix

So, right now we have the data available for Zabbix. Time to find a neat way to use it. As the environments (both inventory and the types of equipment that are used) might be dynamic, we definitely do not want to apply any manual work to monitor newly added sites/equipment.

That means that low-level discovery rules are pretty much the only viable solution.

Here we go:

Describing host prototypes

 

 

Within Zabbix, we configure 1 host (the Discovery host) and apply a template on that host, with exactly 1 LLD rule: Query our middleware, and based on the information received, create new hosts (Host prototypes).

The data that is received looks like this:

{
"NODEID":"<NODEID>",
"NAME":"AP-<SITE>-<NUMBER",
"SITENAME":"<SITENAME>",
"SITEID":"<SITEID>",
"MAC":"<MAC ADDRESS>",
"ORGNAME":"<ORG NAME>"
},

Those new/discovered hosts will have the names of the AP and corresponding organization and location (in Mist: site). We also link a template to the discovered host and add it to a Host group with the variables we’ll need later, such as the organization, site name, siteID etc.

So, We need to parse those JSON elements. Luckily Zabbix provides, within the LLD rule config the option to parse this into LLD macros, so for example the Node id is parsed into {#ID} with the use of JSONPath $.NODEID:

LLD macro configuration

Once this process is complete, we have a new host per AP. Of course, there is no data on that host and querying the middleware or Mist is a bad idea. Scalability will be extremely problematic with more than a few organizations and sites configured in the Mist environment. As we’re building this with a big network integrator, scalability is a thing and we do not want to risk having a noticeable performance impact by using polling.

How about pushing data from the middleware into Zabbix? Once the data is received from Mist by the middleware, it’s parsed, filtered and then it pushes out whatever must be pushed out to Zabbix. We decided the best option is to push per host as we have those already available in Zabbix.

Now we should ensure two things:

    • do not overwhelm Zabbix with data being pushed in
    • Getting all the data with the least number of ‘pushes’ into Zabbix

Again, the flexibility of Zabbix is extremely useful here. On the AP hosts, there is a template with exactly 1 trapper item: receive performance data. From there, everything will be handled by the Zabbix ‘Master/Dependent’ item concept. We then extract data like temperatures, CPU load, memory usage, etc.

At the same time, we receive data regarding network usage (interface statistics) and radio information. As we do not know upfront how many network interfaces and radio’s there are on a particular Access Point, we do not want to hard-code such information. Here we are combining the concept of low-level discovery with dependent items (The following blog post covers the logic behind such an approach: Low-Level Discovery with Dependent items – Zabbix Blog)

Using ‘low-level discovery with dependent Items’, all relevant items are created ‘dynamically’ in such a way that a change on the MIST side (for example a new type of Access Point) doesn’t require changes on the Zabbix side. Monitoring starts within minutes and you’ll never miss any problem that might arise!
Just to give you an idea of the flow:
The Master Item gets a JSON format like this (and we’ve parsed only a small portion here) pushed into it from the middleware:

{
"mac":"<MAC ADDRESS>",
"model":"<MODEL>",
"port_stat":{
"eth0":{
"up":true,
"speed":1000,
"full_duplex":true,
"tx_bytes":37291245,
"tx_pkts":169443,
"rx_bytes":123742282,
"rx_pkts":779249,
"rx_errors":0,
"rx_peak_bps":14184,
"tx_peak_bps":5444
}
},
"cpu_util":2,
"cpu_user":652611,
"cpu_system":901455,
"radio_stat":{
"band_5":{
"num_clients":<CLIENTS>,
"channel":<CHANNEL>,
"bandwidth":0,
"power":0,
"tx_bytes":0,
"tx_pkts":0,
"rx_bytes":0,
"rx_pkts":0,
"noise_floor":<NOISE>,
"disabled":true,
"usage":"5",
"util_all":0,
"util_tx":0,
"util_rx_in_bss":0,
"util_rx_other_bss":0,
"util_unknown_wifi":0,
"util_non_wifi":0
}
"env_stat":{
"cpu_temp":<CPU TEMP>,
"ambient_temp":<AMBIENT TEMP>,
"humidity":0,
"attitude":0,
"pressure":0
}
}

Within the Master item, we’re basically not parsing anything, it’s just there to receive the values and push them into the Dependent items. In the dependent items, we start “cherry-picking” only those metrics that we would like to see. As it’s JSON format, preprocessing step “JSONPath” comes in handy. At the same time, we’re looking into efficiency, so a second step is added: discard unchanged with heartbeat (1d):

Example: Getting out the statistics of the 2.4Ghz band radio:

Item prototype proprocessing

Of course, this has to be done with all items.

So far, we’ve heavily focussed on the technical part, but Zabbix does have quite a few options to visualize the data as well. As we’re waiting on the next LTS release, we have only set up a very small dashboard with a few widgets. One of the better ones:

number_clients

Here we’re using the new graph type widget, but instead of plotting the number of clients per AP, we’re plotting a dataset with an “aggregate” function. Of course, if we look at the dashboard widgets, there are many more things that can be visualized…

Efficiency and security considerations

As we were building this, we had 2 main considerations:

    • Efficiency
    • Security

Efficiency, as we are anticipating that Quanza will be responsible for quite a few MIST environments on top of the current environments in the near future, combined with a strict limit of allowed API calls against the MIST API. As such, it is really important to keep those API calls as low as possible. Next to that, with every new Access Point added, the load on the Zabbix server is increasing. Now that is not really a problem, as Zabbix is perfectly capable of monitoring thousands of metrics simultaneously, though it has its limits. And you do not want to hit those limits in a production environment with the only solution being migration to beefier hardware.

Security-wise this challenge had a few things going on since we’re talking to an external exposed API. MIST can invoke webhooks. This might’ve been a bit easier (we explored it, but there were of course other things to keep in mind while going down that road), but the main concern was the requirement that Zabbix / an interface to Zabbix is exposed to the internet. That didn’t look too appealing and required a bit more maintenance. The preferable solution was to create that middleware where we have full control of what queries are executed, how the API token is protected, which connections are established etc. etc.

Conclusion

Although this question was challenging, together with Quanza we created a scalable, secure, and dynamic solution. Zabbix is flexible enough to facilitate the tricks required to provide reliable monitoring and alerting in an efficient and secure manner. We strongly believe the only limitation is your own creativity and this case proves that once again.

Quanza can now ensure the availability of their customer Juniper MIST-based networks, and in case something breaks their 24×7 manned NOC will be able to take whatever action is required to ensure the availability of the customers’ network – all thanks to the flexibility of Zabbix.

The post Monitoring Juniper Mist wireless network appeared first on Zabbix Blog.

Monday Night Itch #1: Mystery Trap Adventure

Post Syndicated from Eevee original https://eev.ee/blog/2022/01/31/monday-night-itch-1-mystery-trap-adventure/

Welcome to Monday Night Itch, a harebrained scheme to encourage folks to play more non-AAA games by adding a touch of social gamification. I thought I would be tweeting my adventures here, but I just had an experience so profound it can only be captured within a blog post.

The rules

Rules” is a strong word, but nevertheless:

  • Every Monday, find a game on itch.io, and pay at least $2 for it.

    You can buy a game with a price tag, or download a free game and leave a tip, but the point of this endeavor is to put money into more places in the ecosystem. (Note that it is possible, though uncommon, for a developer to disable payments altogether.)

  • Play it.

  • Leave a nice comment.

  • Tell at least one person what you played, and what you thought about it.

That’s it. Buy a game, play it, tell someone about it. You can stream it, tweet it, screenshot it, or just tell your boyfriend about it. You don’t have to like it

Your score is how many times you’ve done this, and your streak is how many weeks you’ve done it in a row.

Some other quick tips about itch

The itch app is cool. It’s a pretty thin wrapper around the website, but it adds automatic updating and big red “Launch” buttons and other stuff to make it feel a bit more like a Steam-ish thing. Do keep in mind that devs can upload whatever they want, and sometimes the itch app gets confused.

If you’re not a fan of running mystery software you downloaded from the Internet, you can just play web games and leave tips on those.

There are a lot of NSFW games on itch, but they’re hidden from the main browse pages by default. You can enable them site-wide in your user settings, or add /nsfw to the end of a browse page URL (for example, https://itch.io/gameshttps://itch.io/games/nsfw) to force a list of only NSFW games.

The main event

I decided I wanted to reward Linux releases, and also chip a few bucks towards games with a price tag that aren’t necessarily getting much exposure, so I went to the full list of recent paid Linux games. This is how I discovered Mystery Trap Adventure.

I found myself very much wanting to play this, but I also found myself wondering what sort of impact I should be trying for as the very first iteration of this project. Would I torpedo it if I played a game made by a less experienced dev? Are people looking to this expecting me to uncover unknown indie gems, like I’m wandering a beach with a metal detector?

I checked the dev’s itch profile and this is their ninth project. Every single previous work of their has only a single comment: from them, announcing that comments can be left below. That’s heartbreaking to me, and what made me absolutely sure I wanted to play this. I want to make their day.

And then, dear reader, I felt ashamed. Because who the fuck cares. The world already has enough people who believe that indie games are only valuable if they create the illusion of an eight-digit budget, and I am not here to enable them. Creative work does not need to be polished, mass-appeal, least common denominator stuff handed down from heaven by a billion-dollar international corporation in order to be interesting or worthwhile.

But more importantly, it’s my thing and I’m gonna do whatever the hell I want.

The title screen for Mystery Trap Adventure: a collage of mismatched artwork on a nearly cyan background

And so, Mystery Trap Adventure.

The first thing to note is that the game does not, in fact, have a Linux release. I did strongly suspect this, since a single download is flagged as all of Windows, Mac, and Linux, but the only way to be sure was to buy it. (They’re asking $4; I paid them $10.) Even Wine had trouble with it, for some reason, so I had to play it on our Windows media center.

It’s a sidescrolling platformer where you play as a dragon; you can jump about one tile high (roughly your own height) and shoot fireballs (useful for destroying bricks and defeating the boss). The main obstacle is spikes, which kill you instantly.

Right at the beginning, there’s a block you have to jump on top of, and it was very obvious that I sort of “stuck” to the side of it if I touched it. I thought at first that this was the result of a common platforming gotcha: if you model the player as a dynamic body and implement movement (including air control) as a force on them, then they will stick to walls as long as the corresponding direction is held. This happens because forces on dynamic bodies are external, as though a giant ghost hand were pushing them — so if a player is trying to air control into a wall, the friction against the wall will hold them in place, just as if you were holding a book against a wall with your hand.

(Solving that problem is beyond the scope of this post, sorry.)

Okay, common pitfall, no big deal. I wander ahead a bit. I encounter a slice of watermelon, which allows me to teleport a short distance once. I screw this up the first time while messing with the controls — there’s a wall directly in front of it, so the teleport must be used to skip past that wall — and have to restart.

Now something interesting happens. I’m in a pit with walls on both sides. I can’t teleport again, and even if I could, there are spikes beyond the next wall, so that would kill me immediately.

A screenshot of the situation just described

It dawns on me that this microscopic game has walljumping.

I’m still fairly certain that the player character is a dynamic body, but now I wonder: is the wall stickiness actually due to the friction interaction, or is it a deliberate feature to enable walljumping?

Or, perhaps more likely, is it both? Did the developer trip over this pitfall, and decide to make a gameplay feature out of it? It almost seems unbelievable. I wouldn’t consider walljumping a basic platforming ability, and it’s not obvious how to solve the friction problem, but it seems that this relatively new developer may have solved both problems by simply smashing them together.

And if that’s the case, dearest reader: I fucking love it. That is the true spirit of game development, I think — you have a big complicated simulation you want to make, and you have a big complicated engine that you want to make do it, and you have to kinda mold both of them into fitting better with the other.

I don’t know. I could be completely wrong about this came to be. Or they could have copy/pasted from someone else who had this idea. Either way, it made me smile to see.

The walljumping controls are, ahem, not exactly intuitive, which is why it took me nonzero time to realize it was an ability at all. But honestly, I liked that too. Nowadays, everyone knows exactly how every platforming ability is “supposed” to work, because devs are all copying the same ideas from each other that have been refined over a thousand different iterations. This reminded me of playing games in the early and mid 90s, before everything had standardized as much, when part of the game itself was just working out the right muscle memory to make the right things happen. It’s surprising to find nostalgia in a game because it’s not like others I’ve played before, but there it was. Working out the right timing without any visual cues felt like a puzzle in itself, and getting out of the pit without landing in the spikes was remarkably satisfying. (If it helps: I used different hands for movement and jumping, and I landed on top of the right wall before trying to jump over the spikes.)

Beyond this, the tone changes somewhat to IWBTG-esque traps with no telegraphing. Walking directly to the right will cause spikes to appear from the ground, killing you instantly. Thankfully there aren’t too many of these, and the game is very short, so simply memorizing the handful of places they appear is easy enough.

I have less to say about the rest of the game; you get another quirky powerup you only use once, dodge another couple surprise traps, and face a single boss. The boss is a very large human warrior dude who walks straight at you and swings his sword, which kills you. There’s another fruit above you, but it seems out of reach. He is definitely too tall to jump over. The only solution I found is to simply spam fireballs at him before he can reach you, but I don’t know if this is intended. It seems like it can’t be, since his “health bar” takes the form of a grid of his face behind him, and from where you enter the area, you can’t actually see the whole grid? So surely I’m supposed to be able to get further to the right? But I don’t know.


I finished the game and came back to the following reply to my original thread about this whole concept:

most, i.e. all, small Indy games are terrible.

What a snotty, entitled, mean-spirited sentiment. As if the very existence of a game with lower production values than Resident Evil 8 were a personal offense. It seems to be fairly common, too, and I just do not understand it. Small indie games aren’t trying to squeeze you for more money, lure you in with gambling, exploit your friendships, make your entire life revolve around them. They’re just there.

This attitude is like showing up to everyone who mentions YouTube just to proclaim that everything on it sucks, because Paramount movies are better. That’s great, no one asked! Sometimes I just want to see a seven-second clip of a kitten filmed in a dark room by a $20 phone, because dammit, kittens are still fun to watch. No one makes a point of dunking on videos like that, so I don’t know why anyone is so harsh on amateur games either. Especially when making games is so much more difficult!

Mystery Trap Adventure is that video. Someone had an idea, worked out how to express it, and put it out into the world just because they wanted to. I don’t expect anyone else to buy it or play it; I just want you to know that I did, and it made me smile for a few minutes.

No, a researcher didn’t find Olympics app spying on you

Post Syndicated from original https://blog.erratasec.com/2022/01/no-researcher-didnt-find-olympics-app.html

For the Beijing 2022 Winter Olympics, the Chinese government requires everyone to download an app onto their phone. It has many security/privacy concerns, as CitizenLab documents. However, another researcher goes further, claiming his analysis proves the app is recording all audio all the time. His analysis is fraudulent. He shows a lot of technical content that looks plausible, but nowhere does he show anything that substantiates his claims.

Average techies may not be able to see this. It all looks technical. Therefore, I thought I’d describe one example of the problems with this data — something the average techie can recognize.

His “evidence” consists screenshots from reverse-engineering tools, with red arrows pointing to the suspicious bits. An example of one of these screenshots is this on:

This screenshot is that of a reverse-engineering tool (Hopper, I think) that takes code and “disassembles” it. When you dump something into a reverse-engineering tool, it’ll make a few assumptions about what it sees. These assumptions are usually wrong. There’s a process where the human user looks at the analyzed output, does a “sniff-test” on whether it looks reasonable, and works with the tool until it gets the assumptions correct.
That’s the red flag above: the researcher has dumped the results of a reverse-engineering tool without recognizing that something is wrong in the analysis.

It fails the sniff test. Different researchers will notice different things first. Famed google researcher Tavis Ormandy points out one flaw. In this post, I describe what jumps out first to me. That would be the ‘imul’ (multiplication) instruction shown in the blowup below:

It’s obviously ASCII. In other words, it’s a series of bytes. The tool has tried to interpret these bytes as Intel x86 instructions (like ‘and’, ‘insd’, ‘das’, ‘imul’, etc.). But it’s obviously not Intel x86, because those instructions make no sense.

That ‘imul’ instruction is multiplying something by the (hex) number 0x6b657479. That doesn’t look like a number — it looks like four lower-case ASCII letters. ASCII lower-case letters are in the range 0x61 through 0x7A, so it’s not the single 4-byte number 0x6b657479 but the 4 individual bytes 6b 65 74 79, which map to the ASCII letters ‘k’, ‘e’, ‘t’, ‘y’ (actually, because “little-endian”, reverse order, so “ytek”).

No, no. Techies aren’t expected to be able to read hex this way. Instead, we are expected to recognize what’s going on. I just used a random website to interpret hex bytes as ASCII.

There are 26 lower case letters, roughly 10% of the 256 possible values for a byte. Thus, the chance that a random 4 byte number will consist of all lower-case letters is 1-in-10,000. Moreover, multiplication by strange constants happens even more rarely. You’ll commonly see multiplications by small numbers like 48, or large well-formed numbers like 0x1000000. You pretty much never see multiplication by a number like 0x6b657479, baring something rare like an LCG.

QED: this isn’t actually an Intel x86 imul instruction, it’s ASCII text that the tool has tried to interpret as x86 instructions.

Conclusion

At first glance, all those screenshots by the researcher look very technical, which many will assume supports his claims. But when we actually look at them, none of them support his claims. Instead, it’s all just handwaving nonsense. It’s clear the researchers doesn’t understand them, either.

Security practices in AWS multi-tenant SaaS environments

Post Syndicated from Keith P original https://aws.amazon.com/blogs/security/security-practices-in-aws-multi-tenant-saas-environments/

Securing software-as-a-service (SaaS) applications is a top priority for all application architects and developers. Doing so in an environment shared by multiple tenants can be even more challenging. Identity frameworks and concepts can take time to understand, and forming tenant isolation in these environments requires deep understanding of different tools and services.

While security is a foundational element of any software application, specific considerations apply to SaaS applications. This post dives into the challenges, opportunities and best practices for securing multi-tenant SaaS environments on Amazon Web Services (AWS).

SaaS application security considerations

Single tenant applications are often deployed for a specific customer, and typically only deal with this single entity. While security is important in these environments, the threat profile does not include potential access by other customers. Multi-tenant SaaS applications have unique security considerations when compared to single tenant applications.

In particular, multi-tenant SaaS applications must pay special attention to identity and tenant isolation. These considerations are in addition to the security measures all applications must take. This blog post reviews concepts related to identity and tenant isolation, and how AWS can help SaaS providers build secure applications.

Identity

SaaS applications are accessed by individual principals (often referred to as users). These principals may be interactive (for example, through a web application) or machine-based (for example, through an API). Each principal is uniquely identified, and is usually associated with information about the principal, including email address, name, role and other metadata.

In addition to the unique identification of each individual principal, a SaaS application has another construct: a tenant. A paper on multi-tenancy defines a tenant as a group of one or more users sharing the same view on an application they use. This view may differ for different tenants. Each individual principal is associated with a tenant, even if it is only a 1:1 mapping. A tenant is uniquely identified, and contains information about the tenant administrator, billing information and other metadata.

When a principal makes a request to a SaaS application, the principal provides their tenant and user identifier along with the request. The SaaS application validates this information and makes an authorization decision. In well-designed SaaS applications, this authorization step should not rely on a centralized authorization service. A centralized authorization service is a single point of failure in an application. If it fails, or is overwhelmed with requests, the application will no longer be able to process requests.

There are two key techniques to providing this type of experience in a SaaS application: using an identity provider (IdP) and representing identity or authorization in a token.

Using an Identity Provider (IdP)

In the past, some web applications often stored user information in a relational database table. When a principal authenticated successfully, the application issued a session ID. For subsequent requests, the principal passed the session ID to the application. The application made authorization decisions based on this session ID. Figure 1 provides an example of how this setup worked.

Figure 1 - An example of legacy application authentication.

Figure 1 – An example of legacy application authentication.

In applications larger than a simple web application, this pattern is suboptimal. Each request usually results in at least one database query or cache look up, creating a bottleneck on the data store holding the user or session information. Further, because of the tight coupling between the application and its user management, federation with external identity providers becomes difficult.

When designing your SaaS application, you should consider the use of an identity provider like Amazon Cognito, Auth0, or Okta. Using an identity provider offloads the heavy lifting required for managing identity by having user authentication, including federation, handled by external identity providers. Figure 2 provides an example of how a SaaS provider can use an identity provider in place of the self-managed solution shown in Figure 1.

Figure 2 – An example of an authentication flow that involves an identity provider.

Figure 2 – An example of an authentication flow that involves an identity provider.

Once a user authenticates with an identity provider, the identity provider issues a standardized token. This token is the same regardless of how a user authenticates, which means your application does not need to build in support for multiple different authentication methods tenants might use.

Identity providers also commonly support federated access. Federated access means that a third party maintains the identities, but the identity provider has a trust relationship with this third party. When a customer tries to log in with an identity managed by the third party, the SaaS application’s identity provider handles the authentication transaction with the third-party identity provider.

This authentication transaction commonly uses a protocol like Security Assertion Markup Language (SAML) 2.0. The SaaS application’s identity provider manages the interaction with the tenant’s identity provider. The SaaS application’s identity provider issues a token in a format understood by the SaaS application. Figure 3 provides an example of how a SaaS application can provide support for federation using an identity provider.

Figure 3 - An example of authentication that involves a tenant-provided identity provider

Figure 3 – An example of authentication that involves a tenant-provided identity provider

For an example, see How to set up Amazon Cognito for federated authentication using Azure AD.

Representing identity with tokens

Identity is usually represented by signed tokens. JSON Web Signatures (JWS), often referred to as JSON Web Tokens (JWT), are signed JSON objects used in web applications to demonstrate that the bearer is authorized to access a particular resource. These JSON objects are signed by the identity provider, and can be validated without querying a centralized database or service.

The token contains several key-value pairs, called claims, which are issued by the identity provider. Besides several claims relating to the issuance and expiration of the token, the token can also contain information about the individual principal and tenant.

Sample access token claims

The example below shows the claims section of a typical access token issued by Amazon Cognito in JWT format.

{
  "sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "cognito:groups": [
"TENANT-1"
  ],
  "token_use": "access",
  "auth_time": 1562190524,
  "iss": "https://cognito-idp.us-west-2.amazonaws.com/us-west-2_example",
  "exp": 1562194124,
  "iat": 1562190524,
  "origin_jti": "bbbbbbbbb-cccc-dddd-eeee-aaaaaaaaaaaa",
  "jti": "cccccccc-dddd-eeee-aaaa-bbbbbbbbbbbb",
  "client_id": "12345abcde",

The principal, and the tenant the principal is associated with, are represented in this token by the combination of the user identifier (the sub claim) and the tenant ID in the cognito:groups claim. In this example, the SaaS application represents a tenant by creating a Cognito group per tenant. Other identity providers may allow you to add a custom attribute to a user that is reflected in the access token.

When a SaaS application receives a JWT as part of a request, the application validates the token and unpacks its contents to make authorization decisions. The claims within the token set what is known as the tenant context. Much like the way environment variables can influence a command line application, the tenant context influences how the SaaS application processes the request.

By using a JWT, the SaaS application can process a request without frequent reference to an external identity provider or other centralized service.

Tenant isolation

Tenant isolation is foundational to every SaaS application. Each SaaS application must ensure that one tenant cannot access another tenant’s resources. The SaaS application must create boundaries that adequately isolate one tenant from another.

Determining what constitutes sufficient isolation depends on your domain, deployment model and any applicable compliance frameworks. The techniques for isolating tenants from each other depend on the isolation model and the applications you use. This section provides an overview of tenant isolation strategies.

Your deployment model influences isolation

How an application is deployed influences how tenants are isolated. SaaS applications can use three types of isolation: silo, pool, and bridge.

Silo deployment model

The silo deployment model involves customers deploying one set of infrastructure per tenant. Depending on the application, this may mean a VPC-per-tenant, a set of containers per tenant, or some other resource that is deployed for each tenant. In this model, there is one deployment per tenant, though there may be some shared infrastructure for cross-tenant administration. Figure 4 shows an example of a siloed deployment that uses a VPC-per-tenant model.

Figure 4 - An example of a siloed deployment that provisions a VPC-per-tenant

Figure 4 – An example of a siloed deployment that provisions a VPC-per-tenant

Pool deployment model

The pool deployment model involves a shared set of infrastructure for all tenants. Tenant isolation is implemented logically in the application through application-level constructs. Rather than having separate resources per tenant, isolation enforcement occurs within the application. Figure 5 shows an example of a pooled deployment model that uses serverless technologies.

Figure 5 - An example of a pooled deployment model using serverless technologies

Figure 5 – An example of a pooled deployment model using serverless technologies

In Figure 5, an AWS Lambda function that retrieves an item from an Amazon DynamoDB table shared by all tenants needs temporary credentials issued by the AWS Security Token Service. These credentials only allow the requester to access items in the table that belong to the tenant making the request. A requester gets these credentials by assuming an AWS Identity and Access Management (IAM) role. This allows a SaaS application to share the underlying infrastructure, while still isolating tenants from one another. See Isolation enforcement depends on service below for more details on this pattern.

Bridge deployment model

The bridge model combines elements of both the silo and pool models. Some resources may be separate, others may be shared. For example, suppose your application has a shared application layer and an Amazon Relational Database Service (RDS) instance per tenant. The application layer evaluates each request and connects to the database for the tenant that made the request.

This model is useful in a situation where each tenant may require a certain response time and one set of resources acts as a bottleneck. In the RDS example, the application layer could handle the requests imposed by the tenants, but a single RDS instance could not.

The decision on which isolation model to implement depends on your customer’s requirements, compliance needs or industry needs. You may find that some customers can be deployed onto a pool model, while larger customers may require their own silo deployment.

Your tiering strategy may also influence the type of isolation model you use. For example, a basic tier customer might be deployed onto pooled infrastructure, while an enterprise tier customer is deployed onto siloed infrastructure.

For more information about different tenant isolation models, read the tenant isolation strategies whitepaper.

Isolation enforcement depends on service

Most SaaS applications will need somewhere to store state information. This could be a relational database, a NoSQL database, or some other storage medium which persists state. SaaS applications built on AWS use various mechanisms to enforce tenant isolation when accessing a persistent storage medium.

IAM provides fine grain access controls access for the AWS API. Some services, like Amazon Simple Storage Service (Amazon S3) and DynamoDB, provide the ability to control access to individual objects or items with IAM policies. When possible, your application should use IAM’s built-in functionality to limit access to tenant resources. See Isolating SaaS Tenants with Dynamically Generated IAM Policies for more information about using IAM to implement tenant isolation.

AWS IAM also offers the ability to restrict access to resources based on tags. This is known as attribute-based access control (ABAC). This technique allows you to apply tags to supported resources, and make access control decisions based on which tags are applied. This is a more scalable access control mechanism than role-based access control (RBAC), because you do not need to modify an IAM policy each time a resource is added or removed. See How to implement SaaS tenant isolation with ABAC and AWS IAM for more information about how this can be applied to a SaaS application.

Some relational databases offer features that can enforce tenant isolation. For example, PostgreSQL offers a feature called row level security (RLS). Depending on the context in which the query is sent to the database, only tenant-specific items are returned in the results. See Multi-tenant data isolation with PostgreSQL Row Level Security for more information about row level security in PostgreSQL.

Other persistent storage mediums do not have fine grain permission models. They may, however, offer some kind of state container per tenant. For example, when using MongoDB, each tenant is assigned a MongoDB user and a MongoDB database. The secret associated with the user can be stored in AWS Secrets Manager. When retrieving a tenant’s data, the SaaS application first retrieves the secret, then authenticates with MongoDB. This creates tenant isolation because the associated credentials only have permission to access collections in a tenant-specific database.

Generally, if the persistent storage medium you’re using offers its own permission model that can enforce tenant isolation, you should use it, since this keeps you from having to implement isolation in your application. However, there may be cases where your data store does not offer this level of isolation. In this situation, you would need to write application-level tenant isolation enforcement. Application-level tenant isolation means that the SaaS application, rather than the persistent storage medium, makes sure that one tenant cannot access another tenant’s data.

Conclusion

This post reviews the challenges, opportunities and best practices for the unique security considerations associated with a multi-tenant SaaS application, and describes specific identity considerations, as well as tenant isolation methods.

If you’d like to know more about the topics above, the AWS Well-Architected SaaS Lens Security pillar dives deep on performance management in SaaS environments. It also provides best practices and resources to help you design and improve performance efficiency in your SaaS application.

Get Started with the AWS Well-Architected SaaS Lens

The AWS Well-Architected SaaS Lens focuses on SaaS workloads, and is intended to drive critical thinking for developing and operating SaaS workloads. Each question in the lens has a list of best practices, and each best practice has a list of improvement plans to help guide you in implementing them.

The lens can be applied to existing workloads, or used for new workloads you define in the tool. You can use it to improve the application you’re working on, or to get visibility into multiple workloads used by the department or area you’re working with.

The SaaS Lens is available in all Regions where the AWS Well-Architected Tool is offered, as described in the AWS Regional Services List. There are no costs for using the AWS Well-Architected Tool.

If you’re an AWS customer, find current AWS Partners that can conduct a review by learning about AWS Well-Architected Partners and AWS SaaS Competency Partners.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Security Hub forum. To start your 30-day free trial of Security Hub, visit AWS Security Hub.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Keith P

Keith is a senior partner solutions architect on the SaaS Factory team.

Andy Powell

Andy is the global lead partner for solutions architecture on the SaaS Factory team.

[$] Restartable sequences in glibc

Post Syndicated from original https://lwn.net/Articles/883104/rss

“Restartable sequences” are small segments of user-space code designed to
access per-CPU data structures without the need for heavyweight locking.
It is a relatively obscure feature, despite having been supported by the
Linux kernel since the 4.18 release. Among other things, there is no
support in the GNU C Library (glibc) for this feature. That is about to
change with the upcoming glibc 2.35
release, though, so a look at the user-space API
for this feature is warranted.

Mocking service integrations with AWS Step Functions Local

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/mocking-service-integrations-with-aws-step-functions-local/

This post is written by Sam Dengler, Principal Specialist Solutions Architect, and Dhiraj Mahapatro, Senior Specialist Solutions Architect.

AWS Step Functions now supports over 200 AWS Service integrations via AWS SDK Integration. Developers want to build and test control flow logic for workflows using branching logicerror handling, and retries. This allows for precise workflow execution with deterministic results. Additionally, developers use Step Functions’ input and output processing features to transform data as it enters and exits tasks.

Developers can test their state machines locally using Step Functions Local before deploying them to an AWS account. However state machines that use service integrations like AWS Lambda, Amazon SQS, or Amazon SNS require Step Functions Local to perform calls to AWS service endpoints. Often, developers want to test the control and data flows of their state machine executions in isolation, without any dependency on service integration availability.

Today, AWS is releasing Mocked Service Integrations for Step Functions Local. This allows developers to define sample outputs from AWS service integrations. You can combine them into test case scenarios to validate workflow control and data flow definitions. You can find the code used in this post in the Step Functions examples GitHub repository.

Sales lead generation sample workflow

In this example, new sales leads are created in a customer relationship management system. This triggers the sample workflow execution using input data, which provides information about the contact.

Using the sales lead data, the workflow first validates the contact’s identity and address. If valid, it uses Step Functions’ AWS SDK integration for Amazon Comprehend to call the DetectSentiment API. It uses the sales lead’s comments as input for sentiment analysis.

If the comments have a positive sentiment, it adds the sales leads information to a DynamoDB table for follow-up. The event is published to Amazon EventBridge to notify subscribers.

If the sales lead data is invalid or a negative sentiment is detected, it publishes events to EventBridge for notification. No record is added to the Amazon DynamoDB table. The following Step Functions Workflow Studio diagram shows the control logic:

The full workflow definition is available in the code repository. Note the workflow task names in the diagram, such as DetectSentiment, which are important when defining the mocked responses.

Sentiment analysis test case

In this example, you test a scenario in which:

  1. The identity and address are successfully validated using a Lambda function.
  2. A positive sentiment is detected using the Comprehend.DetectSentiment API after three retries.
  3. A contact item is written to a DynamoDB table successfully
  4. An event is published to an EventBridge event bus successfully

The execution path for this test scenario is shown in the following diagram (the red and green numbers have been added). 0 represents the first execution; 1, 2, and 3 represent the max retry attempts (MaxAttempts), in case of an InternalServerException.

Mocked response configuration

To use service integration mocking, create a mock configuration file with sections specifying mock AWS service responses. These are grouped into test cases that can be activated when executing state machines locally. The following example provides code snippets and the full mock configuration is available in the code repository.

To mock a successful Lambda function invocation, define a mock response that conforms to the Lambda.Invoke API response elements. Associate it to the first request attempt:

"CheckIdentityLambdaMockedSuccess": {
  "0": {
    "Return": {
      "StatusCode": 200,
      "Payload": {
        "statusCode": 200,
        "body": "{\"approved\":true,\"message\":\"identity validation passed\"
}"
      }
    }
  }
}

To mock the DetectSentiment retry behavior, define failure and successful mock responses that conform to the Comprehend.DetectSentiment API call. Associate the failure mocks to three request attempts, and associate the successful mock to the fourth attempt:

"DetectSentimentRetryOnErrorWithSuccess": {
  "0-2": {
    "Throw": {
      "Error": "InternalServerException",
      "Cause": "Server Exception while calling DetectSentiment API in Comprehend Service"
    }
  },
  "3": {
    "Return": {
      "Sentiment": "POSITIVE",
      "SentimentScore": {
        "Mixed": 0.00012647535,
        "Negative": 0.00008031699,
        "Neutral": 0.0051454515,
        "Positive": 0.9946478
      }
    }
  }
}

Note that Step Functions Local does not validate the structure of the mocked responses. Ensure that your mocked responses conform to actual responses before testing. To review the structure of service responses, either perform the actual service calls using Step Functions or view the documentation for those services.

Next, associate the mocked responses to a test case identifier:

"RetryOnServiceExceptionTest": {
  "Check Identity": "CheckIdentityLambdaMockedSuccess",
  "Check Address": "CheckAddressLambdaMockedSuccess",
  "DetectSentiment": "DetectSentimentRetryOnErrorWithSuccess",
  "Add to FollowUp": "AddToFollowUpSuccess",
  "CustomerAddedToFollowup": "CustomerAddedToFollowupSuccess"
}

With the test case and mock responses configured, you can use them for testing with Step Functions Local.

Test case execution using Step Functions Local

The Step Functions Developer Guide describes the steps used to set up Step Functions Local on your workstation and create a state machine.

After these steps are complete, you can run a workflow locally using the start-execution AWS CLI command. Activate the mocked responses by appending a pound sign and the test case identifier to the state machine ARN:

aws stepfunctions start-execution \
  --endpoint http://localhost:8083 \
  --state-machine arn:aws:states:us-east-1:123456789012:stateMachine: LeadGenerationStateMachine#RetryOnServiceExceptionTest \
  --input file://events/sfn_valid_input.json

Test case validation

To validate the workflow executed correctly in the test case, examine the state machine execution events using the StepFunctions.GetExecutionHistory API. This ensures that the correct states are used. There are a variety of validation tools available. This post shows how to achieve this using the AWS CLI filtering feature using JMESPath syntax.

In this test case, you validate the TaskFailed and TaskSucceeded events match the retry definition for the DetectSentiment task, which specifies three retries. Use the following AWS CLI command to get the execution history and filter on the execution events:

aws stepfunctions get-execution-history \
  --endpoint http://localhost:8083 \
  --execution-arn <ExecutionArn>
  --query 'events[?(type==`TaskFailed` && contains(taskFailedEventDetails.cause, `Server Exception while calling DetectSentiment API in Comprehend Service`)) || (type==`TaskSucceeded` && taskSucceededEventDetails.resource==`comprehend:detectSentiment`)]'

The results include matching events:

{
  "timestamp": "2022-01-13T17:24:32.276000-05:00",
  "type": "TaskFailed",
  "id": 19,
  "previousEventId": 18,
  "taskFailedEventDetails": {
    "error": "InternalServerException",
    "cause": "Server Exception while calling DetectSentiment API in Comprehend Service"
  }
}

These results should be compared to the test acceptance criteria to verify the execution behavior. Test cases, acceptance criteria, and validation expressions vary by customer and use case. These techniques are flexible to accommodate various happy path and error scenarios. To explore additional sample test cases and examples, visit the example code repository.

Conclusion

This post introduces a new robust way to test AWS Step Functions state machines in isolation. With mocking, developers get more control over the type of scenarios that a state machine can handle, leading to assertion of multiple behaviors. Testing a state machine with mocks can also be part of the software release. Asserting on behaviors like error handling, branching, parallel, dynamic parallel (map state) helps test the entire state machine’s behavior. For any new behavior in the state machine, such as a new type of exception from a state, you can mock and add as a test.

See the Step Functions Developer Guide for more information on service mocking with Step Functions Local. The sample application covers basic scenarios of testing a state machine. You can use a similar approach for complex scenarios including other Step Functions flows, like map and wait.

For more serverless learning resources, visit Serverless Land.

Using the circuit breaker pattern with AWS Step Functions and Amazon DynamoDB

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/using-the-circuit-breaker-pattern-with-aws-step-functions-and-amazon-dynamodb/

This post is written by Anitha Deenadayalan, Developer Specialist SA, DevAx

Modern applications use microservices as an architectural and organizational approach to software development, where the application comprises small independent services that communicate over well-defined APIs.

When multiple microservices collaborate to handle requests, one or more services may become unavailable or exhibit a high latency. Microservices communicate through remote procedure calls, and it is always possible that transient errors could occur in the network connectivity, causing failures.

This can cause performance degradation in the entire application during synchronous execution because of the cascading of timeouts or failures causing poor user experience. When complex applications use microservices, an outage in one microservice can lead to application failure. This post shows how to use the circuit breaker design pattern to help with a graceful service degradation.

Introducing circuit breakers

Michael Nygard popularized the circuit breaker pattern in his book, Release It. This design pattern can prevent a caller service from retrying another callee service call that has previously caused repeated timeouts or failures. It can also detect when the callee service is functional again.

Fallacies of distributed computing are a set of assertions made by Peter Deutsch and others at Sun Microsystems. They say the programmers new to distributed applications invariably make false assumptions. The network reliability, zero-latency expectations, and bandwidth limitations result in software applications written with minimal error handling for network errors.

During a network outage, applications may indefinitely wait for a reply and continually consume application resources. Failure to retry the operations when the network becomes available can also lead to application degradation. If API calls to a database or an external service time-out due to network issues, repeated calls with no circuit breaker can affect cost and performance.

The circuit breaker pattern

There is a circuit breaker object that routes the calls from the caller to the callee in the circuit breaker pattern. For example, in an ecommerce application, the order service can call the payment service to collect the payments. When there are no failures, the order service routes all calls to the payment service by the circuit breaker:

Circuit breaker with no failures

Circuit breaker with no failures

If the payment service times out, the circuit breaker can detect the timeout and track the failure. If the timeouts exceed a specified threshold, the application opens the circuit:

Circuit breaker with payment service failure

Circuit breaker with payment service failure

Once the circuit is open, the circuit breaker object does not route the calls to the payment service. It returns an immediate failure when the order service calls the payment service:

Circuit breaker stops routing to payment service

Circuit breaker stops routing to payment service

The circuit breaker object periodically tries to see if the calls to the payment service are successful:

Circuit breaker retries payment service

Circuit breaker retries payment service

When the call to payment service succeeds, the circuit is closed, and all further calls are routed to the payment service again:

Circuit breaker with working payment service again

Circuit breaker with working payment service again

Architecture overview

This example uses the AWS Step Functions, AWS Lambda, and Amazon DynamoDB to implement the circuit breaker pattern:

Circuit breaker architecture

Circuit breaker architecture

The Step Functions workflow provides circuit breaker capabilities. When a service wants to call another service, it starts the workflow with the name of the callee service.

The workflow gets the circuit status from the CircuitStatus DynamoDB table, which stores the currently degraded services. If the CircuitStatus contains a record for the service called, then the circuit is open. The Step Functions workflow returns an immediate failure and exit with a FAIL state.

If the CircuitStatus table does not contain an item for the called service, then the service is operational. The ExecuteLambda step in the state machine definition invokes the Lambda function sent through a parameter value. The Step Functions workflow exits with a SUCCESS state, if the call succeeds.

The items in the DynamoDB table have the following attributes:

DynamoDB items list

DynamoDB items list

If the service call fails or a timeout occurs, the application retries with exponential backoff for a defined number of times. If the service call fails after the retries, the workflow inserts a record in the CircuitStatus table for the service with the CircuitStatus as OPEN, and the workflow exits with a FAIL state. Subsequent calls to the same service return an immediate failure as long as the circuit is open.

I enter the item with an associated time-to-live (TTL) value to ensure eventual connection retries and the item expires at the defined TTL time. DynamoDB’s time to live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming write throughput.

For example, if you set the TTL value to 60 seconds to check a service status after a minute, DynamoDB deletes the item from the table after 60 seconds. The workflow invokes the service to check for availability when a new call comes in after the item has expired.

Circuit breaker Step Function

Circuit breaker Step Function

Prerequisites

For this walkthrough, you need:

Setting up the environment

Use the .NET Core 3.1 code in the GitHub repository and the AWS SAM template to create the AWS resources for this walkthrough. These include IAM roles, DynamoDB table, the Step Functions workflow, and Lambda functions.

  1. You need an AWS access key ID and secret access key to configure the AWS Command Line Interface (AWS CLI). To learn more about configuring the AWS CLI, follow these instructions.
  2. Clone the repo:
    git clone https://github.com/aws-samples/circuit-breaker-netcore-blog
  3. After cloning, this is the folder structure:

    Project file structure

    Project file structure

Deploy using Serverless Application Model (AWS SAM)

The AWS Serverless Application Model (AWS SAM) CLI provides developers with a local tool for managing serverless applications on AWS.

  1. The sam build command processes your AWS SAM template file, application code, and applicable language-specific files and dependencies. The command copies build artifacts in the format and location expected for subsequent steps in your workflow. Run these commands to process the template file:
    cd circuit-breaker
    sam build
  2. After you build the application, test using the sam deploy command. AWS SAM deploys the application to AWS and displays the output in the terminal.
    sam deploy --guided

    Output from sam deploy

    Output from sam deploy

  3. You can also view the output in AWS CloudFormation page.

    Output in CloudFormation console

    Output in CloudFormation console

  4. The Step Functions workflow provides the circuit-breaker function. Refer to the circuitbreaker.asl.json file in the statemachine folder for the state machine definition in the Amazon States Language (ASL).

To deploy with the CDK, refer to the GitHub page.

Running the service through the circuit breaker

To provide circuit breaker capabilities to the Lambda microservice, you must send the name or function ARN of the Lambda function to the Step Functions workflow:

{
  "TargetLambda": "<Name or ARN of the Lambda function>"
}

Successful run

To simulate a successful run, use the HelloWorld Lambda function provided by passing the name or ARN of the Lambda function the stack has created. Your input appears as follows:

{
  "TargetLambda": "circuit-breaker-stack-HelloWorldFunction-pP1HNkJGugQz"
}

During the successful run, the Get Circuit Status step checks the circuit status against the DynamoDB table. Suppose that the circuit is CLOSED, which is indicated by zero records for that service in the DynamoDB table. In that case, the Execute Lambda step runs the Lambda function and exits the workflow successfully.

Step Function with closed circuit

Step Function with closed circuit

Service timeout

To simulate a timeout, use the TestCircuitBreaker Lambda function by passing the name or ARN of the Lambda function the stack has created. Your input appears as:

{
  "TargetLambda": "circuit-breaker-stack-TestCircuitBreakerFunction-mKeyyJq4BjQ7"
}

Again, the circuit status is checked against the DynamoDB table by the Get Circuit Status step in the workflow. The circuit is CLOSED during the first pass, and the Execute Lambda step runs the Lambda function and timeout.

The workflow retries based on the retry count and the exponential backoff values, and finally returns a timeout error. It runs the Update Circuit Status step where a record is inserted in the DynamoDB table for that service, with a predefined time-to-live value specified by TTL attribute ExpireTimeStamp.

Step Function with open circuit

Step Function with open circuit

Repeat timeout

As long as there is an item for the service in the DynamoDB table, the circuit breaker workflow returns an immediate failure to the calling service. When you re-execute the call to the Step Functions workflow for the TestCircuitBreaker Lambda function within 20 seconds, the circuit is still open. The workflow immediately fails, ensuring the stability of the overall application performance.

Step Function workflow immediately fails until retry

Step Function workflow immediately fails until retry

The item in the DynamoDB table expires after 20 seconds, and the workflow retries the service again. This time, the workflow retries with exponential backoffs, and if it succeeds, the workflow exits successfully.

Cleaning up

To avoid incurring additional charges, clean up all the created resources. Run the following command from a terminal window. This command deletes the created resources that are part of this example.

sam delete --stack-name circuit-breaker-stack --region <region name>

Conclusion

This post showed how to implement the circuit breaker pattern using Step Functions, Lambda, DynamoDB, and .NET Core 3.1. This pattern can help prevent system degradation in service failures or timeouts. Step Functions and the TTL feature of DynamoDB can make it easier to implement the circuit breaker capabilities.

To learn more about developing microservices on AWS, refer to the whitepaper on microservices. To learn more about serverless and AWS SAM, visit the Sessions with SAM series and find more resources at Serverless Land.

Scaling DLT to 1M TPS on AWS: Optimizing a Regulated Liabilities Network

Post Syndicated from Erica Salinas original https://aws.amazon.com/blogs/architecture/scaling-dlt-to-1m-tps-on-aws-optimizing-a-regulated-liabilities-network/

SETL is an open source, distributed ledger technology (DLT) company that enables tokenisation, digital custody, and DLT for securities markets and payments. In mid-2021, they developed a blueprint for a Regulated Liabilities Network (RLN) that enables holding and managing a variety of tokenized value irrespective of its form. In a December 2021 collaboration with Amazon Web Services (AWS), SETL demonstrated that their RLN platform could support one million transactions per second. These findings were published in their whitepaper, The Regulated Liability Network: Whitepaper on scalability and performance.

This AWS-hosted architecture scaled each processing component and the complete transaction flow. It was built using Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Cloud Compute (EC2), and Amazon Managed Streaming for Apache Kafka (MSK).

This post discusses the technical implementation of the simulated network. We show how scaling characteristics were achieved, while maintaining the business requirements of atomicity and finality. We also discuss how each RLN component was optimized for high performance.

Background: What is an RLN?

In The Regulated Internet of Value, Tony McLaughlin of Citi proposed a single network to record ownership of multiple token types, each which represents a liability of a regulated entity. The RLN would give commercial banks, e-money providers, and central banks, the ability to issue liabilities and immutably record all balances and owners. Because it is designed to maintain a globally sequenced set of state changes, an unambiguous ledger of token balances can be computed and updated. Transactions, which result in two or more ledger updates, are proposed to the network. They are authorized by all impacted network participants, ordered for distribution to participant ledgers, and then persisted by multiple participant ledgers. This process is completed with five processing components.

Figure 1. Core components and queues of RLN

Figure 1. Core components and queues of RLN

Given the existing performance limitations faced by many DLT platforms, a main concern with the network design was its ability to meet throughput requirements. SETL collaborated with AWS to define an architecture that integrated decoupled processing components, as shown in Figure 2. The target throughput was 1,000,000 TPS for each component through the simulated network.

Figure 2. Simulated RLN architecture in the AWS Cloud

Figure 2. Simulated RLN architecture in the AWS Cloud

A queue – process – queue architectural model

The basic architectural model applied was “queue – process – queue.” Each component consumed transactions from one or more Kafka topics, performed its requisite activities, and produced output to a second Kafka topic. Components of the message flow used Amazon EKS, Amazon EC2, and Amazon MSK.

Specific techniques were applied to achieve the scaling characteristics of components in the main message flow.

  • A distinct EKS-managed node group was used for each RLN component. Each node within the group had its own EC2 instance that housed at most two Kubernetes Pods. This technique decreased potential network throttling and enabled the monitoring of pod resource utilization using Amazon CloudWatch at the EC2 level. This aided in identifying bottlenecks, which can be challenging if large EC2 nodes house many Kubernetes Pods.
  • Each RLN queue component had its own dedicated Kafka cluster. By isolating the Kafka topics to their own cluster, we were able to scale and monitor the cluster individually. This decreased potential chokepoints.
  • A combination of eksctl, kubectl, and EC2 Auto Scaling groups was used to deploy and horizontally scale each RLN component. The tooling enabled rapid results by controlling pod configuration, automating deployments, and supporting multiple performance testing iterations.

Fine tuning RLN system components

The key metric tested was message throughput in transactions per second (TPS) consumed, processed, and produced for each component. An end-to-end test was performed, which measured throughput of the complete simulated network. Each RLN component was tuned to meet this metric as follows.

The Transaction Generator acted as the customer entering transactions into the system (see Figure 3.) The Kafka producer property buffer.memory was changed to 536870912 to simulate an increase in producer TPS for each component.

Figure 3. Simulated RLN Transaction Generator

Figure 3. Simulated RLN Transaction Generator

The Scheduler’s scaling technique was a dedicated Compute (one Pod/EC2 node) that listened to one Kafka partition in the Transaction Queue as seen in Figure 4. Additionally, turning on gzip compression (compression.type) on the producer resolved a network bottleneck for the downstream Kafka cluster.

Figure 4. Simulated RLN Scheduler

Figure 4. Simulated RLN Scheduler

The Approver’s scaling technique was like the Scheduler, however, the Proposal Queue had 10 topics as opposed to one. The approver is stateless, so it randomly selects a Kafka topic and partition to listen to. When scaling to 100 Kubernetes Pods, there would be roughly one Kafka partition per pod. As shown in Figure 5, each pod has its own EC2 instance. The Assembler’s scaling techniques were the same as for the Scheduler and Approver components.

Figure 5. Simulated RLN Approver

Figure 5. Simulated RLN Approver

The Sequencer, as designed, cannot scale horizontally due to the nature of the sequencing algorithm. However, without any special tuning of the Kafka consumer configuration or any significant increase in EC2 sizing, it was able to achieve one million TPS. Its architecture is shown in Figure 6.

Figure 6. Simulated RLN Sequencer

Figure 6. Simulated RLN Sequencer

The State Updater scaled with each Kubernetes Pod listening to an assigned Approved Proposal topic (one-to-one mapping) and receiving all sequenced hashes from the Hash Queue. Achieving 1 million TPS throughput required 10 nodes configured with c5.4xlarge, as seen in Figure 7.

Figure 7. Simulated RLN State Updater

Figure 7. Simulated RLN State Updater

Successful throughput testing

Test results demonstrate that each component can sustain over 1 million TPS. While horizontal scaling was applied, even a single instance achieved over 10,000 TPS. Individual component performance test results can be seen in Table 1.

Component Single Instance (TPS) # of Instances Multiple Instances (TPS)
Generator ~ 629,219 2 1,258,438
Scheduler ~11,019 100 1,101,928
Approver ~10,348 100 1,034,868
Assembler ~10,691 100 1,069,100
Sequencer ~1,052,845 1 N/A
State Updater ~100,256 10 1,002,569

Table 1. Test results per individual component

Network tests were performed with all components integrated to create a simulated RLN network. The tests were executed with transaction generation of ~1,000,000 TPS. Throughput was measured at the output of the final component’s (State Updater) instances (see Figure 8). It was also measured in aggregate at over 1.2 million TPS (see Figure 9).

Figure 8. Individual TPS for State Update component nodes

Figure 8. Individual TPS for State Update component nodes

Figure 9. Aggregate TPS for State Update component nodes

Figure 9. Aggregate TPS for State Update component nodes

While this study was designed to demonstrate capacity and throughput, the tests did replicate Kafka partitions across three Availability Zones in the selected AWS Region. Thus, testing demonstrated that the proposed architecture supports geographic resilience while meeting the throughput benchmark. Resiliency and security are areas of focus for testing in the next phases of architecting a production-ready version of RLN.

Looking forward

During the month of joint work between the SETL and AWS technical teams, we stood up and tuned basic RLN functionality. We successfully demonstrated the scalability of the environment to at least 1M TPS. By using Kafka queues and the horizontal and vertical scalability of AWS services, we addressed the primary concern of reaching production-level throughput.

The industry group collaborating on the RLN concept now has a solid technical foundation on which to build a production-ready RLN architecture. Future experimentation will include integration with financial institutions and technology partners that will provide the capabilities needed to build a successful RLN ecosystem.

Further explorations include enabling digital signing, verification at scale, and expanding the network to include financial institution environments integrated with their own RLN partitions.

Debian tweaks its resolution process

Post Syndicated from original https://lwn.net/Articles/883345/rss

The vote has
concluded
in the Debian project on a general resolution affecting the
way such resolutions are discussed in the future. The changes, as proposed
by Russ Allbery, have been adopted with the required three-to-one
supermajority, though the overall level of voting was low.
The new process is mostly
as described in this article from October
with a few changes. The end result may be to shorten the discussion period
for controversial issues and make the end of that period more predictable.

2021 Cybersecurity Superlatives: An InsightIDR Year in Review

Post Syndicated from KJ McCann original https://blog.rapid7.com/2022/01/31/2021-cybersecurity-superlatives-an-insightidr-year-in-review/

2021 Cybersecurity Superlatives: An InsightIDR Year in Review

We laughed, we cried, we added over 750 new detections. It’s been a rollercoaster of a year for everyone. So let’s have some fun with our 2021 year in review — shall we?

The last year was an exciting one for InsightIDR, Rapid7’s industry-leading extended detection and response (XDR) and SIEM solution. We used the past 12 months to continually invest in the product to help customers level up their security programs and achieve success in their desired outcomes. A major highlight for InsightIDR was being named as a Leader in the 2021 Gartner Magic Quadrant for SIEM for the second year in a row. We are honored to be recognized as one of the six 2021 Magic Quadrant Leaders — and in celebration, we’d like to announce a few awards ourselves for 2021, high-school-superlative style.

Presenting our 2021 superlatives (drum roll, please)…

Most likely to be overworked: Cybersecurity professionals

“We need more time!” exhausted cybersecurity specialists shout into the void. Luckily, we deployed our Insight Agent into the void, so we heard you. While we were in there, we also picked up the following alerts:

  • There aren’t enough people to do it all.
  • More than 3 out of 4 CISOs have 16 or more cybersecurity products, and 12% have 46 or more (my head is spinning).
  • It is getting more difficult to recruit and hire new professionals onto security teams.
  • The workload is growing, and teams are suffering from burnout.

We heard the problem — and took action with our products. Our product updates focused on the following:

  • Improved detection and response capabilities: We added strong detections with a more comprehensive view of threats.
  • Greater efficiency: We helped teams cut down the number of disparate tools and events they have to manage, providing automation and leveling up analysts by giving them embedded guidance and a common experience.
  • Improved scale and agility: When your organization evolves and grows, so do we.
  • Customization: Every environment is unique, and we want to make sure InsightIDR not only works well but works the way you want it.

All sounds good, right? Let’s keep going down the list to see how we continued to evolve our product to align these themes.

Most likely to (help you) succeed: MITRE ATT&CK mapping in InsightIDR

Red pill or blue pill… Psych! They are both the same pill. Welcome to the matrix — the MITRE ATT&CK matrix, that is.

As of Q4 2021, all of our Attacker Behavior Analytics (ABA) map to the ATT&CK framework in InsightIDR.

OK, great… so what does that mean for you?

MITRE ATT&CK matrix for detection rules: Within the Detection Rules tab, you now have a direct view into where you have coverage with Rapid7’s out-of-the-box detection library across common attacker tactics and techniques, and you can also quickly unlock more context and intelligence about detections.

Refreshed Investigation Management experience: Now, you can click into the new MITRE ATT&CK tab of the Evidence panel in Investigation to see descriptions of each tactic, technique, and sub-technique curated by MITRE. Then go directly to attack.mitre.org for more information.

Learn more about InsightIDR and the MITRE ATT&CK matrix.

Best glow-up: Our Investigation Management experience

A security analyst’s time is precious and limited. That’s why we upgraded our Investigation Management experience to help you manage, prioritize, and triage investigations faster. Make sure you check out the following:

  • A revamped user interface with expandable cards displaying investigation information
  • The ability to view, set, and update the priority, status, or disposition of an investigation
  • Filtering by the following fields: date range, assignee, status, priority level
  • That sweet MITRE integration we talked about earlier

Most sophisticated: Our customization capabilities

InsightIDR customers now have more customization and increased visibility for ABA detections. We’re continuing to make improvements and additions to our detections management experience.

  • Detection rules: Filter detection rules by threat group, rule behavior, and attributes for more visibility into your alerts and investigations.
  • Create exceptions to a detection rule: With exceptions for ABA alerts, you can filter out noise very precisely using data from the alert.
  • New detection rules management interface: With this new interface, you can see a priority field for each detection provided by InsightIDR with new actions available.
  • Customizable priorities for UBA detection rules and custom alerts: Associate a rule priority (Critical, High, Medium, or Low) for all UBA and custom alert detection rules.
  • A simplified way to create exceptions: We added a new section to detection rule details within “create exception” to better inform on which data to write exceptions against. So now, when you go to write exceptions, you have all the information you may need within one window.

Most likely to brighten up your day: Pre-built dashboards and enhanced search capabilities

InsightIDR’s Dashboard Library has a growing repository of pre-built dashboards to save you time and eliminate the need for you to build them from scratch. Our pre-built dashboards are accessible to all users. We added the following dashboards to provide you with immediate value, out of the box.

  • Compliance (PCI, HIPAA, ISO)
  • General Security (Firewall, Asset Authentication)
  • Security Tools (Okta, Palo Alto, Crowdstrike)
  • Enhanced Network Traffic Analysis
  • Cloud Security

Check out the whole dashboard library here.

Speaking of saving time, we continued to make investments in Log Search to make searching for actionable information faster and easier for customers. Spend less time searching and more time fighting off the bad guys. You’ve never seen Spiderman spend an hour searching an address in a phone book, have you?

Power couple: IntSights Threat Intelligence and Rapid7’s Insight Platform

This year Rapid7 acquired IntSights, a leading provider of external threat intelligence and remediation. Their flagship external threat intelligence product, Threat Command, is now part of our Rapid7 portfolio.

Threat Command allows any SecOps team, regardless of size or capability maturity, to expand identification and remediation across an ever-expanding attack surface, while automating threat mitigation.

IntSights is already leveling up threat intelligence at Rapid7 — and we are so excited for more synergies on the road ahead in 2022.

We know this romance is going to last. Congrats to the lovely couple!

Brightest future: Rapid7 customers

Our 2022 New Year’s resolution is to not just be your technology vendor but to be your strategic partner. Complacency is not in our vocabulary, so make sure you keep up to date with all of our upcoming releases as we continue to level up your InsightIDR experience with more capabilities, context, customization while keeping our intuitive user experience.

Our customers’ outcomes define our success, and we wouldn’t have it any other way. We are looking forward to accelerating together.

Have a great year!

Additional reading

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Security updates for Monday

Post Syndicated from original https://lwn.net/Articles/883322/rss

Security updates have been issued by Debian (apache-log4j1.2, expat, libraw, prosody, and python-nbxmpp), Fedora (chromium, hiredis, java-11-openjdk, java-latest-openjdk, lua, rust-afterburn, rust-ammonia, rust-askalono-cli, rust-below, rust-cargo-c, rust-cargo-insta, rust-fd-find, rust-insta, rust-lsd, rust-oxipng, rust-python-launcher, rust-ripgrep, rust-ron, rust-ron0.6, rust-similar, rust-similar-asserts, rust-skim, rust-thread_local, rust-tokei, vim, wpa_supplicant, and zola), Gentoo (chromium, chrome), openSUSE (log4j12), Oracle (log4j and polkit), Scientific Linux (java-1.8.0-openjdk), SUSE (log4j12), and Ubuntu (ldns).

Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

Post Syndicated from Matthew Harrell original https://blog.cloudflare.com/zero-trust-managed-services/

Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

The importance of the Cloudflare Partner Network was on full display in 2021, with record level partner growth in 2021 and aiming even higher in 2022. We’ve been listening to our partners and working to constantly strengthen our ability to deliver value for businesses of all types. An area we identified we could do better, is a program to support “service partners” that want to wrap managed and professional services around Cloudflare products. Today, we are excited to announce the next evolution of the Cloudflare Channel and Alliances Partner Program to specifically enable partners that provide services around Cloudflare products with recurring revenue streams as they equip businesses of all sizes and types with Cloudflare’s leading Zero Trust and SASE solutions.

Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

Core to enabling Services Partners are some exciting enhancements:

  • New Program Paths
  • New Managed Services Partner (MSP) Accreditation.
  • New Support & Go-To-Market Motions

New Program Paths

We have seen a 29% increase in ransom DDoS attacks over the past year and a 175% increase just last quarter. Partners continue to be on the front lines helping mitigate and prevent disruption from these events as they extend our services. Our goal for 2022 is to arm our partners with the tools to address these increased threats to their customers’ infrastructure and applications, like Office365. We know the painstaking efforts that Services Partners make to build compelling offerings and provide the best customer outcomes. To that end, we’ve enhanced the Cloudflare Channel and Alliances Partner Program to not only focus on extending security, performance, and reliability to more customers, but provide a services-first approach for partners. We are committing to the growth of MSP & MSSP partners who share Cloudflare’s mission of helping to build a better Internet.

We’ve been building out the Cloudflare Partner Network for years, working alongside businesses of all sizes and types including our world-wide system integrator partners like Rackspace, as well as partners across EMEA, APAC like Blazeclan, and Kingsoft in China, to name a few. Working with enthusiastic, targeted partners has given us the opportunity to build a program to deliver a robust yet tailored program to support our partners’ ever-evolving needs from network transformations and application modernization, to Zero Trust solutions and onwards.

“Rackspace offers expert consultative services and 24×7 managed support based on Cloudflare’s web application security, secure Internet access, gateway and browsing technologies. With Rackspace Elastic Engineering for Security and Cloudflare’s cutting-edge cloud solutions, we’ll be able to help our customers fast-forward to a modern, Zero Trust approach to securing their teams, applications and infrastructure.”
Gary Alterson, Vice President of Security Services at Rackspace Technology

“We are excited to participate in Cloudflare’s new Services Partner Program.  Cloudflare provides the advanced cloud technology at the center of digital business models.  As a Cloudflare MSP Elite partner, we are able to leverage their innovative cloud solutions with embedded, network edge security to give our customers single-pane-of-glass reporting and policy management across internet applications.”
Veeraj Thaploo, CTO, Blazeclan technologies

Advanced Managed Services Partners

The entry point for partners that are providing managed IT services for small to midsize organizations who are fanatical about providing the best customer experience by leveraging Cloudflare’s extensive platform. Advanced MSP Partners will have access to tools that will help embed Cloudflare into their services offerings to deliver unique, outcome-based SASE and Zero Trust solutions.

Elite Managed Services Partners

Reserved for global partners that possess resource-intense engagement in IT consulting, advisory, and large scale services in the Large Enterprise segment. Elite MSP Partners will have access to dedicated technical and sales resources as well as a named Partner Success Manager to guide growth and business development around their Cloudflare-based services practice. In addition, these partners will have access to executive leadership and marketing programs to develop joint go-to-market functions.

Services-Only Partnerships

Invite-only for partners to deliver general professional services, highly technical services such as Application Modernization via Cloudflare Workers development, Implementation and Migration Services, and Security Assessments.

New Managed Services Partner Accreditation (AMSP)

Helping build a better Internet isn’t simple. Providing managed services for a wide range of customer types and sizes requires a targeted but ever-evolving approach. To support the new Services Program, we’ve introduced a new Certification course that provides the foundational tools for creating the Cloudflare customer experience. Course highlights include best practices for services design, implementation, and providing ongoing technical and relationship management. Services Partners provide customers with a seamless experience from implementation to configuration to ongoing services; here are some of the highlights:

  • New Enablement path for MSP Partners – L1/ L2 Support + Account Management
  • New Customer Success Enablement path for Services Partners
  • New Specialist Accreditation for Cloudflare Zero Trust
Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

New Support & Go-To-Market Motions

Technical Services Managers & Partner Success

Regional resources dedicated to helping partners develop, drive, and expand their Cloudflare-based services offerings, specifically centered around SASE and Zero Trust.

Tenant API Provisioning

Our Partner Platform was built to be extensible, and we’re excited to announce that all of our partners can support the provisioning and deployment of a leading Zero Trust suite of products.

MSP-only Licensing Plan

MSPs and MSSPs deliver finished goods and need cost-efficient ways to deliver customer outcomes. We’ve heard this from our partners, and we are developing a licensing plan that enables MSPs to sell their services powered by the Cloudflare One platform.

Transform your Customer’s Network

We’ve been working diligently to build the pilot program throughout this past year, and are eager to work with the next cohort of service partners that want to help build a better Internet for customers. We’re just getting started in providing a tailored and targeted foundation for the Cloudflare Partner Ecosystem.

Cloudflare Services Partner Offering Menu

Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

Come join us!

During the initial launch phase, we are working with a select group of partners, and are looking forward to expanding and growing with even more partnerships. Our goal is to enable partners that are creating cloud-based security solutions and that want to join us, to further our mission to help build a better Internet. If you believe your organization would be a great fit for this program, we would love to chat with you.

More Information:

Become a Partner: Partner Portal or reach out to [email protected].

The collective thoughts of the interwebz