Security updates for Wednesday

Post Syndicated from original https://lwn.net/Articles/873462/rss

Security updates have been issued by Debian (ffmpeg, smarty3, and strongswan), Fedora (udisks2), openSUSE (flatpak, strongswan, util-linux, and xstream), Oracle (redis:5), Red Hat (java-1.8.0-openjdk, java-11-openjdk, openvswitch2.11, redis:5, redis:6, and rh-redis5-redis), SUSE (flatpak, python-Pygments, python3, strongswan, util-linux, and xstream), and Ubuntu (linux, linux-aws, linux-aws-5.11, linux-azure, linux-azure-5.11, linux-gcp, linux-gcp-5.11, linux-hwe-5.11, linux-kvm, linux-raspi and strongswan).

A Matter of Perspective: Agent-Based and Agentless Approaches to Cloud Security, Part 1

Post Syndicated from Amit Bawer original https://blog.rapid7.com/2021/10/20/a-matter-of-perspective-agent-based-and-agentless-approaches-to-cloud-security-part-1/

A Matter of Perspective: Agent-Based and Agentless Approaches to Cloud Security, Part 1

When it comes to securing your cloud assets’ activities at runtime, the first step is deciding how. There are enough possible solutions that you’re likely to find yourself at a crossroads trying to decide between them. The factors that may affect your choice include:

  • Friction level — How time-consuming or disruptive is it to instrument the solution within the existing environment? What happens to normal operations when the solution malfunctions or becomes misconfigured?
  • Costs — How much am I going to pay for an effective solution?
  • Scalability — Would I have to bother with instrumenting this solution over and over again as more assets are being added to my environment (which might have just happened without any intervention)?
  • Blind spots — What coverage does a single instrumentation provide? Does it cover all an asset’s activities and communications? How is overall visibility impaired when it stops working?
  • Depth of view — How deep does the inspection go per asset? Is it capable of retrieving all viable information required for detection of vulnerabilities and ongoing malicious activities? Is it sufficient for a reliable detection of behavioral anomalies?
  • Breadth of view — Do I get the big picture of what is going on? Can suspicious activity be linked to all assets involved? And again, can the solution reliably detect behavioral anomalies?
  • Forensics — Am I able to keep the data for post-mortem analysis on the crime afterward? Does the solution allow me to make smart conclusions about the next steps for mitigation?

In addition to such questions, there are also practical aspects of your existing cloud platform settings that may affect your selection. For example, working on a serverless setup, in which the hosting instances are completely segregated from your reach, will rule out solutions involving security agents designed to run at underlying host-level scopes.

Agent-based solutions

A solution involving an applied process or a module that resides within the asset’s scope. These kinds of solutions are aware of the hosting platform and are tailored to probe its application, as well as valuable runtime information regarding process activities, incoming or outgoing network traffic of the monitored asset, and possibly other local resources that may play a part in denial-of-service attacks, such as CPU and memory consumption. Along with information access, an agent and its host also share resources. The agent is usually also deployed with privileged access rights (a “let’s run as root” kind of routine) so it can operate at the operating system level and even in the kernel space of its host.

So how does an agent-based solution stack up against the deciding factors we listed earlier?

  • Friction — Due to the nature of their deployment, agent-based solutions can cause a considerable amount of friction. In the absence of an inherent central management, these solutions are required to expose some APIs, which may be non-trivial to operate in order to control and configure their activation.
  • Cost — Agent-based deployments are not part of the provider service stack and, as such, are mostly subject to solution vendor licensing fees. These fees can be unrelated to the actual data billing and thus may save on costs in practice.
  • Scalability — Agent-based solutions are usually bound per specific asset, so they’re required to scale up and down as assets are added or removed. This may incur more resource consumption from the shared resources pool of the assets and agents.
  • Blind spots — Agents usually operate as a mesh of probes, providing the big picture by correlating their spot coverages, typically within a database and a management application. When a probe goes down, its spot coverage vanishes as well, and the runtime information will be missing for the assets it covers.
  • Depth of view — Agent-based solutions are tailored tightly to the asset’s platform, which they monitor and secure. As such, they can fish out information an external service would never be able to find from the depths of the operating system, kernel, and other local devices.
  • Breadth of view — As mentioned, the benefit of providing the overall picture comes from correlating a mesh of agent information within a single point. Agents can’t do this by themselves. It requires the help of external applications and possibly a central database to maintain the findings and make useful links between them.
  • Forensics — Having a realistic retention policy for findings can play a role in determining how far back to investigate an incident and what conclusions can be drawn about it.

Agentless solutions

Instead of deploying an agent per asset or per subset of assets, agentless solutions are deployed at the cluster or cloud account level. As they live outside of the assets themselves, these kinds of solutions are based on the cloud provider’s native APIs and services. They’re also affected by and confined to the provider’s specified functionality. In this case, the protected assets and the agentless service share no resources, and there’s a strong reliance on the cloud provider role schemes and APIs for accessing valuable information.

  • Friction — Agentless solutions are provided as part of the official provider services stack. The only effort on the operator’s end is to enable and customize it, and customization can usually be simplified to the default settings for less experienced users. Other aspects of updates and fixes should be seamless as part of the overall provider’s user experience suite.
  • Costs — The provider’s pricing model may vary, but there’s a trend of billing per data capacity used for tracking, so agentless solutions can become quite an expense for a full-blown active network of assets.
  • Scalability — Agentless solutions take advantage of centralized services within the provider’s platform. As such, they can scale well, since they aren’t relying on the same resource pool as the assets themselves.
  • Blind spots — Provider-based solutions aim to deliver the big picture of cloud activity as a whole. In this context, a blind spot would be related to poor depth of view per asset internals, so you might consider blind spots to actually be “blind layers.”
  • Depth of view — As we’ve discussed, agentless solutions rely mostly on platform data handled by the provider itself, such as the cluster networking and user accounts. This being the case, these solutions are less aware of the containerized internals, such as the processes being executed within the containers.
  • Breadth of view: Agentless solutions have a cluster-level view of asset activities, usually from a single collection point. This makes it easy to correlate between different cluster assets’ activities.
  • Forensics: The forensic capabilities of an agentless solution are mostly related to how well the solution is integrated with the data retention facilities of the provider.

So if both approaches have their strengths and weaknesses, which do you choose? Stay tuned for part 2, where we’ll discuss a third option and draw conclusions.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

How a simple Linux kernel memory corruption bug can lead to complete system compromise (Project Zero)

Post Syndicated from original https://lwn.net/Articles/873410/rss

Over at the Project Zero blog, Jann Horn has a lengthy blog post on a kernel bug, ways to exploit it, and various ideas on mitigation. While the exploitation analysis is highly detailed, more than half of the post looks at various defenses to this kind of bug.

This blog post describes a straightforward Linux kernel locking bug and how I exploited it against Debian Buster’s 4.19.0-13-amd64 kernel. Based on that, it explores options for security mitigations that could prevent or hinder exploitation of issues similar to this one.

I hope that stepping through such an exploit and sharing this compiled knowledge with the wider security community can help with reasoning about the relative utility of various mitigation approaches.

A lot of the individual exploitation techniques and mitigation options that I am describing here aren’t novel. However, I believe that there is value in writing them up together to show how various mitigations interact with a fairly normal use-after-free exploit.

Getting Cloudflare Tunnels to connect to the Cloudflare Network with QUIC

Post Syndicated from Sudarsan Reddy original https://blog.cloudflare.com/getting-cloudflare-tunnels-to-connect-to-the-cloudflare-network-with-quic/

Getting Cloudflare Tunnels to connect to the Cloudflare Network with QUIC

Getting Cloudflare Tunnels to connect to the Cloudflare Network with QUIC

I work on Cloudflare Tunnel, which lets customers quickly connect their private services and networks through the Cloudflare network without having to expose their public IPs or ports through their firewall. Tunnel is managed for users by cloudflared, a tool that runs on the same network as the private services. It proxies traffic for these services via Cloudflare, and users can then access these services securely through the Cloudflare network.

Recently, I was trying to get Cloudflare Tunnel to connect to the Cloudflare network using a UDP protocol, QUIC. While doing this, I ran into an interesting connectivity problem unique to UDP. In this post I will talk about how I went about debugging this connectivity issue beyond the land of firewalls, and how some interesting differences between UDP and TCP came into play when sending network packets.

How does Cloudflare Tunnel work?

Getting Cloudflare Tunnels to connect to the Cloudflare Network with QUIC

cloudflared works by opening several connections to different servers on the Cloudflare edge. Currently, these are long-lived TCP-based connections proxied over HTTP/2 frames. When Cloudflare receives a request to a hostname, it is proxied through these connections to the local service behind cloudflared.

While our HTTP/2 protocol mode works great, we’d like to improve a few things. First, TCP traffic sent over HTTP/2 is susceptible to Head of Line (HoL) blocking — this affects both HTTP traffic and traffic from WARP routing. Additionally, it is currently not possible to initiate communication from cloudflared’s HTTP/2 server in an efficient way. With the current Go implementation of HTTP/2, we could use Server-Sent Events, but this is not very useful in the scheme of proxying L4 traffic.

The upgrade to QUIC solves possible HoL blocking issues and opens up avenues that allow us to initiate communication from cloudflared to a different cloudflared in the future.

Naturally, QUIC required a UDP-based listener on our edge servers which cloudflared could connect to. We already connect to a TCP-based listener for the existing protocols, so this should be nice and easy, right?

Failed to dial to the edge

Things weren’t as straightforward as they first looked. I added a QUIC listener on the edge, and the ability for cloudflared to connect to this new UDP-based listener. I tried to run my brand new QUIC tunnel and this happened.

$  cloudflared tunnel run --protocol quic my-tunnel
2021-09-17T18:44:11Z ERR Failed to create new quic connection, err: failed to dial to edge: timeout: no recent network activity

cloudflared wasn’t even establishing a connection to the edge. I started looking at the obvious places first. Did I add a firewall rule allowing traffic to this port? Check. Did I have iptables rules ACCEPTing or DROPping appropriate traffic for this port? Check. They seemed to be in order. So what else could I do?

tcpdump all the packets

I started by logging for UDP traffic on the machine my server was running on to see what could be happening.

$  sudo tcpdump -n -i eth0 port 7844 and udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:44:27.742629 IP 173.13.13.10.50152 > 198.41.200.200.7844: UDP, length 1252
14:44:27.743298 IP 203.0.113.0.7844 > 173.13.13.10.50152: UDP, length 37

Looking at this tcpdump helped me understand why I had no connectivity! Not only was this port getting UDP traffic but I was also seeing traffic flow out. But there seemed to be something strange afoot. Incoming packets were being sent to 198.41.200.200:7844 while responses were being sent back from 203.0.113.0:7844 (this is an example IP used for illustration purposes)  instead.

Why is this a problem? If a host (in this case, the server) chooses an address from a network unable to communicate with a public Internet host, it is likely that the return half of the communication will never arrive. But wait a minute. Why is some other IP getting prioritized over a source address my packets were already being sent to? Let’s take a deeper look at some IP addresses. (Note that I’ve deliberately oversimplified and scrambled results to minimally illustrate the problem)

$  ip addr list
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1600 qdisc noqueue state UP group default qlen 1000
inet 203.0.113.0/32 scope global eth0
inet 198.41.200.200/32 scope global eth0 

$ ip route show
default via 203.0.113.0 dev eth0

So this was clearly why the server was working fine on my machine but not on the Cloudflare edge servers. It looks like I have multiple IPs on the interface my service is bound to. The IP that is the default route is being sent back as the source address of the packet.

Why does this work for TCP but not UDP?

Connection-oriented protocols, like TCP, initiate a connection (connect()) with a three-way handshake. The kernel therefore maintains a state about ongoing connections and uses this to determine the source IP address at the time of a response.

Because UDP (unless SOCK_SEQPACKET is involved) is connectionless, the kernel cannot maintain state like TCP does. The recvfrom  system call is invoked from the server side and tells who the data comes from. Unfortunately, recvfrom  does not tell us which IP this data is addressed for. Therefore, when the UDP server invokes the sendto system call to respond to the client, we can only tell it which address to send the data to. The responsibility of determining the source-address IP then falls to the kernel. The kernel has certain heuristics that it uses to determine the source address. This may or may not work, and in the ip routes example above, these heuristics did not work. The kernel naturally (and wrongly) picks the address of the default route to respond with.

Telling the kernel what to do

I had to rely on my application to set the source address explicitly and therefore not rely on kernel heuristics.

Linux has some generic I/O system calls, namely recvmsg  and sendmsg. Their function signatures allow us to both read or write additional out-of-band data we can pass the source address to. This control information is passed via the msghdr struct’s msg_control field.

ssize_t sendmsg(int socket, const struct msghdr *message, int flags)
ssize_t recvmsg(int socket, struct msghdr *message, int flags);
 
struct msghdr {
     void    *   msg_name;   /* Socket name          */
     int     msg_namelen;    /* Length of name       */
     struct iovec *  msg_iov;    /* Data blocks          */
     __kernel_size_t msg_iovlen; /* Number of blocks     */
     void    *   msg_control;    /* Per protocol magic (eg BSD file descriptor passing) */
    __kernel_size_t msg_controllen; /* Length of cmsg list */
     unsigned int    msg_flags;
};

We can now copy the control information we’ve gotten from recvmsg back when calling sendmsg, providing the kernel with information about the source address.The library I used (https://github.com/lucas-clemente/quic-go) had a recent update that did exactly this! I pulled the changes into my service and gave it a spin.

But alas. It did not work! A quick tcpdump showed that the same source address was being sent back. It seemed clear from reading the source code that the recvmsg and sendmsg were being called with the right values. It did not make sense.

So I had to see for myself if these system calls were being made.

strace all the system calls

strace is an extremely useful tool that tracks all system calls and signals sent/received by a process. Here’s what it had to say. I’ve removed all the information not relevant to this specific issue.

17:39:09.130346 recvmsg(3, {msg_name={sa_family=AF_INET6,
sin6_port=htons(35224), inet_pton(AF_INET6, "::ffff:171.54.148.10", 
&sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=112->28, msg_iov=
[{iov_base="_\5S\30\273]\275@\34\24\322\243{2\361\312|\325\n\1\314\316`\3
03\250\301X\20", iov_len=1452}], msg_iovlen=1, msg_control=[{cmsg_len=36, 
cmsg_level=SOL_IPV6, cmsg_type=0x32}, {cmsg_len=28, cmsg_level=SOL_IP, 
cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=if_nametoindex("eth0"),
ipi_spec_dst=inet_addr("198.41.200.200"),ipi_addr=inet_addr("198.41.200.200")}},
{cmsg_len=17, cmsg_level=SOL_IP, 
cmsg_type=IP_TOS, cmsg_data=[0]}], msg_controllen=96, msg_flags=0}, 0) = 28 <0.000007>
17:39:09.165160 sendmsg(3, {msg_name={sa_family=AF_INET6, 
sin6_port=htons(35224), inet_pton(AF_INET6, "::ffff:171.54.148.10", 
&sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=28, 
msg_iov=[{iov_base="Oe4\37:3\344 &\243W\10~c\\\316\2640\255*\231 
OY\326b\26\300\264&\33\""..., iov_len=1302}], msg_iovlen=1, msg_control=
[{cmsg_len=28, cmsg_level=SOL_TCP, cmsg_type=0x8}], msg_controllen=28, 
msg_flags=0}, 0) = 1302 <0.000054>

Let’s start with recvmsg . We can clearly see that the ipi_addr for the source is being passed correctly: ipi_addr=inet_addr(“172.16.90.131”). This part works as expected. Looking at sendmsg  almost instantly tells us where the problem is. The field we want, ip_spec_dst is not being set as we make this system call. So the kernel continues to make wrong guesses as to what the source address may be.

This turned out to be a bug where the library was using IPROTO_TCP instead of IPPROTO_IPV4 as the control message level while making the sendmsg call. Was that it? Seemed a little anticlimactic. I submitted a slightly more typesafe fix and sure enough, straces now showed me what I was expecting to see.

18:22:08.334755 sendmsg(3, {msg_name={sa_family=AF_INET6, 
sin6_port=htons(37783), inet_pton(AF_INET6, "::ffff:171.54.148.10", 
&sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, msg_namelen=28, 
msg_iov=
[{iov_base="Ki\20NU\242\211Y\254\337\3107\224\201\233\242\2647\245}6jlE\2
70\227\3023_\353n\364"..., iov_len=33}], msg_iovlen=1, msg_control=
[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data=
{ipi_ifindex=if_nametoindex("eth0"), 
ipi_spec_dst=inet_addr("198.41.200.200"),ipi_addr=inet_addr("0.0.0.0")}}
], msg_controllen=32, msg_flags=0}, 0) =
33 <0.000049>

cloudflared is now able to connect with UDP (QUIC) to the Cloudflare network from anywhere in the world!

$  cloudflared tunnel --protocol quic run sudarsans-tunnel
2021-09-21T11:37:30Z INF Starting tunnel tunnelID=a72e9cb7-90dc-499b-b9a0-04ee70f4ed78
2021-09-21T11:37:30Z INF Version 2021.9.1
2021-09-21T11:37:30Z INF GOOS: darwin, GOVersion: go1.16.5, GoArch: amd64
2021-09-21T11:37:30Z INF Settings: map[p:quic protocol:quic]
2021-09-21T11:37:30Z INF Initial protocol quic
2021-09-21T11:37:32Z INF Connection 3ade6501-4706-433e-a960-c793bc2eecd4 registered connIndex=0 location=AMS

While the programmatic bug causing this issue was a trivial one, the journey into systematically discovering the issue and understanding how Linux internals worked for UDP along the way turned out to be very rewarding for me. It also reiterated my belief that tcpdump and strace are indeed invaluable tools in anybody’s arsenal when debugging network problems.

What’s next?

You can give this a try with the latest cloudflared release at https://github.com/cloudflare/cloudflared/releases/latest. Just remember to set the protocol flag to quic. We plan to leverage this new mode to roll out some exciting new features for Cloudflare Tunnel. So upgrade away and keep watching this space for more information on how you can take advantage of this.

Textbook Rental Scam

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2021/10/textbook-rental-scam.html

Here’s a story of someone who, with three compatriots, rented textbooks from Amazon and then sold them instead of returning them. They used gift cards and prepaid credit cards to buy the books, so there was no available balance when Amazon tried to charge them the buyout price for non-returned books. They also used various aliases and other tricks to bypass Amazon’s fifteen-book limit. In all, they stole 14,000 textbooks worth over $1.5 million.

The article doesn’t link to the indictment, so I don’t know how they were discovered.

Take part in the UK Bebras Challenge 2021 for schools!

Post Syndicated from Duncan Maidens original https://www.raspberrypi.org/blog/uk-bebras-challenge-2021-for-schools/

The annual UK Bebras Computational Thinking Challenge is back to provide fun, brain-teasing puzzles for schools from 8 to 19 November!

The UK Bebras Challenge 2021 runs from 8 to 19 November.

In the free Bebras Challenge, your students get to practise their computational thinking skills while solving a set of accessible, puzzling, and engaging tasks over 40 minutes. It’s tailored for age groups from 6 to 18.

“I just want to say how much the children are enjoying this competition. It is the first year we have entered, and I have students aged 8 to 11 participating in my Computing lessons, with some of our older students also taking on the challenges. It is really helping to challenge their thinking, and they are showing great determination to try and complete each task!”

– A UK-based teacher

Ten key facts about Bebras

  1. It’s free!
  2. The challenge takes place in school, and it’s a great whole-school activity
  3. It’s open to learners aged 6 to 18, with activities for different age groups
  4. The challenge is made up of a set of short tasks, and completing it takes 40 minutes
  5. The closing date for registering your school is 4 November
  6. Your learners need to complete the challenge between 8 and 19 November 2021
  7. All the marking is done for you (hurrah!)
  8. You’ll receive the results and answers the week after the challenge ends, so you can go through them with your learners and help them learn more
  9. The tasks are logical thinking puzzles, so taking part does not require any computing knowledge
  10. There are practice questions you can use to help your learners prepare for the challenge, and throughout the year to help them practice their computational thinking

Do you want to support your learners to take on the Bebras Challenge? Then register your school today!

Remember to sign up by 4 November!

The benefits of Bebras

Bebras is an international challenge that started in Lithuania in 2004 and has grown into a worldwide event. The UK became involved in Bebras for the first time in 2013, and the number of participating students has increased from 21,000 in the first year to more than half a million over the last two years! Internationally, nearly 2.5 million learners took part in 2020 despite the disruptions to schools.

On the left, a drawing of a bracelet made of stars and moons.
On the left, a bracelet design from an activity for ages 10–12. On the right, a password checker from an activity for ages 14–16.

Bebras, brought to you in the UK by us and Oxford University, is a great way to give your learners of all age groups a taste of the principles behind computing by engaging them in fun problem-solving activities. The challenge results highlight computing principles, so Bebras can be educational for you as a teacher too.

Throughout the year, questions from previous years of the challenge are available to registered teachers on the bebras.uk website, where you can create self-marking quizzes to help you deliver the computational thinking part of the curriculum for your classes.

You can register your school at bebras.uk/admin.

Learn more about our work to support learners with computational thinking.

The post Take part in the UK Bebras Challenge 2021 for schools! appeared first on Raspberry Pi.

Zero Trust — Not a Buzzword

Post Syndicated from Fernando Serto original https://blog.cloudflare.com/zero-trust-not-a-buzzword/

Zero Trust — Not a Buzzword

Zero Trust — Not a Buzzword

Over the last few years, Zero Trust, a term coined by Forrester, has picked up a lot of steam. Zero Trust, at its core, is a network architecture and security framework focusing on not having a distinction between external and internal access environments, and never trusting users/roles.

In the Zero Trust model, the network only delivers applications and data to authenticated and authorised users and devices, and gives organisations visibility into what is being accessed and to apply controls based on behavioural analysis. It gained popularity as the media reported on several high profile breaches caused by misuse, abuse or exploitation of VPN systems, breaches into end-users’ devices with access to other systems within the network, or breaches through third parties — either by exploiting access or compromising software repositories in order to deploy malicious code. This would later be used to provide further access into internal systems, or to deploy malware and potentially ransomware into environments well within the network perimeter.

When we first started talking to CISOs about Zero Trust, it felt like it was just a buzzword, and CISOs were bombarded with messaging from different cybersecurity vendors offering them Zero Trust solutions. Recently, another term, SASE (Secure Access Services Edge), a framework released by Gartner, also came up and added even more confusion to the mix.

Then came COVID-19 in 2020, and with it the reality of lockdowns and remote work. And while some organizations took that as an opportunity to accelerate projects around modernising their access infrastructure, others, due to procurement processes, or earlier technology decisions, ended up having to take a more tactical approach, ramping up existing remote access infrastructure by adding more licenses or capacity without having an opportunity to rethink their approach, nor having an opportunity to take into account the impact of their employees’ experience while working remotely full time in the early days of the pandemic.

So we thought it might be a good time to check on organizations in Asia Pacific, and look at the following:

  • The pandemic’s impact on businesses
  • Current IT security approaches and challenges
  • Awareness, adoption and implementation of Zero Trust
  • Key drivers and challenges in adopting Zero Trust

In August 2021, we commissioned a research company called The Leading Edge to conduct a survey that touches on these topics. The survey was conducted across five countries — Australia, India, Japan, Malaysia, and Singapore, and 1,006 IT and cybersecurity decision-makers and influencers from companies with more than 500 employees participated.

For example, 54% of organisations said they saw an increase in security incidents in 2021, when compared to the previous year, with 83% of respondents who experienced security incidents saying they had to make significant changes to their IT security procedures as a result.

Zero Trust — Not a Buzzword
Increase in security incidents when compared to 2020. ▲▼ Significantly higher/lower than total sample

And while the overall APAC stats are already quite interesting, I thought it would be even more fascinating to look at the unique characteristics of each of the five countries, so let’s have a look:

Australia

Australian organisations reported the highest impact of COVID-19 when it comes to their IT security approach, with 87% of the 203 respondents surveyed saying the pandemic had a moderate to significant impact on their IT security posture. The two biggest cities in Australia (Sydney and Melbourne) were in lockdown for over 100 days, each in the second half of 2021 alone. With the extensive lockdowns, it’s not a surprise that 48% of respondents reported challenges with maximising remote workers’ productivity without exposing them or their devices to new risks.

With 94% of organisations in Australia having reported they will be implementing a combination of return to office and work from home, building an effective and uniform security approach can be quite challenging. If you combine that with the fact that 62% saw an increase in security incidents over the last year, we can safely assume IT and cybersecurity decision-makers and influencers in Australia have been working on improving their security posture over the last year, even though 40% of respondents indicated they struggled to secure the right level of funding for such projects.

Australia seems to be well advanced on the journey into implementing Zero Trust when compared to other four countries included in the report, with 45% of the organisations that have adopted Zero Trust starting their Zero Trust journey over the last one to four years. Australian organisations have always been known for fast cloud adoption, and even in the early 2010s Australians were already consuming IaaS quite heavily.

India

When compared to the other countries in the report, India has a very challenging environment when it comes to working from home, with Internet connectivity being inconsistent, even though there’s been significant improvement in internet speeds in the country, and problems like power outages regularly occurring in certain areas outside of city centres. Surprisingly, the biggest challenge reported by Indian organisations was that they could benefit from newer security functionality, which goes to show that legacy security approaches are still widely present in India. Likewise, 37% of the respondents reported that their access technologies are too complex, which supports the previous point that newer security functionality would be beneficial to the same organisations.

When asked about their concerns around the shift in how their users will access applications, one of the biggest concerns raised by 59% of the respondents was around applications being protected by VPN or IP address controls alone. This shows Zero Trust would fit really well with their IT strategy moving forward, as controls can now be applied to users and their devices.

Another interesting point to make, and where Zero Trust can be leveraged, is 65% of respondents saying internal IT and security staff shortage and cuts is a huge challenge. Most security technologies out there would require special skills to build, maintain and operate, and this is where simplifying access with the right Zero Trust approach could really help improve the productivity of those teams.

Japan

When we look at the results of the survey across all five countries, it’s fairly obvious that Japan didn’t seem to have quite the same challenges as the other countries when the pandemic started. Businesses continued to operate normally for most of 2020 and 2021, which would explain why the impact wasn’t in line with the other countries. Having said that, 51% of the respondents surveyed in Japan still reported they saw a moderate to significant impact in their IT security approach, which is still significant, even though lower than the other countries.

Japanese organisations also reported an increase in the number of security incidents, which supports the fact that even though the impact of the pandemic wasn’t as severe as in other countries, 45% of the respondents still reported an increase in security incidents, and 63% still had to make changes to their IT security procedures as a direct result of incidents.

Malaysia

Malaysia rated second highest (at 80%) in our report on the impact the pandemic has had on organisations’ IT security approach, and rated highest on both employees using their home networks and using personal devices for work, at 94% and 92% respectively. From a security perspective, that poses a significant impact to an organization’s security posture and increases the attack surface for an organisation substantially.

From a risk perspective, Malaysian organisations rated lack of management over employees’ devices pretty highly, with 65% of them expressing concerns over it. Other areas worth calling out were applications and data being exposed to the public Internet, and lack of visibility into staff activity inside applications.

With 57% of the respondents calling out an increase in security incidents when compared to the previous year, 89% of the respondents said they had to make significant changes to their IT security procedures due to either security incidents or attack attempts against their environments.

Singapore

In Singapore, 79% of IT and cybersecurity decision-makers and influencers reported that the pandemic has impacted their IT security approach, and two in five organisations said they could benefit from more modern security functionality as a direct result of the impact caused by the pandemic. 52% of the organisations also reported an increase in security incidents compared to 2020, with almost half having seen an increase in phishing attempts.

Singaporean organisations were also not immune to a significant increase in IT security spend as a direct result of the pandemic, with 62% of them having reported more investment in security. Some of the challenges these organisations were facing were related to applications being directly exposed to the public Internet, limited oversight on third party access and applications being protected by username and password only.

While Singapore is known for high speed home Internet, it was quite a surprise for me to see that 40% of organisations surveyed reported issues with latency or slow connectivity into applications via VPN. This goes to show that the problem of concentrating traffic into a single location can impact application performance even across relatively small geographies, and even if bandwidth is not necessarily a problem, like what happens in Singapore.

The work in IT security never stops

While there were distinct differences in each country around IT security posture and Zero Trust adoption, across Asia Pacific, the similarities are what stand out the most:

  • Cyberattacks continue to rise
  • Flexible work is here to stay
  • Skilled in-house IT security workers are a scarce resource
  • Need to educate stakeholders around Zero Trust

These challenges are not easy to tackle, add to these the required focus on improving employee experience, reducing operational complexities, better visibility into 3rd party activity, and tighter controls due to the increase in security incidents, and you’ve got a heck of a huge responsibility for IT.

And this is where Cloudflare comes in. Not only have we been helping our employees work security throughout the pandemic, we have also been helping organisations all over the globe streamline their IT security operations when it comes to users accessing applications through Cloudflare Access, or securing their activity on the Internet through our Secure Web Gateway services, which even includes controls around SaaS applications and browser isolation, all with the best possible user experience.

So come talk to us!

[$] Moving toward Qubes OS 4.1

Post Syndicated from original https://lwn.net/Articles/873255/rss

On October 11, the first release candidate for Qubes OS version 4.1 was announced. Qubes OS
is a security-oriented desktop operating system that uses multiple virtual
machines (VMs or “qubes“) to isolate
various types of functionality. The idea is to compartmentalize different
applications and operating-system subsystems to protect them from each
other and to limit access to the user’s data if an application is
compromised. Version 4.1 will bring several important enhancements to
help Qubes OS continue to live up to its motto: “A reasonably secure operating
system
“.

s/bash/zsh/g

Post Syndicated from arp242.net original https://www.arp242.net/why-zsh.html

You would expect this to work, no?

bash% echo $(( .1 + .2 ))
bash: .1 + .2 : syntax error: operand expected (error token is ".1 + .2 ")

Well, bash says no, but zsh just works:

zsh% echo $(( .1 + .2 ))
0.30000000000000004      # Well, "works" insofar IEEE-754 works.

There is simply no way you can do calculations with fractions in bash without
relying on bc, dc, or some hacks. Compared to simply being able to use a +
b
it’s ugly, slow, and difficult.

There are other pretty frustrating omissions in bash; NUL bytes is another fun
one:

zsh% x=$(printf 'N\x00L'); printf $x | xxd -g1 -c3
00000000: 4e 00 4c  N.L

bash% x=$(printf 'N\x00L'); printf $x | xxd -g1 -c3
bash: warning: command substitution: ignored null byte in input
00000000: 4e 4c     NL

It looks like bash added a warning recently-ish (4.4-patch 2); this one
bit me pretty hard a few years ago; back then it would just get silently
discarded without warning; I guess a warning is an “improvement” of sorts
(fixing it, however, would be an actual improvement[1]).

NUL bytes aren’t that uncommon, think of e.g. find -print0, xargs -0, etc.
That all works grand, right up to the point you try to assign it to a variable.
You can use NUL bytes for array assignments though, if you evoke the right
incantation:

bash% read -rad arr < <(find . -print0)

There are all sorts of edge-cases where you need to resort to read or
readarray rather than being able to just assign in. In zsh it’s just
arr=($(find . -print0)).

Don’t even think of doing something like:

img=$(curl https://example.com/image.png)
if [[ $cond ]]; then
    optpng <<<"$img" > out.png
else
    cat <<<"$img" > out.png
fi

Of course you can refactor this to avoid the variable (and the example is a bit
contrived), but it really ought to work. I once wrote a script to import emails
from the Mailgun API. It worked great, yet sometimes images were mangled and I
just couldn’t figure out why. Turns out Mailgun “helpfully” decodes attachments
(i.e. removes the base64) and sends binary content, which bash (at the time)
would silently discard. It took me a very long time to figure out. I forgot why,
but it was hard to avoid storing the response in a variable. I ended up
rewriting it to Python because of this, which was just a waste of time really.
It’s this, specifically, that really soured me on bash and prompted The shell
scripting trap
in 2017. However, many items listed there are solved by
zsh, including this one.


zsh also fixes most of the quoting:

zsh% for f in *; ls -l $f
-rw-rw-r-- 1 martin martin 0 Oct 19 06:51 asd.txt
-rw-rw-r-- 1 martin martin 0 Oct 19 06:51 with space.txt

bash% for f in *; do ls -l $f; done
-rw-rw-r-- 1 martin martin 0 Oct 19 06:51 asd.txt
ls: cannot access 'with': No such file or directory
ls: cannot access 'space.txt': No such file or directory

It’s not POSIX compatible, but who cares? bash doesn’t follow POSIX in all
sorts of ways by default
either because it just makes more sense, and with
both you can still tell them to be POSIX-compatible if you must for one reason
or the other.

Also note the convenient short version of the
for loop: no need for do, done and muckery with ; before the done,
which is much more convenient for quick one-liners you interactively type in.
You can still do word splitting, but you need to do it explicitly:

zsh% for i in *; ls -l $=i
-rw-rw-r-- 1 martin martin 0 Oct 19 06:51 asd.txt
ls: cannot access 'with': No such file or directory
ls: cannot access 'space.txt': No such file or directory

[[ is supposed to fix [, but it still has weird quoting quirks:

zsh% a=foo; b=*;
zsh% if [[ $a = $b ]]; then
       print 'Equal!'
     else
       print 'Nope!'
     fi
Nope!

bash% a=foo; b=*
bash% if [[ $a = $b ]]; then
        echo 'Equal!'
      else
        echo 'Nope!'
      fi
Equal!

This is equal because without quotes it still interprets the right-hand side as
a pattern (i.e. glob). In zsh you need to use $~var to explicitly enable
pattern matching, which is a much better model than remembering when you do and
don’t need to quote things – sometimes you do want the pattern matching and
then you don’t want quotes; it’s not always immediately obvious if an if [[ ...
statement is correct if it’s lacking quotes.

“But Martin, you should always quote things, you’re being disingenuous!” Well, I
could make a comfortable living if I were paid to add quotes to other people’s
shell scripts. Telling people to “always quote things” is what we’ve been doing
for 40 years now and irrefutable observational evidence has demonstrated that it
just does not work.

Most people aren’t shell scripting wizards; they make a living writing Python or
C or Go or PHP programs, or maybe they’re sysadmins or scientists, and oh,
occasionally they also write a shell script. They just see something that works
and assume it has sane behaviour and don’t realize the subtle differences
between $@, $*, and "$@". I think that’s actually quite reasonable,
because the behaviour is odd, surprising, and confusing.

It’s also a lot more complex than just “quote your variables”, especially if you
use $(..) since command substitution often needs quotes too, as well as any
variables inside it. Before you know it you’ve got double, triple, or more
levels of nested quotes and if you forget one set of them you’re in trouble.

It’s such a fertile source of real-world bugs that it would merit entomologist
examination. If there’s a system that causes this many bugs then that system is
at fault, and not the people using it. Computers and software should adapt to
humans, not the other way around.

And “always quote things!” isn’t even right either, because you should always
quote things except when you shouldn’t:

zsh% a=foo; b=.*;
zsh% if [[ "$a" =~ "$b" ]]; then
       print 'Equal!'
     else
       print 'Nope!'
     fi
Equal!

bash% a=foo; b=.*
bash% if [[ "$a" =~ "$b" ]]; then
        echo 'Equal!'
      else
        echo 'Nope!'
      fi
Nope!

If there are quotes around a regexp then it’s treated as a literal string. I
mean, it’s consistent with = pattern matching, but also confusing because I
explicitly use =~ to match a regular expression.

Another famous quoting trap is $@ vs. "$@" vs. $* vs. "$*":

zsh% cat args
echo "unquoted @:"
for a in $@; do echo "  => $a"; done

echo "quoted @:"
for a in "$@"; do echo "  => $a"; done

echo "quoted *:"
for a in $*; do echo "  => $a"; done

echo "quoted *:"
for a in "$*"; do echo "  => $a"; done

bash% bash args
unquoted @:
  => hello
  => world
  => test
  => space
  => Guust1129.jpg
  => IEEESTD.2019.8766229.pdf
  [.. rest of my $HOME ..]
quoted @:
  => hello
  => world
  => test space
  => *
unquoted *:
  => hello
  => world
  => test
  => space
  => Guust1129.jpg
  => IEEESTD.2019.8766229.pdf
  [.. rest of my $HOME ..]
quoted *:
  => hello world test space *

Experiences shellers will know to (almost) always use "$@", but how often do
you see it being done wrong? It’s not that strange people do it wrong either; if
you learned about quoting and word splitting then $@ without quotes is
actually the logical thing to use because you would expect "$@" to be
treated as one argument (as "$*"). You tell people to “always add quotes to
prevent word splitting and treat things as a single argument”, and then you tell
them “oh, except in this one special case when the addition of quotes invokes
a special kind of splitting and doesn’t follow any of the rules we previously
told you about”.

In zsh, $@ and $* (and $argv) are all (read-only) arrays and it all
behaves identical as you would expect with no surprises:

zsh% zsh args hello world 'test space' '*'
unquoted @:
  => hello
  => world
  => test space
  => *
quoted @:
  => hello
  => world
  => test space
  => *
unquoted *:
  => hello
  => world
  => test space
  => *
quoted *:
  => hello world test space *

Actually in bash you can do argv=("$@") and then you have an array. This is
really how it should work by default.

You still need to loop over it with:

bash% for a in "${argv[@]}"; do
        echo "=> $a"
      done

Rather than just for a in $argv like in zsh. Aside from the pointless [@],
why would you ever want to word-split every element of an array? There is
probably some use case somewhere, but it’s exceedingly rare. Better to just skip
all of that by default unless explicitly enabled with = and/or ~.

Oh, here’s another interesting tidbit:

zsh% n=3; for i in {1..$n}; print $i
1
2
3

bash% n=3; for i in {1..$n}; do echo "$i"; done
{1..3}

bash% n=3; for i in {1..3}; do echo "$i"; done
1
2
3

Why does it work like that? That’s left as an exercise for the reader 😉


Aside from all sorts caveats that are handled much better, a lot of common
things are just much easier in zsh:

zsh% arr=(123 5 1 9)
zsh% echo ${(o)arr}     # Lexical order
1 123 5 9
zsh% echo ${(on)arr}    # Numeric order
1 5 9 123

bash% IFS=$'\n'; echo "$(sort <<<"${arr[*]}")"; unset IFS
1 123 5 9
bash% IFS=$'\n'; echo "$(sort -n <<<"${arr[*]}")"; unset IFS
1 5 9 123

I had to look up how to do it in bash; the Stack Overflow answer for that one
starts with “you don’t really need all that much code”. lol? I guess that’s in
reply to some of the other horrendously complex answers which implement “pure
bash” sorting algorithms and the like. I guess compared to that this is “not
that much code”. And of course the entire thing is a minefield of expansion
again; and if you forget a set of nested quotes you end up in trouble.

Arrays in general are just awkward in bash:

bash% arr=(first second third fourth)

bash% echo ${arr[0]}
first
bash% echo ${arr[@]::2}
first second

I mean, it works, and it’s not that much typing, but why do I need that [@]?
Probably some (historical) reason, but zsh implements it much more readable and
easier:

zsh% arr=(first second third fourth)

zsh% print ${arr[1]}        # Yeah, it's 1-based. Deal with it.
first
zsh% print ${arr[1,2]}
first second

The bash array syntax was copied from ksh; so I guess we have to blame David
Korn (zsh supports it too, if you must use it). But regular subscripts are just
so much easier.

And then there’s the useful features:

zsh% ls *.go
format.go  format_test.go  gen.go  old.go  uni.go  uni_test.go

zsh% ls *.go~*_test.go
format.go  gen.go  old.go  uni.go

zsh% ls *.go~*_test.go~f*
gen.go  old.go  uni.go

*.go gets expanded, and filters the pattern after the ~; *_test.go in this
case. Looks a bit obscure at first glance, but bash’s ksh-style extglobs are far
harder:

bash% ls !(*_test).go
format.go  gen.go  old.go  uni.go

bash% ls !(*_test|f*).go
gen.go  old.go  uni.go

!(..) is “match anything except the pattern”; the * is implied here (zsh
supports !(..) if you set ksh_glob). While it works, the the
pattern~filter~filter model is much easier, and also more flexible since you
don’t need to start with all matches.

There are many useful things you can do with globbing; you can replace many uses
of find with it, and you don’t need to worry about the caveats, -print0
hacks, etc. For example to recursively list all regular files:

zsh% ls **/*(.)
LICENSE         go.sum           unidata/gen.go             wasm/make*
README.md       old.go           unidata/gen_codepoints.go  wasm/srv*
[..]

Or directories:

zsh% ls -d /etc/**/*(/)
/etc/OpenCL/                      /etc/runit/runsvdir/default/dnscrypt-proxy/log/
/etc/OpenCL/vendors/              /etc/runit/runsvdir/default/ntpd/log/
/etc/X11/                         /etc/runit/runsvdir/default/postgresql/supervise/
[..]

Or files that were changed in the last week:

zsh% ls -l /etc/***(.m-7)    # *** is a shortcut for **/*; needs GLOB_STAR_SHORT
-rw-r--r-- 1 root root 28099 Oct 13 03:47 /etc/dnscrypt-proxy.toml.new-2.1.1_1
-rw-r--r-- 1 root root    97 Oct 13 03:47 /etc/environment
-rw-r--r-- 1 root root 37109 Oct 17 10:34 /etc/ld.so.cache
-rw-r--r-- 1 root root 77941 Oct 19 01:01 /etc/public-resolvers.md
-rw-r--r-- 1 root root  6011 Oct 19 01:01 /etc/relays.md
-rw-r--r-- 1 root root   142 Oct 19 07:57 /etc/shells

You can even order them by modified date with om (***(.m-7om)), although
that’s a bit pointless here as ls will reorder them again, but if you’re
looping over files it’s useful.

There is no way to do any of this in bash, you’ll have to use something like:

bash% find /etc -type f -mtime 7 -exec ls -l {} +
find: ‘/etc/sv/docker/supervise’: Permission denied
find: ‘/etc/sv/docker/log/supervise’: Permission denied
find: ‘/etc/sv/bluetoothd/log/supervise’: Permission denied
find: ‘/etc/sv/postgresql/supervise’: Permission denied
find: ‘/etc/sv/runsvdir-martin/supervise’: Permission denied
find: ‘/etc/wpa_supplicant/supervise’: Permission denied
find: ‘/etc/lvm/cache’: Permission denied
-rw-rw-r-- 1 root root  167 Oct 12 22:17 /etc/default/postgresql
-rw-r--r-- 1 root root  817 Oct 12 09:11 /etc/fstab
-rw-r--r-- 1 root root 1398 Oct 12 22:19 /etc/passwd
-rw-r--r-- 1 root root 1397 Oct 12 22:19 /etc/passwd.OLD
-rw-r--r-- 1 root root  307 Oct 12 23:10 /etc/public-resolvers.md.minisig
-rw-r--r-- 1 root root  297 Oct 12 23:10 /etc/relays.md.minisig
-r-------- 1 root root  932 Oct 12 09:57 /etc/shadow
-rwxrwxr-x 1 root root  397 Oct 12 22:23 /etc/sv/postgresql/run

Not sure how to make it ignore these errors too without redirecting stderr (more
typing!) And if you think adding single letters in (..) after a pattern is
hard then try understanding find’s weird flag syntax. Glob qualifies are
great.

csh-style parameter substitution is pretty useful:

zsh% for f in ~/photos/*.png; convert $f ${f:t:r}.jpeg

:t to get the tail, and :r to get the root (without extension). csh could do
this before I was even born, but bash can’t (it can for history expansion, but
not variables). According to the bash FAQ “Posix has specified a more powerful,
albeit somewhat more cryptic, mechanism cribbed from ksh”, which I find a
somewhat curious statement as the above in bash is:

bash% for f in ~/photos/*.png; do convert "$f" "$(basename "${f%%.*}")"; done

Technically “more powerful” in the sense that you can do other things with it,
but not really “more useful for common operations” (zsh, of course, implements
% and # as well).

Note you can’t nest ${..} in bash; e.g. "${${f%%.*}##*/}" is an error:

zsh% f=~/asd.png; print "${${f%%.*}##*/}"
asd

bash% f=~/a.png; echo "${${f%%.*}##*/}"
bash: ${${f%%.*}##*/}: bad substitution

While this can quickly lead to very unreadable ASCII vomit, it’s useful on
occasion, when used with care and wisdom. You can click below for a more
advanced example if the children are already in bed.

Click to see NSFW content. Not suitable for children under 18!

For example, this can be used to show the longest element in an array:

print ${array[(r)${(l.${#${(O@)array//?/X}[1]}..?.)}]}

Cribbed from the zsh User’s Guide.


There are many more things. I’m not going to list them all here. None of this is
new; much (if not all?) of this has around for 20 years, if not longer. I don’t
know why bash is the de-facto default, or why people spend time on complex
solutions to work around bash problems when zsh solves them. I guess because
Linux used a lot of GNU stuff and bash was came with it, and GNU stuff was (and
is) using bash. Not a very good reason, certainly not one 30 years later.

zsh still has plenty of limitations; the syntax isn’t always something you’d
want to show your mother for starters, as well as a number of other things.
Still, it’s clearly better. I genuinely can’t find a single thing bash does
better beyond “it’s installed on many systems already”.

Ubiquitousness is overrated anyway; zsh has no dependencies beyond libc and
curses and is 970K on my system[2] and is available for pretty much all
systems. Compared to most other interpreters it’s tiny, with only Lua being
smaller (275K). “Stick to POSIX sh for compatibility” was good advice in 1990
when you had some SunOS system with some sun-sh and that’s what you were stuck
with. Those days are long gone, and while there are a few poor souls still suck
on those systems (sometimes even with csh!) chances are they’re not going to try
and run your Docker or Arch Linux shell script or whatnot on those systems
anyway.

Contorting yourself in all sorts of strange bends to perhaps possibly maybe make
it work for a tiny fraction of users who are unlikely to use your script anyway
does not seem like a good trade-off, especially since these kind of limitations
tend to be organisational rather than technical, which is not my problem anyway
to be honest.

Using zsh is also more portable; since it allows you to avoid many shell tools
and the (potential) incompatibilities, and by explicitly setting zsh as the
interpreter you can rely on zsh behaviour, rather than hoping the /bin/sh on
$random_system behaves the same (even dash has some extensions to POSIX sh, such
as local).

What I typically do is save files as script.zsh with:

#!/usr/bin/env zsh
[ "${ZSH_VERSION:-}" = "" ] && echo >&2 "Only works with zsh" && exit 1

This makes sure it gets run by zsh when used as ./script.zsh, and gives an
error in case people type sh script.zsh or bash script.zsh in case the .zsh
extension isn’t enough of a clue.

So in conclusion: s/bash/zsh/g and everything will be just a little bit
better.

P.S. maybe fish is even better by the way, but I could never get over the bright
colouring and all these things popping in and out of my screen; it’s such a
“busy” chaotic experience! I’d have to spend time disabling all that stuff to
make it usable for me, and I never bothered – and if you’re disabling the key
selling points of a program then it’s probably not intended for you anyway.
Maybe I’ll have a crack at it at some point though.

Footnotes

  1. Which I assume isn’t so easy, otherwise it would have been done already.
    The reason it doesn’t work is an artifact from C’s NUL-terminated strings,
    but this kind of stuff really shouldn’t be exposed in a high-level language
    like the shell. It’s also a bit ironic since one of Stephen Bourne’s
    original goals with his shell was to get rid of arbitrary size limits on
    strings, which were common at the time. 

  2. A full install if ~8M, mostly in the optional completion functions it
    ships with. Bash is about 1.3M by the way. 

New – AWS Data Exchange for Amazon Redshift

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-data-exchange-for-amazon-redshift/

Back in 2019 I told you about AWS Data Exchange and showed you how to Find, Subscribe To, and Use Data Products. Today, you can choose from over 3600 data products in ten categories:

In my introductory post I showed you how could subscribe to data products and then download the data sets into an Amazon Simple Storage Service (Amazon S3) bucket. I then suggested various options for further processing, including AWS Lambda functions, a AWS Glue crawler, or an Amazon Athena query.

Today we are making it even easier for you to find, subscribe to, and use third-party data with the introduction of AWS Data Exchange for Amazon Redshift. As a subscriber, you can directly use data from providers without any further processing, and no need for an Extract Transform Load (ETL) process. Because you don’t have to do any processing, the data is always current and can be used directly in your Amazon Redshift queries. AWS Data Exchange for Amazon Redshift takes care of managing all entitlements and payments for you, with all charges billed to your AWS account.

As a provider, you now have a new way to license your data and make it available to your customers.

As I was writing this post, it was cool to realize just how many existing aspects of Redshift, and Data Exchange played central roles. Because Redshift has a clean separation of storage and compute, along with built-in data sharing features, the data provider allocates and pays for storage, and the data subscriber does the same for compute. The provider does not need to scale their cluster in proportion to the size of their subscriber base, and can focus on acquiring and providing data.

Let’s take a look at this feature from two vantage points: subscribing to a data product, and publishing a data product.

AWS Data Exchange for Amazon Redshift – Subscribing to a Data Product
As a data subscriber I can browse through the AWS Data Exchange catalog and find data products that are relevant to my business, and subscribe to them.

Data providers can also create private offers and extend them to me for access via the AWS Data Exchange Console. I click My product offers, and review the offers that have been extended to me. I click on Continue to subscribe to proceed:

Then I complete my subscription by reviewing the offer and the subscription terms, noting the data sets that I will get, and clicking Subscribe:

Once the subscription is completed, I am notified and can move forward:

From the Redshift Console, I click Datashares, select From other accounts, and I can see the subscribed data set:

Next, I associate it with one or more of my Redshift clusters by creating a database that points to the subscribed datashare, and use the tables, views, and stored procedures to power my Redshift queries and my applications.

AWS Data Exchange for Amazon Redshift – Publishing a Data Product
As a data provider I can include Redshift tables, views, schemas and user-defined functions in my AWS Data Exchange product. To keep things simple, I’ll create a product that includes just one Redshift table.

I use the spiffy new Redshift Query Editor V2 to create a table that maps US area codes to a city and a state:

Then I examine the list of existing datashares for my Redshift cluster, and click Create datashare to make a new one:

Next, I go through the usual process for creating a datashare. I select AWS Data Exchange datashare, assign a name (area_code_reference), pick the database within the cluster, and make the datashare accessible to publicly accessible clusters:

Then I scroll down and click Add to move forward:

I choose my schema (public), opt to include only tables and views in my datashare, and then add the area_codes table:

At this point I can click Add to wrap up, or Add and repeat to make a more complex product that contains additional objects.

I confirm that the datashare contains the table, and click Create datashare to move forward:

Now I am ready to start publishing my data! I visit the AWS Data Exchange Console, expand the navigation on the left, and click Owned data sets:

I review the Data set creation steps, and click Create data set to proceed:

I select Amazon Redshift datashare, give my data set a name (United States Area Codes), enter a description, and click Create data set to proceed:

I create a revision called v1:

I select my datashare and click Add datashare(s):

Then I finalize the revision:

I showed you how to create a datashare and a dataset, and to publish a product using the console. If you are publishing multiple products and/or making regular revisions, you can automate all of these steps using the AWS Command Line Interface (CLI) and the Amazon Data Exchange APIs.

Initial Data Products
Multiple data providers are working to make their data products available to you through AWS Data Exchange for Amazon Redshift. Here are some of the initial offerings and the official descriptions:

  • FactSet Supply Chain Relationships – FactSet Revere Supply Chain Relationships data is built to expose business relationship interconnections among companies globally. This feed provides access to the complex networks of companies’ key customers, suppliers, competitors, and strategic partners, collected from annual filings, investor presentations, and press releases.
  • Foursquare Places 2021: New York City Sample – This trial dataset contains Foursquare’ss integrated Places (POI) database for New York City, accessible as a Redshift Data Share. Instantly load Foursquare’s Places data in to a Redshift table for further processing and analysis. Foursquare data is privacy-compliant, uniquely sourced, and trusted by top enterprises like Uber, Samsung, and Apple.
  • Mathematica Medicare Pilot Dataset – Aggregate Medicare HCC counts and prevalence by state, county, payer, and filtered to the diabetic population from 2017 to 2019.
  • COVID-19 Vaccination in Canada – This listing contains sample datasets for COVID-19 Vaccination in Canada data.
  • Revelio Labs Workforce Composition and Trends Data (Trial data) – Understand the workforce composition and trends of any company.
  • Facteus – US Card Consumer Payment – CPG Backtest – Historical sample from panel of SKU-level transaction detail from cash and card transactions across hundreds of Consumer-Packaged Goods sold at over 9,000 urban convenience stores and bodegas across the U.S.
  • Decadata Argo Supply Chain Trial Data – Supply chain data for CPG firms delivering products to US Grocery Retailers.

Jeff;

Replace traditional email mailbox polling with real-time reads using Amazon SES and Lambda

Post Syndicated from agardezi original https://aws.amazon.com/blogs/messaging-and-targeting/replace-traditional-email-mailbox-polling-with-real-time-reads-using-amazon-ses-and-lambda/

Integrating emails into an automated workflow for automated processing can be challenging. Traditionally, applications have had to use the POP protocol to connect to mail servers and poll for emails to arrive in a mailbox and then process the messages inline and perform actions on the message. This can be an inefficient mechanism and prone to errors that result in the workflow missing messages. Since this method requires polling it’s not great if you need real-time processing of messages and introduces inefficiencies in the design. Amazon Simple Email Service (Amazon SES) is a cost effective, scalable and flexible email service with support for different workflows including the ability to perform spam checks and virus scans. In this blog you will see how to use Amazon SES with AWS Lambda and Amazon S3 in order to automate the processing of emails in real time and integrate with an application without the need for polling.

The use case explored in this blog focuses on automation for CRM or order processing platforms and processing of email related to customer contact or direct email requests. An example of this use case is copying a client engagement email to Salesforce (or any other database) where it is recorded and can later be categorized or attached to the appropriate client account or opportunity. When designing an application that needs to read emails from a mailbox, a developer would traditionally have to use a mail library (like JavaMail if using Java) to make a call to the mailbox, authenticate and then pull messages into an application object. This would mean polling the mailbox every 10 – 15 minutes to check for new messages, handle errors when the mailbox is unavailable and maintaining a fully functioning mailbox. This solution can help you implement automated processing of emails arriving in a mailbox without the need to poll the mailbox. The entire solution will be implemented in a serverless fashion.

Solution

This blog post shows how to use SES to perform automated processing of email in an application workflow. I will use the option in SES to save received emails in S3 and trigger a Lambda function to process the message without having to poll a mailbox. This sample application demo is using email to receive simple orders which get automatically processed and the details stored in DynamoDB. The following diagram shows the high-level architecture:

Step 1: Create an S3 Bucket for Email Storage

Start by creating an S3 bucket where received emails will be stored in order for the full email to be processed by the lambda. The bucket must have a policy attached so SES can put objects in the bucket on your behalf:

{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Sid":"AllowSESPuts",
      "Effect":"Allow",
      "Principal":{
        "Service":"ses.amazonaws.com"
      },
      "Action":"s3:PutObject",
      "Resource":"arn:aws:s3:::myBucket/*",
      "Condition":{
        "StringEquals":{
          "aws:Referer":"111122223333"
        }
      }
    }
  ]
}

Make the following changes to the preceding policy example:

  1. Replace myBucket with the name of the Amazon S3 bucket that you want to write to.
  2. Replace 111122223333 with your AWS account ID.

You can find out more about the policy here.

Step 2: Create DynamoDB Table to Simulate Application

Next, add a DynamoDB table. The DynamoDB table will store the incoming order info. For this sample we will keep it simple and have a table with email as a partition key. Here is the data model:

{   
    "email_order_received": {
        "email": "string",
        "itemname": "string",
        "quantity": "number"
    }   
}

Step 3: Create Lambda Function triggered by SES to Process Email

Now the DynamoDB table is ready, create the Lambda function to process the email and send data to the DynamoDB table. The lambda function needs an execution role that has permissions to access the S3 bucket, the DynamoDB table and create the CloudWatch log group. It also needs a Resource-based Policy so SES can invoke the Lambda function. In the final step when we configure SES to call the lambda, SES automatically adds the necessary permissions to the function as detailed here.  This is a sample policy statement:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "allowSesInvoke",
      "Effect": "Allow",
      "Principal": {
        "Service": "ses.amazonaws.com"
      },
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:eu-west-1:111122223333:function:email-event-ses",
      "Condition": {
        "StringEquals": {
          "AWS:SourceAccount": "111122223333"
        }
      }
    }
  ]
}

Sample Lambda code in python:

import boto3
import email


def lambda_handler(event, context):
    s3 = boto3.client("s3")
    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table('email_order_received')
    
    print("Spam filter")
    # Check the SES spam and virus filter settings
    if (
        event["Records"][0]["ses"]["receipt"]["spfVerdict"]["status"] == "FAIL" or
        event["Records"][0]["ses"]["receipt"]["dkimVerdict"]["status"] == "FAIL" or
        event["Records"][0]["ses"]["receipt"]["spamVerdict"]["status"] == "FAIL" or
        event["Records"][0]["ses"]["receipt"]["virusVerdict"]["status"] == "FAIL"
       ):
        print("Dropping Spam")
    else:
        print("Not Spam")
        email_bucket = "email-handling-test"
        bucketkey = "monitor/" + event["Records"][0]["ses"]["mail"]["messageId"]
    
        fileObj = s3.get_object(Bucket = email_bucket, Key=bucketkey)
    
        msg = email.message_from_bytes(fileObj['Body'].read())
        From = msg['From']
        itemname = msg['Subject']
        body = ""
        if msg.is_multipart():
            for part in msg.walk():
                type = part.get_content_type()
                disp = str(part.get('Content-Disposition'))
                # look for plain text parts, but skip attachments
                if type == 'text/plain' and 'attachment' not in disp:
                    charset = part.get_content_charset()
                    # decode the base64 unicode bytestring into plain text
                    body = part.get_payload(decode=True).decode(encoding=charset, errors="ignore")
                    # if we've found the plain/text part, stop looping thru the parts
                    break
        else:
            # not multipart - i.e. plain text, no attachments
            charset = msg.get_content_charset()
            body = msg.get_payload(decode=True).decode(encoding=charset, errors="ignore")
            
        table.put_item(
            Item={
                'email': From,
                'itemname': itemname,
                'quantity': body
            }
        )
        print("inserted data into dynamodb")

When you add a Lambda action to a receipt rule, Amazon SES sends an event record to Lambda every time it receives an incoming message. This event contains information about the email headers for the incoming message, as well as the results of tests (spam filtering and virus scanning) that Amazon SES performs on incoming messages, however it omits the body of the incoming email. This is why the lambda has to process the body form the email stored in S3. You can see details of the event here. In this demo app we assume the item name is in the subject and the body of the email has the quantity of the items and this data is written to the DynamoDB table.

Step 4: Configure SES to Send Emails to S3 and Trigger Lambda Function

The final step is to configure Amazon SES. Start by verifying a domain so SES can use it to send and receive emails. Domain verification helps ensure you are the owner of the domain and are thus authorised to manage the sending and receiving of the emails from addresses in the domain. To verify your domain:

  1. In the SES console in the navigation pane under Identity Management, choose Domains.
  2. Choose Verify new Domain
  3. In the Verify new Domain dialog enter your domain name
  4. Choose Verify This Domain
  5. In the dialogue box you will see a Domain verification record set. You need to add this record to your domain DNS server. You will also have to add the email receiving record (MX Record) to you domain DNS server.
  6. If your DNS server is Route53 and it is registered under the same account then SES also gives you the option to update your DNS server from within the SES console.

Once the domain is verified its status goes from “pending verification” to “verified” and now it can used it to send and receive emails.

Next, create a recipient rule set. The Rule Set lets you specify what SES does with emails it receives on domains you own. You can create rules for individual addresses or any address under the domain. To create the Rule Set:

  1. In the left navigation pane, under Email Receiving, choose Rule Sets.
  2. Choose Create Rule.
  3. Enter the recipient email address you want to configure the rule for. You can add up to a maximum of 100 recipient addresses or just set it up for any address in the domain using just the domain name as a wildcard.
  4. Once the addresses have been added, add the actions for the rule. Add two actions:
    1. First one is of type S3, this is to save a copy of the email to the S3 bucket created in step 1. Select the bucket name created in step 1 from the drop-down list. You can add a prefix to the filename as well to categorise the output of different rules.
    2. Second is of type Lambda to trigger the lambda for processing the email. Select the lambda created in step 3 from the drop-down list.

Once the SES Rule is configured, we have the full workflow in place. Now any email sent to the [email protected] address will be processed by the Lambda. In this way you can configure email processing to be part of your application workflow without having to perform polling.

Clean-up

To clean up the resources used in your account:

  1. Navigate to Amazon S3 and delete the contents of the bucket you created where your emails are stored.
  2. Once the bucket is empty, delete the bucket.
  3. Navigate to the DynamoDB console and delete the table you created above. Make sure you select the option to “Delete all CloudWatch alarms for this table”
  4. Remove the domain from Amazon SES. To do this, navigate to the Amazon SES Console and choose Domains from the left navigation. Select the domain you want to remove and choose Remove button to remove it from Amazon SES.
  5. From the Amazon SES Console, navigate to the Rule Sets from the left navigation. On the Active Rule Set section, choose View Active Rule Set button and delete all the rules you have created, by selecting the rule and choosing Action, Delete.
  6. On the Rule Sets page choose Disable Active Rule Set button to disable listening for incoming email messages.
  7. On the Rule Sets page, Inactive Rule Sets section, delete the only rule set, by selecting the rule set and choosing Action, Delete.
  8. Navigate to the Lambda console and delete the Lambda you created earlier. Select the Lambda and choose Delete from the Actions menu.
  9. Navigate to CloudWatch console and from the left navigation choose Logs, Log groups. Find the log group that belongs to the resources and delete it by selecting it and choosing Actions, Delete log group(s).

Conclusion

In this post, we have shown you how to integrate email processing into an application workflow without having to resort to polling a mail box.

By using SES to receive emails you can create a modular serverless architecture that allows emails to be processed and checked for spam plus viruses and the output can then be sent to any downstream system or stored in a database for application use.


About the Author

Syed Ali Abbas Gardezi is a Sr. Solution Architect for AWS based in London, United Kingdom. He works with AWS GSI Partners architecting, designing and implementing various large-scale IT solution. Before joining AWS he worked in several Architecture roles in a tier 1 financial organisation in London.

Designing a High-volume Streaming Data Ingestion Platform Natively on AWS

Post Syndicated from Soonam Jose original https://aws.amazon.com/blogs/architecture/designing-a-high-volume-streaming-data-ingestion-platform-natively-on-aws/

The total global data storage is projected to exceed 200 zettabytes by 2025. This exponential growth of data demands increased vigilance against cybercrimes. Emerging cybersecurity trends include increasing service attacks, ransomware, and critical infrastructure threats. Businesses are changing how they approach cybersecurity and are looking for new ways to tackle these threats. In the past, they have relied on internal IT or engaged a managed security services provider (MSSP) to monitor and prevent unauthorized access and attacks.

An end-to-end analytics solution should ingest and process log data streaming from various computing and IoT devices. It can then make processed data available to analytics systems users in near-real-time. However, the sheer volume of data in the future makes this difficult to address in a reliable and cost-effective manner.

In this blog post, we present three approaches for a high-volume log data ingestion and processing platform natively on Amazon Web Services (AWS). We also compare the pros and cons of each. We’ll discuss factors to consider when evaluating the different options and their associated flexibility, to take full advantage of AWS. We will showcase a fictional use case for a top MSSP who ingests high volumes of logs from security devices to cloud. This MSSP also performs downstream analytics and threat detection modeling.

The options we present here have a log collection platform (LCP) on-premises. It collects logs from security devices and sensors, performs necessary translations and tokenization, and pushes compressed log files to the processing tier on cloud. The collection platform can also be modernized to have the IoT-enabled devices send logs to AWS IoT services. This will push the data to Amazon Kinesis, a managed service for collecting and analyzing streaming data.

Approach 1: Amazon Kinesis for log ingestion and format conversion

Figure 1 illustrates a comprehensive solution that uses managed and serverless services on AWS.

Figure 1. Amazon Kinesis for log ingestion and format conversion

Figure 1. Amazon Kinesis for log ingestion and format conversion

1. LCP will invoke a scalable producer application for Amazon Kinesis Data Streams running on AWS Fargate behind an Application Load Balancer. The producer application will use the Amazon Kinesis Producer Library (KPL). KPL aggregates and batches data records to make ingestion into the data stream more efficient. The application may provide compressed records to the KPL to have it manage object compression.

The application can be set up as an HTTP endpoint that receives log files and processes them using KPL. Customer ID sent as part of an HTTP request header can be used to maintain affinity. The application can run in a Docker container, which is orchestrated by Amazon ECS on AWS Fargate. A target tracking scaling policy can manage the number of parallel running data ingestion containers to manage scalability of the ingestion process.

2. Amazon Kinesis Scaling Utility can be used to scale data streams up or down by a count, or as a percentage of the total fleet. The scaling utility archive file can be imported as a library to AWS Lambda. It will automatically manage the number of shards in the stream based on the observed PUT or GET rate of the stream. The combination of customer ID and security device ID may be used to define the partition key.

3. Records uploaded to the stream by the producer application will be consumed by Lambda. It will perform gateway transformations (required by all downstream consumers) and the normalization of record format. Any additional consumer level transformations may be handled separately, associated with respective consumers.

A combination of batch window and batch size configurations can improve efficiency of function invocations. Batch windows are the maximum amount of time in seconds to gather records before invoking the function. Batch size is the number of records to send to the function in each batch. The Lambda function will throttle sending records to Amazon Kinesis Data Firehose. Error handling will be accomplished via retries with a smaller batch size, with number of retries limited as appropriate. It will discard records that are too old.

An Amazon Simple Queue Service (SQS) queue can be configured as a failed-event destination for further offline analysis. A Lambda function can read from the error SQS queue to do basic checks and determine appropriate follow-up actions. This can be an initiated email for additional investigation or a command to discard the message.

4. Output of transformations by Lambda will be saved to the short term (hot) storage Amazon S3 bucket via Kinesis Data Firehose. This can efficiently handle Parquet format conversion required by downstream analytics applications. Kinesis Data Firehose delivery streams will be created per customer and configured with associated AWS Glue Data Catalog table, to perform parquet format conversion.

5. AWS Glue jobs will be used to consolidate and write larger files to the long term (cold) storage bucket.

6. The data in the cold storage bucket will be accessed by internal SOC analysts for threat detection and mitigation.

7. The data in cold storage buckets will also be accessed by end customers via dashboards in Amazon QuickSight.

8. This architecture also provides additional options to modernize streaming analytics using Amazon Kinesis Data Analytics or AWS Glue streaming jobs as appropriate.

While this architecture proposes a fully managed, end-to-end solution, the sheer volume of log messages may drive up the total cost of the solution. This is especially true for Kinesis Data Streams and Kinesis Data Firehose costs.

Approach 2: Containerized application on AWS Fargate for ingestion and Amazon Kinesis for format conversion

An alternative approach shown in Figure 2 replaces the gateway Kinesis Data Streams and transformations, with a containerized application on Fargate. Conversion to Parquet format and writing to the S3 bucket is still handled by Kinesis Data Firehose.

Figure 2. Containerized application for ingestion and Amazon Kinesis for format conversion

Figure 2. Containerized application for ingestion and Amazon Kinesis for format conversion

1. LCP will upload log files to a raw storage bucket in Amazon S3.

2. A Lambda function will process Event Notifications from the raw data storage bucket. It can insert Amazon S3 object pointers to a Kinesis Data Stream partitioned by Customer ID and Device ID.

3. The producer application will retrieve the Event Notifications from the Data Stream and retrieve corresponding log files from S3. It will perform initial aggregations and transformations, and output to Kinesis Data Firehose. The application can run in a Docker container that is orchestrated by Amazon ECS on Fargate. A target tracking scaling policy can manage the number of parallel running data ingestion containers, to manage scalability of the ingestion process. ECS cluster capacity can be scaled up or down based on Amazon CloudWatch alarms.

4. Kinesis Data Firehose converts to Parquet format, zips the data, and persists to a short-term storage bucket in S3. This is backed by Glue Data Catalog.

Steps 5, 6 and 7 perform consolidation and availability of the processed data to downstream consumers, as in the previous approach.

This option uses the built-in capabilities of Kinesis Data Firehose to transform to Parquet format and deliver to S3. Note that higher costs associated with the service may still be cost prohibitive for larger data volumes.

Approach 3: Containerized application on AWS Fargate for ingestion and format conversion

Figure 3 uses a containerized application running on Fargate for both gateway transformations. This app also provides conversion to Parquet format before writing the files to a short term (hot) storage bucket. All the other steps are the same as in option 2.

Figure 3. Containerized application for ingestion and format conversion

Figure 3. Containerized application for ingestion and format conversion

This option offers the least expensive way to transform, aggregate, and enrich the incoming log records, as well as convert them to Parquet format. But it comes with additional overhead for custom development of format conversion, checkpointing, error handling, and application management. Evaluate based on your business needs and workflow.

Conclusion

In this post, we discussed multiple approaches to design a platform on AWS to ingest and process high-volume security log records. We compared the pros and cons for each option. Amazon Kinesis is a fully managed and scalable service that helps easily collect, process, and analyze video and data streams in real time. A solution primarily based on Kinesis may become cost prohibitive due to large data volumes. Consider alternate approaches that use containerized applications on AWS Fargate. The trade-off would be the ability for custom development versus application management overhead.

To improve your security log analysis solution, explore one of the approaches we illustrate and customize as appropriate to fit your unique needs.

SFC files suit against Vizio over GPL violations

Post Syndicated from original https://lwn.net/Articles/873338/rss

Software Freedom Conservancy has announced that it filed suit against TV maker Vizio over “repeated failures to fulfill even the basic requirements of the General Public License (GPL)“. The organization raised the problems with Vizio in August 2018, but the company stopped responding in January 2020, according to the announcement.

“We are asking the court to require Vizio to make good on its obligations under copyleft compliance requirements,” says [Software Freedom Conservancy executive director Karen[ Sandler. She explains that in past litigation, the plaintiffs have always been copyright holders of the specific GPL code. In this case, Software Freedom Conservancy hopes to demonstrate that it’s not just the copyright holders, but also the receivers of the licensed code who are entitled to rights.

The lawsuit suit seeks no monetary damages, but instead seeks access to the technical information that the copyleft licenses require Vizio to provide to all customers who purchase its TVs (specifically, the plaintiff is asking for the technical information via “specific performance” rather than “damages”).

The complaint is also available.

Software Freedom Conservancy files lawsuit against Vizio Inc.

Post Syndicated from original https://lwn.net/Articles/873336/rss

Software Freedom Conservancy announced
it has filed a lawsuit
against California TV manufacturer Vizio
Inc. for GPL violations.

The lawsuit alleges that Vizio’s TV products, built on its SmartCast
system, contain software that Vizio unfairly appropriated from a community
of developers who intended consumers to have very specific rights to
modify, improve, share, and reinstall modified versions of the software.

[…] Software Freedom Conservancy, a nonprofit organization focused on
ethical technology, is filing the lawsuit as the purchaser of a product
which has copylefted code. This approach makes it the first legal case that
focuses on the rights of individual consumers as third-party beneficiaries
of the GPL.

The collective thoughts of the interwebz